This is my first-ever blog entry. Enjoy, and let me know whether you find it useful!

 

----------------------------------------------------------

 

Automated monitoring is integral to any enterprise deployment. This blog entry covers how we implemented automated monitoring for our Tableau Server deployment. Feel free to adapt it for your own deployment. Note that this does not have any adverse effect on Tableau Server performance.

 

  • The default out-of-the-box email alerting generally isn't enough for many enterprises, because the email itself is not integrated into the NOC (network operations center) workflow. In theory the email could be routed to the NOC distribution list, but I'm not sure that would fit into their workflow.

 

  • First step toward full automation. Your monitoring app needs to be able to check Tableau status without being interrupted by a login prompt. You do this by whitelisting your monitoring app as trusted via tabadmin on Tableau Server (TS). The TS admin guide states that either hostnames or IP addresses can be used. In our testing, hostnames didn't work. They may work in newer releases; we haven't checked lately.

 

  • Second step toward full automation. Work with your enterprise monitoring team to develop a script that verifies the health of Tableau Server. We do this with a two-tiered status check in our monitoring app, which happens to be Nagios. I believe we check statuses every 3 minutes.
    • Tier 1: Monitoring app runs a check_http against the Tableau server and expects a return code of 200.  This is the expected response for "normal" operation for any user logging in.  Prior to 9.x, the expected status was 401.

              

    • Tier 2: Monitoring app looks at the status of each individual service. For normal operation, it expects every service status to be one of "OK", "Active", "Passive", "Busy", "ReadOnly", or "ActiveSyncing". You can see these same statuses for your Tableau Server if you go to http://tableauserver/admin/systeminfo.xml

              

    • Summary status: All statuses for our Tableau Server deployment are green. This is the way it should be! If you notice that we have set up duplicate status checks for tier 1 and tier 2, this is intentional: it lets both the NOC and the on-call support engineers be notified.
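
The Tier 2 check described above can be sketched in Python. This is a minimal illustration, not our actual Nagios plugin: the function names are hypothetical, and the sample XML is an assumed, simplified shape of systeminfo.xml (your server's output will contain more elements). The sketch simply treats every element carrying a status attribute as a service and maps the result to the standard Nagios plugin exit codes (0 for OK, 2 for CRITICAL).

```python
import xml.etree.ElementTree as ET

# Statuses this post treats as healthy.
HEALTHY = {"OK", "Active", "Passive", "Busy", "ReadOnly", "ActiveSyncing"}

# Standard Nagios plugin exit codes.
NAGIOS_OK = 0
NAGIOS_CRITICAL = 2


def unhealthy_services(xml_text):
    """Return (label, status) for every element whose status is not healthy."""
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter():
        status = elem.get("status")
        if status is not None and status not in HEALTHY:
            worker = elem.get("worker")
            label = f"{elem.tag}@{worker}" if worker else elem.tag
            problems.append((label, status))
    return problems


def tier2_check(xml_text):
    """Return (exit_code, message) in the style of a Nagios plugin."""
    problems = unhealthy_services(xml_text)
    if problems:
        detail = ", ".join(f"{label}={status}" for label, status in problems)
        return NAGIOS_CRITICAL, f"CRITICAL - {detail}"
    return NAGIOS_OK, "OK - all services healthy"


# Assumed, simplified sample; not copied from a real server.
SAMPLE = """<systeminfo>
  <machines>
    <machine name="tableauserver">
      <repository worker="tableauserver:8060" status="Active"/>
      <dataengine worker="tableauserver:27042" status="Down"/>
    </machine>
  </machines>
  <service status="Down"/>
</systeminfo>"""

if __name__ == "__main__":
    code, message = tier2_check(SAMPLE)
    print(code, message)
```

In a real plugin you would fetch the XML from your own server's systeminfo.xml URL, print the message, and exit with the returned code so Nagios can pick up the state.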



  • What about when you don't want your TS deployment to be monitored? Backup cleanups, upgrades, Windows maintenance, etc. You have to remember to turn off monitoring so you don't raise a bunch of NOC red flags. We do this by running a script before any maintenance cycle, and at the end we turn monitoring back on.
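
One way to silence alerts during maintenance is sketched below. Note that this is a swapped-in technique, not necessarily what our own script does: instead of switching monitoring off and on, it builds a Nagios external command that schedules a fixed downtime window for a host and all of its services. The function name, host name, author, and comment are hypothetical placeholders.

```python
import time


def downtime_command(host, duration_seconds, author, comment, now=None):
    """Build a Nagios SCHEDULE_HOST_SVC_DOWNTIME external command line."""
    start = int(time.time()) if now is None else int(now)
    end = start + duration_seconds
    fixed = 1       # fixed window: starts and ends at the given times
    trigger_id = 0  # not triggered by another downtime entry
    return (
        f"[{start}] SCHEDULE_HOST_SVC_DOWNTIME;{host};{start};{end};"
        f"{fixed};{trigger_id};{duration_seconds};{author};{comment}"
    )


if __name__ == "__main__":
    # Hypothetical one-hour window for an upgrade.
    print(downtime_command("tableauserver", 3600, "tableau-admin", "Upgrade"))
```

To take effect, the resulting line (followed by a newline) would be written to the Nagios external command file, whose location depends on your installation.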


 

  • OK, so you have an outage. What do you do now? We worked with our NOC team to develop initial triage "Standard Operating Procedures". To prevent false alarms, we require two consecutive alerts before any corrective action is taken.
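
The two-consecutive-alerts rule can be sketched as follows. This is only an illustration of the idea, not the actual NOC procedure; the class name ConsecutiveAlertGate is hypothetical. A single failed check is ignored, and action is signalled only once the required number of failures occur in a row.

```python
class ConsecutiveAlertGate:
    """Signal action only after N failed checks in a row."""

    def __init__(self, required=2):
        self.required = required
        self.failures = 0

    def record(self, healthy):
        """Record one check result; return True when action is warranted."""
        if healthy:
            # Any healthy check resets the streak.
            self.failures = 0
            return False
        self.failures += 1
        return self.failures >= self.required


if __name__ == "__main__":
    gate = ConsecutiveAlertGate(required=2)
    # One isolated failure, then two failures in a row.
    results = [True, False, True, False, False]
    print([gate.record(r) for r in results])
    # prints [False, False, False, False, True]
```

Nagios itself offers a related setting, the max_check_attempts directive on a service, which delays a hard state change until a check has failed that many times.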