So, I'm finally ready to share a dashboard we've been using for the last eight months or so to monitor the performance of our own Tableau Server deployment. It has let us proactively address lots of Tableau Server issues, such as hung processes, SAN problems, and a worker dropping off the domain. It has also let us confirm whether or not performance changed from version to version, after server restarts, and after various configuration changes.
The basic question this viz tries to answer is: "Is there a performance problem with Tableau Server right now?" The follow-up question is: "If there is, what should I, the administrator, do about it?"
Here's a presentation I gave at the Server Admin User Group last week that explains the problems with other approaches and why I think this dashboard solves them: Server Admin Virtual Tableau User Group May (YouTube)
In a nutshell, this dashboard measures the overall performance of Tableau Server by comparing each viz's load times to a rolling baseline built from that same viz's load times over the prior week. It then aggregates those per-viz numbers across the entire server, hour by hour, in two ways:
1. Overall performance of all vizzes, with each viz weighted equally
When this is orange, a large proportion of Tableau Server vizzes are slower than normal. This is generally a good indicator that your server is performing poorly. (top pane of the Server Performance sheet)
2. Total % of requests that are slower than normal
When the orange bars are higher than normal, a high percentage of all viz loads are slower than normal. This can mean that one very popular viz is slow, rather than that many vizzes are slow. The gray bars show overall traffic volume. (bottom pane of the Server Performance sheet, repeated with various breakdowns across the rest of the bottom of the dashboard)
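The exact calculations live in the workbook's calculated fields, so here is only a minimal Python sketch of the idea under stated assumptions: the baseline is the median load time of the same viz over the preceding seven days, and a load counts as "slower than normal" when it takes more than 1.5x that baseline. The `Request` record, `SLOW_FACTOR`, and the use of the median are hypothetical choices, not the dashboard's actual logic.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median

# Hypothetical tuning knobs; the dashboard's real values may differ.
BASELINE_WINDOW = timedelta(days=7)
SLOW_FACTOR = 1.5  # a load is "slower than normal" above 1.5x its viz's baseline


@dataclass
class Request:
    viz: str            # hypothetical viz identifier (e.g. workbook/view path)
    started: datetime   # when the load started
    seconds: float      # load time


def baseline(history: list[Request], viz: str, at: datetime) -> float | None:
    """Median load time of `viz` over the week before `at` (None if no history)."""
    window = [r.seconds for r in history
              if r.viz == viz and at - BASELINE_WINDOW <= r.started < at]
    return median(window) if window else None


def hourly_scores(history: list[Request], hour: datetime) -> dict[str, float]:
    """Score one hour of traffic both ways: per viz and per request."""
    end = hour + timedelta(hours=1)
    in_hour = [r for r in history if hour <= r.started < end]

    slow_requests = 0
    counted_requests = 0
    ratios_by_viz: dict[str, list[float]] = defaultdict(list)
    for r in in_hour:
        base = baseline(history, r.viz, hour)
        if not base:
            continue  # no prior-week baseline for this viz yet, so skip it
        ratio = r.seconds / base
        ratios_by_viz[r.viz].append(ratio)
        counted_requests += 1
        slow_requests += ratio > SLOW_FACTOR

    # 1. Each viz weighted equally: share of distinct vizzes whose typical
    #    (median) load this hour was slower than normal.
    slow_vizzes = sum(median(rs) > SLOW_FACTOR for rs in ratios_by_viz.values())
    pct_vizzes = slow_vizzes / len(ratios_by_viz) if ratios_by_viz else 0.0

    # 2. Each request weighted equally: share of all loads slower than normal.
    pct_requests = slow_requests / counted_requests if counted_requests else 0.0

    return {"pct_vizzes_slow": pct_vizzes,
            "pct_requests_slow": pct_requests,
            "requests": len(in_hour)}
```

For example, with a week of normal history, a popular viz that suddenly takes 10 s instead of 2 s (8 loads) plus a rarely used viz at its usual 4 s (2 loads) scores 50% on the per-viz measure but 80% on the per-request measure, which is the "one very popular viz is slow" pattern from item 2.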
Installation instructions are on the Start sheet of the dashboard. We refresh the extract incrementally every hour and have an alert set on the Server Alert pane that notifies us when performance over the previous hour was worse than the threshold we've established.
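The alert itself is configured in Tableau, not in code, and the threshold isn't given here. As a rough sketch of the condition being checked after each hourly refresh, assuming a hypothetical `ALERT_THRESHOLD` of 30% of vizzes slower than normal (pick your own from your history):

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Hypothetical threshold; tune it against your own history.
ALERT_THRESHOLD = 0.30  # fraction of vizzes slower than normal


def previous_hour(now: datetime) -> datetime:
    """Start of the last complete hour before `now`."""
    return now.replace(minute=0, second=0, microsecond=0) - timedelta(hours=1)


def check_alert(scores_by_hour: dict[datetime, float], now: datetime,
                threshold: float = ALERT_THRESHOLD) -> str | None:
    """Return an alert message if the previous hour's score breached the threshold."""
    hour = previous_hour(now)
    score = scores_by_hour.get(hour)
    if score is None:
        return None  # the hourly refresh hasn't loaded this hour yet
    if score > threshold:
        return (f"Tableau Server slow: {score:.0%} of vizzes slower than normal "
                f"in the hour starting {hour:%Y-%m-%d %H:00}")
    return None
```

Evaluating only the last complete hour avoids alerting on a partially loaded hour right after the incremental refresh.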
- It is very useful to add CPU and memory data from each worker machine to this viz; that's how we've been able to pinpoint process issues easily. The data needs to be collected at least hourly. TabMon, or an IT monitoring solution that gives you access to its data, could be used for this.
- Warning: As valuable as this is, I don't intend to maintain this dashboard over time, as I'm already stretched quite thin. I do plan to maintain the data source it relies on, TS Web Requests. If you already use TS Web Requests as a published data source, I recommend pointing this dashboard to it instead of using the local connection.
- I'm not a data scientist and can't claim that my methodology is statistically sound (or that there aren't bugs in my calcs!). I can only claim that it's been useful to me, which suggests it could be useful to you, too.
- More caveats exist; read about them on the Start sheet.
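Lining worker metrics up with the hourly scores is mostly a matter of bucketing samples to the hour. A minimal sketch, assuming raw samples shaped as hypothetical `(worker, timestamp, cpu_pct)` tuples (not TabMon's actual schema) and an arbitrary 90% CPU limit; memory can be handled the same way:

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime
from statistics import mean


def to_hour(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def hourly_cpu(samples: list[tuple[str, datetime, float]]) -> dict[tuple[str, datetime], float]:
    """Average CPU % per (worker, hour) from hypothetical (worker, ts, cpu_pct) samples."""
    buckets: dict[tuple[str, datetime], list[float]] = defaultdict(list)
    for worker, ts, cpu in samples:
        buckets[(worker, to_hour(ts))].append(cpu)
    return {key: mean(vals) for key, vals in buckets.items()}


def suspect_workers(cpu_by_worker_hour: dict[tuple[str, datetime], float],
                    slow_hours: set[datetime],
                    cpu_limit: float = 90.0) -> dict[str, list[datetime]]:
    """Workers whose hourly average CPU exceeded `cpu_limit` during hours flagged slow."""
    hits: dict[str, list[datetime]] = defaultdict(list)
    for (worker, hour), cpu in sorted(cpu_by_worker_hour.items()):
        if hour in slow_hours and cpu > cpu_limit:
            hits[worker].append(hour)
    return dict(hits)
```

Sampling more often than hourly is fine; averaging down to the hour is what lets the metrics line up with the dashboard's hourly grain.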
I hope this is useful to you! If you find it is, or isn't, or you want to show off improvements you've made, please let us all know in the comments.
PS: This dashboard is just one example of what can be done with the set of Tableau Server Data Sources that I've also shared on the community. Check 'em out!
Edit 2019-05-23: Attaching slide deck from presentation