Hey Carl, sorry to hear about this! When this occurs, did you notice which process was taking up the most CPU on your cluster? We have seen tdeserver64.exe spike CPU on a few occasions while running 9.0.1.
No answers for you, just thoughts on what could be going on with paging, parallelization, or calcs. Let us know the outcome.
Same as Matt: resource spikes are very frustrating. My gut feeling, also the same as Matt's, is that you are probably experiencing the Data Engine (tdeserver64.exe) spike and hold. The tell is that the main Data Engine processes on the nodes where Data Engine is explicitly configured in TabConfig can get stuck at 10-30% of CPU, and if other processes on that host get CPU hungry, resource spikes can happen. Matt and I ran into this after two weeks of running 9.0.1 in production without restarting.
Could you open a Support case with your logs at your earliest convenience? We should be able to confirm whether this is the known issue for which we have adjustments coming soon.
Please include a few screenshots and a description of what Task Manager shows for the processes on the spiking hosts. If you have host metrics, such as PerfMon counters, that capture this info, that would be great.
If this happens to be something different, we'd definitely like to start investigating.
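If it helps when pulling that data together, here is a minimal Python sketch of how a series of CPU readings for one process could be scanned for a sustained "hold" rather than a brief spike. Everything in it is an assumption for illustration: the input is simply a list of CPU percentages you exported yourself (one reading per sampling interval), and the function name, the 10% floor, and the minimum run length are hypothetical, not anything Task Manager, PerfMon, or Tableau provides.

```python
from typing import List, Tuple


def find_cpu_holds(
    samples: List[float],
    floor_pct: float = 10.0,  # assumed threshold, adjust to your baseline
    min_len: int = 5,         # assumed minimum number of consecutive samples
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end inclusive, for every run where
    CPU stays at or above floor_pct for at least min_len samples in a row."""
    holds = []
    run_start = None
    for i, pct in enumerate(samples):
        if pct >= floor_pct:
            if run_start is None:
                run_start = i  # a new candidate run begins here
        else:
            if run_start is not None and i - run_start >= min_len:
                holds.append((run_start, i - 1))
            run_start = None
    # Close out a run that lasts until the final sample.
    if run_start is not None and len(samples) - run_start >= min_len:
        holds.append((run_start, len(samples) - 1))
    return holds


if __name__ == "__main__":
    # Hypothetical readings for one process, not real measurements.
    readings = [1, 2, 15, 20, 25, 12, 18, 3, 11, 1]
    print(find_cpu_holds(readings))
```

Short blips (like the single reading of 11 near the end of the sample list) are ignored, which is the point: a process that sits above the floor for many intervals is the pattern worth including in the case notes.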
Geoff/Matt, thanks for the feedback. We have a ticket open with Tableau. We have also sent in the logs. We are currently trying to recreate the situations that are generating this. One scenario was a refresh of a locally saved TDE file onto the Data Server and running a visualization on that. About 2 million records. It seemed to always generate a spike and hold.
"Nice" as in I'm hoping this is the known issue so you can get relief sooner. We're running pre-release code now and have not hit the issue we saw organically, nor been able to force a repro on our staging cluster.
Since it's looking like this is the issue, here are mitigation ideas for production if you have to restart again:
- If running multiple VizQL and primary Data Engine processes on the same host(s), reduce VizQL and DE by 1 each
- Example: we were considering reducing VizQL from 3 to 2 and, if necessary, Data Engine from 2 to 1 on the affected hosts
- We didn't hit a point where we tried this in production though
I don't know and cannot say with any confidence, without having support look at your issue in more detail, if what you are seeing is similar to what Matt is seeing on his deployment. Please log a support case so we can take a closer look at your logs and data.
In comparison to 8.x, 9.0 has more server processes running, such as cache server, cluster controller, file store, etc. Depending on your workloads, you may simply need more RAM to handle the load. You should consider 8GB RAM per core, at a minimum, for real production usage. Your machine seems low on RAM. With low RAM, you may be swapping things to disk, and CPU contention would follow. For trials, proofs of concept, and small teams, Server can run on much less.
If memory is one of the constraints your deployment is experiencing, it makes sense that 8.x was running better on your 8-core, 32GB RAM machine.
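To make the sizing arithmetic concrete, here is a small Python sketch of the 8GB-per-core rule of thumb applied to an 8-core, 32GB machine. The function names are made up for illustration, and the rule itself is only the minimum guideline mentioned above, not a guarantee for any particular workload.

```python
def recommended_ram_gb(cores: int, gb_per_core: int = 8) -> int:
    """Minimum RAM suggested by the per-core rule of thumb."""
    return cores * gb_per_core


def ram_shortfall_gb(cores: int, installed_gb: int, gb_per_core: int = 8) -> int:
    """How far installed RAM falls below the guideline (0 if it meets it)."""
    return max(0, recommended_ram_gb(cores, gb_per_core) - installed_gb)


if __name__ == "__main__":
    cores, installed = 8, 32
    print("recommended GB:", recommended_ram_gb(cores))
    print("shortfall GB:", ram_shortfall_gb(cores, installed))
```

By this guideline an 8-core box would want 64GB, so a 32GB machine has half of the suggested minimum.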
Lastly, as part of your support ticket, check for error dump files in your log directory. If you are seeing a lot of them, that may indicate an infrastructure issue as well.
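A quick way to get a feel for how many dump files there are is a small script like the Python sketch below. Please treat the details as assumptions: the `*.dmp` pattern is a guess at how the dump files are named on your install, and the log directory path is whatever applies to your deployment; neither comes from Tableau documentation.

```python
from collections import Counter
from pathlib import Path
from typing import Dict


def count_dump_files(log_dir: str, pattern: str = "*.dmp") -> Dict[str, int]:
    """Count files matching pattern under log_dir (searched recursively),
    grouped by the name of the folder that directly contains them."""
    counts: Counter = Counter()
    for path in Path(log_dir).rglob(pattern):
        if path.is_file():
            counts[path.parent.name] += 1
    return dict(counts)


if __name__ == "__main__":
    import sys

    # Pass your own log directory as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for folder, n in sorted(count_dump_files(target).items()):
        print(f"{folder}: {n}")
```

Grouping by folder gives a rough idea of which process is producing the dumps, which is useful context to include in the support ticket.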
Not a straightforward answer, but there is very little to go on from your message. Please work with support.
Neelesh, thanks for the feedback. We have a ticket logged. No solution yet. We are hoping that ours isn't the only case of this that Tableau engineering is looking into. We plan to upgrade the hardware in a week or more to 64GB of RAM on a dedicated server.
An odd coincidence we noticed was a spike in events on the server around and after the moment of installation. We have seen a spike in access events since the upgrade. These seemed to be connected to specific data connections on the data server, and it looked like some sort of feedback loop was being generated. This could all be coincidental. Our IT group is working with Tableau.
We reverted back to 8. The system was unstable and we could not solve it in time. Thanks for the feedback. We will continue to research the issue and test on a DEV server.
We have been experiencing this same problem. It seems to happen when we execute certain queries against a Tableau Server data source. VizQL processes spike and Tableau Server becomes unresponsive. The only way to recover is to tabadmin restart.
As a short-term fix, we've created local extracts of the data source, and this seems to fix the problem.
Our production machine is an 8-core box with 32GB of RAM running version 9.0.0 (9000.15.0318.1720) 64-bit.
Corey, thanks. That is one of our theories. We haven't been able to cleanly isolate it. Was there anything unique to your data sources? We noticed that something seemed to be connected to running a query against a data source on the data server. We experimented with a number of conditions.
Our setup includes the following sources: Excel, Oracle, MS Access and SQL Server. We also ran extracts against generated TDE files and also had live connections against TDE files saved on a network drive.
We did restarts, but it was very problematic to maintain and ensure reliability. v8 was very stable for us.
There isn't anything unique about the specific data source. The data originates from MSSQL and is only about 400,000 rows. I have a specific workbook with which I can reproduce the issue every time, but it is only an issue on Tableau Server. If I open this workbook in Desktop, everything works fine, even on a live data source connection.
I am working with support on this, but haven't found a resolution yet.
Corey, this is exactly what is happening to us. Do you have anything unique inside? Exotic calculations? One book that is specifically problematic has 6 million records. The minute someone opened it on server, CPU usage went from 30% to 100% and flatlined. It eventually rendered after 10-20 minutes. No other books worked until that one finished. We have some long calculation names in it, plus table calcs and logical functions (ZN).
The only thing I can think of that would be unique is the following:
- Fairly long case statements
- A lot of conditional calculations
- A parameter controlling what field to have in the view
The workbook complexity is medium, I'd say. We have a lot more complex workbooks that aren't experiencing this issue, but this one is not a bare-bones workbook either.