thanks for the detailed explanation. what version of 9 are you using? there were posts a few weeks ago about 9.0.1 cpu spiking on vizql processes, not sure if it applies in this case. I think they fixed it with 9.0.2
while it's running, look on the box to see if 1 out of 4 of the cores is staying at 100%
I am running 9.0.0. None of the cores is staying above 50% CPU. It is definitely not CPU or memory.
can you try 9.0.2?
Are you sure your Vizql isn't simply being killed at ~100 seconds? Note how your RAM utilization is quite high for the 1st 100 seconds, then falls. That looks like a process cycling to me.
Hi Russell, I am 100% sure that this is not a process cycling. The vizqlserver.exe in my scenario is unresponsive, but running. In fact, if I manually kill vizqlserver.exe in the Task Manager, a new one will be spawned (as expected) and requests will start passing. So this very simple tabjolt test makes my vizqlserver unresponsive and I would like to find out what the limiting factor is here. Any idea?
Whoops! The default on this report for RAM-related counters is Committed Bytes, so I assumed that was what you were showing. If the measure were Committed Bytes, we'd be looking at a process getting killed and RAM being freed. You're actually showing Available Bytes.
So there's clearly no process crash here - I took the reverse hockey stick to be RAM being freed. Instead, it is RAM being used and then veeerrry slowly released. But which processes are using it?
It's pretty much impossible to tell you what's going on here based on looking at a single viz out of the "suite" of info represented in the larger workbook you have. We need to see which processes are grabbing all the RAM on your box in < 100 seconds, and all sorts of other nuggets to guesstimate what is going on here. Can you extract the first two data sources in the workbook and then package everything up as a twbx and post?
Here you go. As I said, it is not CPU or memory bound. Something else is limiting Tableau performance in this scenario, but I need help figuring out what it is exactly.
PerformanceViz65.twbx 415.8 KB
Do you have any insight into the workbook I have posted?
I took a quick peek at the workbook and looked at the vizql process. After about 100 sec, the vizql server CPU utilization goes flat to "0" with an occasional 2% spike. This means the vizql server process itself is not doing much work; it seems to be retrying something, and something in the system or outside the system is not allowing it to proceed. You can see this in the "VizQL" tab.
>>I used jstack to create a callstack of vizqlserver when it gets in this state and out of 237 threads - 188 are in the
>>WAITING (parking) state. I don't know enough java to tell what it is waiting for.
That seems to jibe with your observation: threads are waiting, so work is not being done, and in that state it's normal for the CPU to show no usage.
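If it helps to dig into what those parked threads are waiting for, here is a rough Python sketch that tallies a saved jstack dump by thread state and by the class of object each parked thread is waiting on. It assumes the usual HotSpot text layout ("java.lang.Thread.State: ..." and "- parking to wait for <address> (a some.Class)" lines). The sample dump and the file name in the usage note are made up for illustration; they are not taken from the dump described above.

```python
import re
from collections import Counter

STATE_PREFIX = "java.lang.Thread.State:"
# Matches lines such as:
#   - parking to wait for  <0x00000000c0000001> (a java.util.concurrent.SynchronousQueue$TransferStack)
PARKED_ON_RE = re.compile(r"parking to wait for\s+<[^>]+>\s+\(a ([^)]+)\)")


def count_thread_states(dump_text):
    """Count threads per state, e.g. 'WAITING (parking)' or 'RUNNABLE'."""
    counts = Counter()
    for line in dump_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(STATE_PREFIX):
            counts[stripped[len(STATE_PREFIX):].strip()] += 1
    return counts


def count_parked_on(dump_text):
    """Count parked threads by the class of the object they are parked on."""
    return Counter(PARKED_ON_RE.findall(dump_text))


# Hypothetical three-thread dump, only to show the expected input shape.
SAMPLE_DUMP = """
"worker-1" #21 prio=5 os_prio=0 tid=0x01 nid=0x11 waiting on condition [0x0a]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000c0000001> (a java.util.concurrent.SynchronousQueue$TransferStack)

"worker-2" #22 prio=5 os_prio=0 tid=0x02 nid=0x12 waiting on condition [0x0b]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000c0000001> (a java.util.concurrent.SynchronousQueue$TransferStack)

"main" #1 prio=5 os_prio=0 tid=0x03 nid=0x13 runnable [0x0c]
   java.lang.Thread.State: RUNNABLE
"""

if __name__ == "__main__":
    for state, n in count_thread_states(SAMPLE_DUMP).most_common():
        print(n, state)
    for cls, n in count_parked_on(SAMPLE_DUMP).most_common():
        print(n, "parked on", cls)
```

To run it against a real dump, read the file first (for example open("vizql_dump.txt").read(), where the file name is a placeholder) and pass the text to both functions. A large group of threads parked on the same queue or lock class is usually the place to start reading the stacks.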
A few things to consider:
1. The server minimum hardware recommendation (for just the server only) is 4 cores 8GB.
2. If you are running TabJolt on the same machine as server, you are imposing a significant resource contention between the load tool and the system under test. TabJolt will want to use the CPU and other resources to drive the load. This is an anti-pattern for load testing. You should move TabJolt off the server, if you have it on the same box - that is not clear to me.
3. When systems get severely resource constrained, you may be simulating stress conditions more than load conditions - you should consider more appropriate hardware for your scenario
4. You are running on VM - if that is shared, there could be many reasons why your VM is behaving the way it is behaving including the virtualization platform configuration and other applications running on that virtualized host.
5. Under stress conditions, JVM garbage collection typically halts the process (as in, stops all application code running in it) by design, and this could cause unpredictable behavior.
Lastly, there is of course always the chance that you have discovered some bug and we may or may not have fixed it in the latest maintenance release. Please consider picking up the latest maintenance release.
Also, while your environment is OK for prototyping/testing, it's not good for production. For load testing, assuming you want to test for production workloads, you should at least be on the recommended production hardware with all the above caveats (like don't run tabjolt on same server hardware etc.)
Hope some of this helps. If you continue to run into the same problem after addressing some of the above, please file a support case with logs etc. We would want to help track down any bugs if you are running into them.
Thank you (to you and everyone) for trying TabJolt and supporting the community!
Just two suggestions:
1. on your test VM, run Process Monitor and try to catch the faulty process, if any
2. upgrade to the latest 9.x.x version or try 9.1 series.
Upgrading to 9.0.4 did not make a difference, and tabjolt is running on another machine.
I have a question about your point 3: "When systems get severely resource constrained, you may be simulating stress conditions more than load conditions - you should consider more appropriate hardware for your scenario"
When I run tabjolt with one concurrent user for a few minutes, it should be neither stress nor load, it is just one user using the system, right? Maybe I misunderstood the intent of the tabjolt parameters. By increasing concurrent user count I am increasing the load. Increasing the duration should not matter as long as I stay with, let's say, one user, which I assume 4 cores should support indefinitely.
My workbook is a very simple workbook with one chart and no external data source. I don't see CPU or memory being constrained, so which resources do you think might be constrained in my case?
I think we've solved it: The workbook in question was a nice, simple test intended just to exercise Tabjolt. The datasource was an Excel worksheet. The datasource was not extracted. Once we extracted and re-published, the Tabjolt tests ran flawlessly. My theory here is that the operating system was having a difficult time opening multiple connections to the Excel datasource. Once we changed the datasource to something designed to handle multiple simultaneous client requests (a TDE in this case), our problems went away.
Curt, nice find! I'll keep this tidbit in my head for future reference.