13 Replies Latest reply on Sep 4, 2015 9:13 AM by Jeff Strauss

    Interpreting Tabjolt results

    Tatiana Titova

      Hello,

       

      I have been running TabJolt on my test Tableau environment, where I am the only authorized user.  The machine is a 4-core VM with 24 GB of memory, with one of each type of Tableau process running.  All the configuration settings are out of the box, with no customizations.

       

      I have built a very simple workbook to start with. It has a single four-bar chart on one sheet, which is used in one dashboard.  The data source for the workbook is the Superstore sample data shipped with Tableau Desktop.  I run the following command: go --t=testplans\InteractVizLoadTest.jmx --d=1200 --c=1, which means one concurrent user for 20 minutes. Again, I am the only one with access to this environment, so there is no load other than TabJolt.

       

      I am getting pretty consistent results as follows.

      After about 100 seconds, all bootstrap requests start to fail (100% failure rate).  The error is "Timer did not end properly. SubResult EndTime is 0.".

      Tabjolt.png

       

      Memory and CPU are not the bottleneck

      CPU_Memory.png

       

       

      If I browse to the server from my laptop (a different machine from the one running TabJolt), I am not able to open this report; it shows a wait cursor for a long time and then an error. Other reports open OK.  Once the system is in this state, I can no longer open this report unless I restart the Tableau services (or at least kill vizqlserver.exe from the Task Manager).

       

      Even if I leave it for an hour, by which time all the sessions should presumably have expired, this report still cannot be opened when I come back.

      I used jstack to capture a thread dump of vizqlserver when it gets into this state: out of 237 threads, 188 are in the WAITING (parking) state. I don't know enough Java to tell what they are waiting for.
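One way to get a first read on a dump like this is to tally the thread states and, for parked threads, the class of the object each one is parked on. The following is a minimal sketch in Python, not a definitive diagnostic. It assumes the dump was saved as text in the usual HotSpot jstack layout, with a `java.lang.Thread.State:` line per thread and a `- parking to wait for <address> (a ClassName)` line for parked threads. The function names are hypothetical.

```python
import re
from collections import Counter

# Line that reports a thread's state, for example:
#    java.lang.Thread.State: WAITING (parking)
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State:\s+(.+?)\s*$")

# Line that reports what a parked thread is waiting on, for example:
#    - parking to wait for  <0x00000000e0a1b2c8> (a java.util.concurrent.SynchronousQueue$TransferStack)
PARK_RE = re.compile(r"^\s*- parking to wait for\s+<[^>]+>\s+\(a (.+)\)\s*$")


def count_thread_states(dump_text):
    """Return a Counter mapping each thread state string to its thread count."""
    states = Counter()
    for line in dump_text.splitlines():
        match = STATE_RE.match(line)
        if match:
            states[match.group(1)] += 1
    return states


def count_park_targets(dump_text):
    """Return a Counter mapping each parked-on class name to its thread count."""
    targets = Counter()
    for line in dump_text.splitlines():
        match = PARK_RE.match(line)
        if match:
            targets[match.group(1)] += 1
    return targets
```

Reading the saved dump into a string and printing `count_thread_states(text).most_common()` and `count_park_targets(text).most_common()` gives a quick summary. A large WAITING (parking) count is not by itself a sign of trouble, since idle thread-pool workers also park; the parked-on classes only suggest where to look next.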

       

       

       

      So the question is: what is the limiting factor here? How can I tell? One concurrent user for 100 seconds does not seem like a huge load to me.  I am OK on CPU and memory, yet this report can no longer be viewed from any client unless I bounce the vizqlserver.exe process.

       

      Thank you,

      Tatiana

        • 1. Re: Interpreting Tabjolt results
          Jeff Strauss

          thanks for the detailed explanation.  what version of 9 are you using?  there were posts a few weeks ago about 9.0.1 cpu spiking on vizql processes, not sure if it applies in this case.  I think they fixed it with 9.0.2

           

          while it's running, look on the box to see if 1 out of the 4 cores is staying at 100%

          • 2. Re: Interpreting Tabjolt results
            Tatiana Titova

            I am running 9.0.0.  None of the cores is staying above 50% CPU.   It is definitely not CPU or memory.

            • 3. Re: Interpreting Tabjolt results
              Jeff Strauss

              can you try 9.0.2?

              • 4. Re: Interpreting Tabjolt results
                Russell Christopher

                Are you sure your Vizql isn't simply being killed at ~100 seconds? Note how your RAM utilization is quite high for the 1st 100 seconds, then falls. That looks like a process cycling to me.

                • 5. Re: Interpreting Tabjolt results
                  Tatiana Titova

                  Hi Russell,  I am 100% sure that this is not a process cycling.  The vizqlserver.exe in my scenario is unresponsive, but running.  In fact, if I manually kill vizqlserver.exe in the Task Manager, a new one is spawned (as expected) and requests start passing.  So this very simple TabJolt test makes my vizqlserver unresponsive, and I would like to find out what the limiting factor is here.  Any idea?

                  • 6. Re: Interpreting Tabjolt results
                    Russell Christopher

                    Whoops! The default on this report for RAM-related counters is Committed Bytes, so I assumed that was what you were showing. If the measure were Committed Bytes, we'd be looking at a process getting killed and RAM being freed.  You're actually showing Available Bytes.

                     

                    So there's clearly no process crash here - I took the reverse hockey stick to be RAM being freed. Instead, it is RAM being used and then veeerrry slowly released. But which processes are using it?

                     

                    It's pretty much impossible to tell you what's going on here based on looking at a single viz out of the "suite" of info represented in the larger workbook you have. We need to see which processes are grabbing all the RAM on your box in < 100 seconds, and all sorts of other nuggets, to guesstimate what is going on here. Can you extract the first two data sources in the workbook and then package everything up as a twbx and post?

                    • 7. Re: Interpreting Tabjolt results
                      Tatiana Titova

                      Here you go. As I said, it is not CPU or memory bound.  Something else is limiting Tableau performance in this scenario, but I need help figuring out what exactly it is.

                      • 8. Re: Interpreting Tabjolt results
                        Tatiana Titova

                        Hi Russell,

                        Do you have any insight into the workbook I have posted?  

                        • 9. Re: Interpreting Tabjolt results

                          Tatiana Titova

                           

                          Tatiana,

                          I took a quick peek at the workbook and looked at the vizql process. After about ~100 sec, the vizql server CPU utilization goes flat to "0" with an occasional 2% spike. This means the vizql server process itself is not doing much work; it seems to be retrying something, and something inside or outside the system is not allowing it to proceed. You can see this in the "VizQL" tab.

                           

                          >>I used jstack to create a callstack of vzqlserver  when it gets in this state and out of 237 threads - 188 are in the

                          >>WAITING (parking)  state. I don't know enough java to tell what is it waiting for.

                           

                          This seems to jibe with your observation of things waiting: work is not being done, and in that state it's normal for the CPU to not show usage.

                           

                          Few things to consider:

                          1. The server minimum hardware recommendation (for just the server only) is 4 cores 8GB.

                          2. If you are running TabJolt on the same machine as server, you are imposing a significant resource contention between the load tool and the system under test. TabJolt will want to use the CPU and other resources to drive the load. This is an anti-pattern for load testing. You should move TabJolt off the server, if you have it on the same box - that is not clear to me.

                          3. When systems get severely resource constrained, you may be simulating stress conditions more than load conditions - you should consider more appropriate hardware for your scenario.

                          4. You are running on VM - if that is shared, there could be many reasons why your VM is behaving the way it is behaving including the virtualization platform configuration and other applications running on that virtualized host.

                          5. Under stress conditions, garbage collection in JVMs typically halts the process (as in, stops all application code running in it) by design, and this could cause unpredictable behavior.

                           

                          Lastly, there is of course always the chance that you have discovered some bug and we may or may not have fixed it in the latest maintenance release. Please consider picking up the latest maintenance release.

                           

                          Also, while your environment is OK for prototyping/testing, it's not good for production. For load testing, assuming you want to test for production workloads, you should at least be on the recommended production hardware with all the above caveats (like don't run tabjolt on same server hardware etc.)

                           

                          Hope some of this helps. If you continue to run into the same problem after addressing some of the above, please file a support case with logs etc. We would want to help track down any bugs if you are running into them.

                           

                          Thank you (to you and everyone) for trying TabJolt and supporting the community!

                          Cheers,
                          Neelesh

                          • 10. Re: Interpreting Tabjolt results
                            Cristian Vasile

                            Just two suggestions:

                            1. on your test VM, run Process Monitor and try to catch the faulty process, if any

                            2. upgrade to the latest 9.x.x version or try 9.1 series.

                             

                            Regards,

                            Cristian.

                            • 11. Re: Interpreting Tabjolt results
                              Tatiana Titova

                              Hi Neelesh

                               

                              Upgrading to 9.0.4 did not make a difference, and TabJolt is running on another machine.

                               

                              I have a question about your point 3 " When systems get severely resource constrained, you maybe simulating stress conditions more than load conditions - you should consider a more appropriate hardware for your scenario"

                               

                              When I run tabjolt with one concurrent user for a few minutes, it should be neither stress nor load, it is just one user using the system, right?   Maybe I misunderstood the intent of the tabjolt parameters. By increasing concurrent user count I am increasing the load.  Increasing the duration should not matter as long as I stay with, let's say, one user, which I assume 4 cores should support indefinitely.

                               

                              My workbook is a very simple one, with one chart and no external data source.  I don't see CPU or memory being constrained, so which resources do you think might be constrained in my case?

                              • 12. Re: Interpreting Tabjolt results
                                Curt Budd

                                I think we've solved it:  The workbook in question was a nice, simple test intended just to exercise Tabjolt.  The datasource was an Excel worksheet.  The datasource was not extracted.  Once we extracted and re-published, the Tabjolt tests ran flawlessly.  My theory here is that the operating system was having a difficult time opening multiple connections to the Excel datasource.  Once we changed the datasource to something designed to handle multiple simultaneous client requests (a TDE in this case), our problems went away.

                                • 13. Re: Interpreting Tabjolt results
                                  Jeff Strauss

                                  Curt, nice find!  I'll keep this tidbit in my head for future reference.

                                   

                                   

                                  Cheers, Jeff