4 Replies Latest reply on Apr 5, 2018 10:46 AM by Jack Freeman

    Tableau server only one backgrounder is busy in parallel execution mode

    Jayant Sahewal

      Hello,

       

      I have a Tableau Server (16 cores, 64 GB RAM, Windows Server 2012) with 4 active backgrounder processes. I scheduled an extract in parallel execution mode. However, the extract seems to be using only one of the backgrounder processes. The server is using only 7% CPU and around 20% memory.

       

      If I run multiple extracts at the same time, I can see more backgrounder processes become busy. However, I would like a single extract to use multiple/all the available backgrounder processes.

       

      My extract takes 2 hours to complete. It extracts around 35M rows from a Redshift cluster. I can see from the Redshift cluster that the query completes in 12 minutes. My goal is to get the extract done in less than half an hour, or as fast as I can get it.

       

      Can someone suggest something, or let me know if I should be doing something differently?

       

      I am new to Tableau Server and the community, so please let me know if I need to provide more information.

       

      Best,

      Jayant

        • 1. Re: Tableau server only one backgrounder is busy in parallel execution mode
          Dmitry Chirkov

          What is the "parallel execution mode" you are referring to?

          As far as I know, Tableau doesn't offer any way to run a single backgrounder job on multiple backgrounders.

          • 2. Re: Tableau server only one backgrounder is busy in parallel execution mode
            Jeffrey Lutomski

            I have Tableau Server 10.1.1, and I believe I know what he is referring to. So far I see these 2 options:

             

            Serial: Limit this schedule to one background process

            Parallel: Use all available background processes for this schedule

             

            We use Serial and Parallel for different extract refreshes and for different reasons.

             

            We actually have a 4-hour extract refresh from a local Oracle server, and we run it as Serial.

            We were told by Tableau to run it as Serial almost 8 months ago, to ensure that it completes and does not fail.

            I don't remember the whole reasoning behind it, but it would not hurt to switch yours to Serial and see if it completes faster.

             

            You said you have Tableau Server (16 cores, 64 GB, Windows Server 2012), so it sounds like a single-node install where that machine is the primary and the only server. Is that correct?

             

            Our current setup has a bit less RAM and fewer cores than yours, but one thing we did do was upgrade the hard drives to SSDs. Users say they don't notice a difference; all I know is that when I run an upgrade on Prod versus Dev, at times it can be twice as fast.

             

            When it comes to the speed of an extract refresh, I am not sure that "Parallel: Use all available background processes for this schedule" will always make it faster, or that you can make one database pull consume all 4 backgrounder processes.

            The other thing to consider: since it sounds like a single server, if Tableau took all 4 processes for that one refresh, it would block users' ad hoc refreshes during that time. Even a small refresh that takes only a few minutes to run could end up waiting in the queue.

             

            We have also started pulling a second version of that large 4-hour extract, which we call Lite A. We use it to get a more targeted set of data in about half the time.

            For this large extract, Tableau also had us experiment with these settings:

             

            tabadmin set backgrounder.restart_interval_in_minutes 1440

            tabadmin set backgrounder.querylimit   2200

            tabadmin set extract.unknown.holdoff 22300

            Run this nightly to clear out temp files:

            tabadmin cleanup --restart

            Or

            tabadmin set temp.file_expiry_s 43200

            Your settings might vary depending on your environment; I am just using these as examples.

             

            Jeff

             

            • 3. Re: Tableau server only one backgrounder is busy in parallel execution mode
              Jamieson Christian

              Jayant,

               

              To clarify, "Parallel Execution" does not mean that a single refresh task will use multiple backgrounders — it simply means that the tasks on that schedule may be assigned to any of your backgrounders. Each task, however, will use one and only one backgrounder once it starts running.

               

              "Serial Execution" means that all of the tasks on a given schedule will use the same backgrounder — which implies that they will run one after another "in serial".
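              To make the distinction concrete, here is a toy Python model of the assignment rules described above. It is a sketch under stated assumptions, not how Tableau's scheduler is actually implemented: the function `simulate`, the `Run` record, and the task names and durations are all hypothetical. In "parallel" mode each task goes to whichever backgrounder frees up first; in "serial" mode every task on the schedule is pinned to one backgrounder. In both modes a task occupies exactly one backgrounder from start to finish, so a single two-hour task still takes two hours no matter how many backgrounders sit idle.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Run:
    task: str
    backgrounder: int
    start: float
    end: float


def simulate(tasks, n_backgrounders, mode):
    """Toy model (hypothetical) of one schedule's tasks, all queued at time 0.

    tasks: list of (name, duration_in_minutes), in queue order.
    Every task occupies exactly one backgrounder from start to end.
    """
    runs = []
    if mode == "serial":
        # All tasks on the schedule share one backgrounder (id 0 stands in
        # for whichever one the scheduler picks), so they run back to back.
        clock = 0.0
        for name, duration in tasks:
            runs.append(Run(name, 0, clock, clock + duration))
            clock += duration
        return runs
    if mode != "parallel":
        raise ValueError(f"unknown mode: {mode!r}")
    # Min-heap of (time the backgrounder becomes free, backgrounder id).
    free_at = [(0.0, b) for b in range(n_backgrounders)]
    heapq.heapify(free_at)
    for name, duration in tasks:
        start, b = heapq.heappop(free_at)  # earliest-free backgrounder
        runs.append(Run(name, b, start, start + duration))
        heapq.heappush(free_at, (start + duration, b))
    return runs


def makespan(runs):
    """Minutes until the last task on the schedule finishes."""
    return max(r.end for r in runs)


one_big = [("big_extract", 120.0)]
four_small = [(f"small_{i}", 30.0) for i in range(4)]

print(makespan(simulate(one_big, 4, "parallel")))     # 120.0
print(makespan(simulate(four_small, 4, "parallel")))  # 30.0
print(makespan(simulate(four_small, 4, "serial")))    # 120.0
```

              In this model, parallel mode only helps when a schedule holds several tasks: four 30-minute tasks finish at minute 30 instead of minute 120, while the single 120-minute extract finishes at minute 120 either way and only ever touches one backgrounder.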

               

              Which execution method to configure for a schedule depends on a few factors. Here are a couple of key things to keep in mind:

               

              • "Parallel Execution" generally gives the scheduler more flexibility in assigning tasks and helps level-load the backgrounders. With "Serial Execution", you may find that one backgrounder has a backlog of tasks while other backgrounders are sitting idle. (I'm looking at that situation on our server as I write this.)
              • "Serial Execution" is useful to keep certain tasks from interfering with the timely completion of other tasks. If all schedules are on "Parallel Execution", there is a risk of starvation: a given task waiting an inordinately long time to gain access to a backgrounder. You can silo less critical tasks (especially those that take a long time to run) in a "Serial Execution" schedule to keep them on a single backgrounder, and run more critical tasks in a "Parallel Execution" schedule to ensure they have access to the other backgrounders. (NOTE: Setting appropriate task priorities can also substantially reduce the likelihood that a critical task will be starved, and offers even finer control than the silo method of "Serial Execution".)
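              The starvation risk and the priority mitigation can be illustrated with another toy Python model. Everything in it is an assumption for illustration: two backgrounders, four long low-priority refreshes queued ahead of one short critical refresh, every task ready at time 0, no preemption, and a lower number meaning higher priority. The function `dispatch` and the task names are hypothetical and do not reflect Tableau's actual scheduler code.

```python
import heapq


def dispatch(tasks, n_backgrounders, use_priority):
    """Return each task's start time (minutes) under a hypothetical dispatcher.

    tasks: list of (name, duration_in_minutes, priority) in queue order;
    a lower priority number means "more important" in this sketch.
    All tasks are ready at time 0 and run to completion once started.
    """
    order = list(range(len(tasks)))
    if use_priority:
        # Most important first; ties keep their original queue order.
        # Because every task is ready at time 0, dispatching in this sorted
        # order is the same as picking the best pending task each time a
        # backgrounder frees up.
        order.sort(key=lambda i: (tasks[i][2], i))
    # Min-heap of (time the backgrounder becomes free, backgrounder id).
    free_at = [(0.0, b) for b in range(n_backgrounders)]
    heapq.heapify(free_at)
    starts = {}
    for i in order:
        name, duration, _priority = tasks[i]
        start, b = heapq.heappop(free_at)  # next backgrounder to free up
        starts[name] = start
        heapq.heappush(free_at, (start + duration, b))
    return starts


queue = [(f"big_{i}", 60.0, 90) for i in range(4)] + [("critical", 5.0, 10)]

print(dispatch(queue, 2, use_priority=False)["critical"])  # 120.0
print(dispatch(queue, 2, use_priority=True)["critical"])   # 0.0
```

              With plain first-come ordering, the 5-minute critical task waits two full hours behind the long refreshes; with priorities it starts immediately, and the long refreshes are delayed by only 5 minutes.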

               

              Whether you use serial/parallel execution or task priorities to govern your schedules, best results will be achieved if you take a holistic look at what tasks run, how long they take to run, and when each one needs to complete.

               

              Nevertheless, once a task starts to run, it will only use one backgrounder to execute. Tableau does not support parallelism in the execution of a single task.

               

              Having seen how long general-purpose "give me everything" extracts take to run (we have one such extract that takes at least 5 hours to execute each day), I am a big proponent of reserving those extracts for low-frequency reporting needs — think "end of month" type stuff. For daily reporting, I almost never use the "give me everything" extracts, and instead build extracts for very focused queries that are more specifically aligned to the needs of the specific report. Usually that means tens of millions of rows get distilled down to several thousand rows. The query takes under 5 minutes, the extract is created in seconds, and dashboards are that much faster because they're sifting through a much smaller data set every time a user interaction changes something.
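              The "distill it down" idea can be sketched with Python's built-in `sqlite3` module standing in for the source database (Redshift, in Jayant's case). Everything here is hypothetical: the `orders` table, its columns, and the row counts are synthetic, chosen only to show how a query aggregated to the report's grain shrinks what the extract has to store while keeping the totals the report needs.

```python
import sqlite3


def build_demo_table(conn, days=10, regions=("east", "west", "north", "south"),
                     rows_per_group=250):
    """Fill a hypothetical 'orders' table with synthetic detail rows."""
    conn.execute("CREATE TABLE orders (day INTEGER, region TEXT, amount REAL)")
    rows = [
        (d, r, float(i % 7))
        for d in range(days)
        for r in regions
        for i in range(rows_per_group)
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
build_demo_table(conn)

# "Give me everything": every detail row lands in the extract.
everything = conn.execute("SELECT day, region, amount FROM orders").fetchall()

# Focused extract: pre-aggregate to the grain the report actually needs.
focused = conn.execute(
    "SELECT day, region, SUM(amount) AS total FROM orders GROUP BY day, region"
).fetchall()

print(len(everything), len(focused))  # 10000 40
```

              The same GROUP BY approach would apply to whatever query feeds the extract; which dimensions to keep depends entirely on what the specific report needs.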

               

              I hope these thoughts help.

              • 4. Re: Tableau server only one backgrounder is busy in parallel execution mode
                Jack Freeman

                According to the Tableau help article "Optimize for Extracts", running an extract refresh from a schedule with parallel execution mode will parallelize work across multiple backgrounders if they are available. The article goes so far as to recommend using serial execution mode for a schedule that runs a very large job, to keep one job from using up all backgrounder processes.

                Configure the execution mode for extract refreshes

                When you create extract refresh schedules, ensure that they run in parallel execution mode. When you run a schedule in parallel, it runs on all available backgrounder processes, even if the schedule contains only one refresh task. When you run a schedule serially, it only runs on one backgrounder process. By default, the execution mode is set to parallel so that refresh tasks finish as quickly as possible.

                However, in some circumstances, it can make sense to set the execution mode to serial. For example, you might set the execution mode to serial if a very large job is preventing other schedules from running because it uses all available backgrounder processes.