What is that "parallel execution mode" you are referring to?
As far as I know, Tableau doesn't offer any way to run a single backgrounder job on multiple backgrounders.
I have Tableau Server 10.1.1, and so far I see these two options, so I believe I know what he is referring to:
Serial: Limit this schedule to one background process
Parallel: Use all available background processes for this schedule
We use Serial and Parallel for different extract refreshes and for different reasons.
We actually have a 4 hour extract refresh from a local Oracle server and we run it as serial.
We were told by Tableau to run it as Serial almost 8 months ago. This was to ensure that it completes and does not fail.
I don't remember the whole reasoning behind it. But it would not hurt to switch it to Serial and see if it completes faster.
You said you have Tableau Server (16 cores, 64 GB, Windows Server 2012), so it sounds like a single-node install where that server is the primary and the sole server, correct?
Our current setup has a bit less RAM and fewer cores than yours, but one thing we did was upgrade the hard drives to SSDs. Some users say they don't notice a difference; all I know is that when I do an upgrade on Prod vs. Dev, at times it can be twice as fast.
When it comes to the speed of an extract refresh, I am not sure that using "Parallel: Use all available background processes for this schedule" will always make it faster, or that you can make it consume all 4 backgrounder processes for one DB pull.
The other thing to consider: since it sounds like a single server, if Tableau took all 4 processes for that one refresh, it would prevent any ad hoc refreshing by users during that time. Possibly even a small refresh that would take only a few minutes to run would be queued behind it.
We have also started pulling another version of that large 4-hour extract, which we call Lite A. We use it to get a more targeted set of data in about half the time.
Also for this large extract Tableau had us experiment with these settings too:
tabadmin set backgrounder.restart_interval_in_minutes 1440
tabadmin set backgrounder.querylimit 2200
tabadmin set extract.unknown.holdoff 22300
Run this process nightly to clear out temp files
tabadmin cleanup --restart
tabadmin set temp.file_expiry_s 43200
But your settings might vary according to your environment; I am just giving some examples here.
To clarify, "Parallel Execution" does not mean that a single refresh task will use multiple backgrounders — it simply means that the tasks on that schedule may be assigned to any of your backgrounders. Each task, however, will use one and only one backgrounder once it starts running.
"Serial Execution" means that all of the tasks on a given schedule will use the same backgrounder — which implies that they will run one after another "in serial".
Which execution method to configure for a schedule depends on a few factors. Here are a couple key things to keep in mind:
- "Parallel Execution" generally gives the scheduler more flexibility in assigning tasks and helps level-load the backgrounders. With "Serial Execution", you may find that one backgrounder has a backlog of tasks while other backgrounders are sitting idle. (I'm looking at that situation on our server as I write this.)
- "Serial Execution" is useful to keep certain tasks from interfering with the timely completion of other tasks. If all schedules are on "Parallel Execution", there is a risk of starvation — a given task waiting an inordinately long time to gain access to a backgrounder. You can silo less critical tasks (especially those that take a long time to run) in a "Serial Execution" schedule to keep them on a single backgrounder, and run more critical tasks in a "Parallel Execution" schedule to ensure they have access to the other backgrounders. (NOTE: Setting appropriate task priorities can also substantially mitigate the likelihood that a critical task will be subject to starvation, and offers even more fine-tune control than the silo method of "Serial Execution".)
Whether you use serial/parallel execution or task priorities to govern your schedules, best results will be achieved if you take a holistic look at what tasks run, how long they take to run, and when each one needs to complete.
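To put rough numbers on the silo idea from the list above, here is a minimal Python sketch. It is a toy model and not how Tableau's scheduler is implemented: it assumes equal priorities, a first-come-first-served queue, and a critical task that is queued behind the long tasks. The function name, the task durations, and the backgrounder count are all made up for illustration.

```python
def critical_start(long_tasks, backgrounders, silo_long_tasks):
    """Return the minute at which a critical task can start.

    Toy model (assumption): the critical task is queued behind the long
    tasks, all tasks have equal priority, and each task occupies exactly
    one backgrounder for its whole duration.
    """
    free_at = [0] * backgrounders  # minute at which each backgrounder is free
    if silo_long_tasks:
        # Serial schedule: long tasks run one after another on one backgrounder.
        free_at[0] = sum(long_tasks)
    else:
        # Parallel schedule: each long task takes the earliest free backgrounder.
        for duration in long_tasks:
            i = free_at.index(min(free_at))
            free_at[i] += duration
    # The critical task starts as soon as any backgrounder is free.
    return min(free_at)


long_tasks = [120, 120, 120]  # minutes; hypothetical long-running refreshes
wait_all_parallel = critical_start(long_tasks, 2, silo_long_tasks=False)
wait_with_silo = critical_start(long_tasks, 2, silo_long_tasks=True)
print(wait_all_parallel, wait_with_silo)
```

In this toy setup the critical task waits two hours when everything is parallel and starts immediately when the long tasks are siloed. The price is that the long tasks themselves finish later, which is the trade-off described above.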
Nevertheless, once a task starts to run, it will only use one backgrounder to execute. Tableau does not support parallelism in the execution of a single task.
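The same point can be shown with a small simulation. This is only a sketch of the behavior as I have described it in this reply (one task, one backgrounder), not Tableau's actual scheduling code; `simulate` and the durations are hypothetical.

```python
import heapq


def simulate(durations, backgrounders, mode):
    """Return (finish_times, makespan) in minutes for tasks on one schedule.

    Each task occupies exactly one backgrounder for its whole duration.
    mode="serial":   all tasks share a single backgrounder.
    mode="parallel": each task goes to whichever backgrounder frees up first.
    """
    workers = 1 if mode == "serial" else backgrounders
    free_at = [0] * workers  # min-heap of times at which each worker is free
    heapq.heapify(free_at)
    finish_times = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available backgrounder
        end = start + duration
        finish_times.append(end)
        heapq.heappush(free_at, end)
    return finish_times, max(finish_times)


durations = [240, 10, 10, 10]  # one 4-hour refresh plus three small ones
serial_finish, serial_total = simulate(durations, 4, "serial")
parallel_finish, parallel_total = simulate(durations, 4, "parallel")

# A schedule with a single task takes just as long in either mode.
_, single_serial = simulate([240], 4, "serial")
_, single_parallel = simulate([240], 4, "parallel")
print(serial_total, parallel_total, single_serial, single_parallel)
```

With several tasks, parallel mode shortens the schedule as a whole because the small refreshes no longer wait behind the big one. With a single task, the mode makes no difference in this model.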
Having seen how long general-purpose "give me everything" extracts take to run (we have one such extract that takes at least 5 hours to execute each day), I am a big proponent of reserving those extracts for low-frequency reporting needs — think "end of month" type stuff. For daily reporting, I almost never use the "give me everything" extracts, and instead build extracts for very focused queries that are more specifically aligned to the needs of the specific report. Usually that means tens of millions of rows get distilled down to several thousand rows. The query takes under 5 minutes, the extract is created in seconds, and dashboards are that much faster because they're sifting through a much smaller data set every time a user interaction changes something.
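As a rough illustration of what "distilling" looks like, here is a small Python sketch using an in-memory SQLite table. Everything in it is hypothetical (the `sales` table, its columns, and the row counts), and pre-aggregating with GROUP BY is just one way a focused query might shrink the data; your own focused queries may filter or join instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_day TEXT, region TEXT, amount REAL)")

# Hypothetical detail data: 10 days x 2 regions x 500 orders each.
rows = [
    (f"2017-01-{day:02d}", region, 10.0)
    for day in range(1, 11)
    for region in ("East", "West")
    for _ in range(500)
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# "Give me everything": every detail row would land in the extract.
detail_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

# Focused query: only the daily totals per region that the report needs.
focused = conn.execute(
    "SELECT order_day, region, SUM(amount) AS total "
    "FROM sales GROUP BY order_day, region"
).fetchall()

print(detail_count, len(focused))
conn.close()
```

The focused result has one row per day and region instead of one row per order, so both the extract and the dashboards built on it have far less data to work through.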
I hope these thoughts help.
According to the "Optimize for Extracts" article, running an extract refresh from a schedule with parallel execution mode will parallelize work across multiple backgrounders if they are available. The article goes so far as to recommend serial execution mode when a schedule runs a very large job, to keep that one job from using up all backgrounder processes. The relevant section reads:
Configure the execution mode for extract refreshes
When you create extract refresh schedules, ensure that they run in parallel execution mode. When you run a schedule in parallel, it runs on all available backgrounder processes, even if the schedule contains only one refresh task. When you run a schedule serially, it only runs on one backgrounder process. By default, the execution mode is set to parallel so that refresh tasks finish as quickly as possible.
However, in some circumstances, it can make sense to set the execution mode to serial. For example, you might set the execution mode to serial if a very large job is preventing other schedules from running because it uses all available backgrounder processes.