Adding CPUs vs. adding an additional worker for Backgrounder services?

    Team,



    We have a five-machine architecture: 1 primary machine and 4 workers. One worker is dedicated to Backgrounder services (no other Tableau service runs on that machine).

    We are seeing CPU utilization spikes on worker 4, where the Backgrounder services run, at the time our daily extracts run. What would be ideal in this situation: adding another worker and allocating Backgrounder services to it, or adding CPUs (hardware) to the existing worker?


    My assumption (please correct me if I am wrong) is that a Backgrounder service is a single-threaded process, which means one Backgrounder service can process one extract refresh or subscription request at a time. So would it be better to add another worker and assign Backgrounder services to it, so that more extract refreshes can run simultaneously and the load on the single Backgrounder machine is reduced? Or would adding CPUs to the same Backgrounder worker speed up the extract refreshes?


    Below is a screenshot of the current service allocation.



    Thanks for any help.



    Hey Sunil,


    How many cores are on the backgrounder machine? The number of backgrounder processes should not exceed that (think one backgrounder task per core). Tableau Server runs one extract refresh per backgrounder process at any given time; the rest are queued.
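
    The rule of thumb above can be sketched in a few lines of Python. This is only an illustration, not Tableau code: the function names, the task names, and the numbers (10 pending extracts, 6 requested processes, 8 cores) are all made up for the example.

```python
def backgrounders_within_cores(requested, cores):
    """Cap the requested backgrounder count at the machine's core count."""
    return min(requested, cores)


def split_running_and_queued(pending_tasks, backgrounder_processes):
    """One task runs per backgrounder process; everything else waits."""
    running = pending_tasks[:backgrounder_processes]
    queued = pending_tasks[backgrounder_processes:]
    return running, queued


# Hypothetical workload: 10 extract refreshes arrive at the same time.
tasks = [f"extract_{i}" for i in range(1, 11)]
processes = backgrounders_within_cores(requested=6, cores=8)
running, queued = split_running_and_queued(tasks, processes)
print(len(running), "running;", len(queued), "queued")
```

    With these made-up numbers, 6 refreshes run at once and 4 wait in the queue until a process frees up.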


    If you want to have more tasks running at one time, you can add processes, but add cores at the same time.


    As for speeding up the extract refreshes, more cores will process more items from the queue at a time, but most of the speed is dependent on the data source itself. If the database is slow, the refresh will be slow.


    Hope this helps!

    - Derrick


    Hi Derrick,


    We have 8 cores dedicated to each worker.


    Great! The backgrounder process will definitely eat the core it runs on from time to time, but it won't branch out to other cores, so each task is limited to the speed and processing power of that one core.


    If you want more background tasks running at once, you could increase the cores and the number of backgrounder services, but that won't speed up a single refresh. For that you'd need to dig into the data source itself. The network connection to the source can also slow things down (e.g., if you are pulling millions of records from somewhere outside the LAN or over a VPN, it will be slow).
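
    To make the distinction concrete, here is a small simulation sketch in Python. It is not how Tableau schedules work internally; it simply assumes a first-in, first-out queue, one task per backgrounder, and made-up refresh durations in minutes.

```python
import heapq


def simulate_queue(durations_min, backgrounders):
    """Return the wall-clock minutes needed to drain a FIFO queue.

    Each backgrounder runs one task at a time; a task's own duration
    never changes, no matter how many backgrounders exist.
    """
    # Each heap entry is the time at which a backgrounder becomes free.
    free_at = [0.0] * backgrounders
    heapq.heapify(free_at)
    finish = 0.0
    for duration in durations_min:
        start = heapq.heappop(free_at)  # earliest available backgrounder
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


# Hypothetical nightly workload (minutes per extract refresh).
durations = [30, 10, 10, 20, 5, 15, 30, 10]
for n in (2, 4, 8):
    print(n, "backgrounders ->", simulate_queue(durations, n), "minutes")
```

    In this toy workload the queue drains in 70, 45, and 30 minutes with 2, 4, and 8 backgrounders. It can never finish faster than the longest single refresh (30 minutes), which only improves by tuning the data source or the network path to it.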


    Hello Sunil,


    In addition to Derrick's comment: backgrounder can consume CPU, I/O, or network resources depending on the nature of the workload, and it can tax other resources as well. Based on a few tests, I would recommend adding another machine in this case and allocating more backgrounder processes to it (recommendation: 4 backgrounder processes on each worker), rather than adding more cores to a single machine.

    Additionally, I was wondering: you mentioned that the worker is dedicated to backgrounders only, so was the screenshot of your processes taken before the two cache servers were removed?





    Is there a formula for a limit on cache processes? I am starting to plan out a cluster, and our users are flipping out about speed, or the lack of it. Also, doesn't putting many of the backgrounders on one host create a single point of failure?


    Sunil, are your 4 workers and 1 primary virtual machines? What OS? Are they in different data centers or cloud instances, to reduce resource "tie up" issues and create better redundancy? What are the RAM and CPU on each host? How much content do you have, and how many refreshes are you doing on average?


    Jeff -


    You generally want 1 or 2 cache servers per VizQL process, and you should co-locate them on the same machine as each VizQL process.


    Sunil's inclusion of 2 Cache Server processes on his backgrounder node is consistent with the screenshot in Performance Tuning Examples.


    Is this doc wrong?


    The max of 4 backgrounders for an 8-core node is very conservative ("To calculate the maximum number, divide the computer's total cores by 2") and I like your choice of 6.
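
    As a quick check of the arithmetic, the sketch below compares the quoted documentation guideline (total cores divided by 2) with the looser one-per-core rule of thumb from earlier in the thread. The function names are hypothetical, and the values 8 and 6 are the ones mentioned in this discussion.

```python
def documented_max_backgrounders(total_cores):
    """Guideline quoted above: divide the computer's total cores by 2."""
    return total_cores // 2


def per_core_ceiling(total_cores):
    """Looser rule of thumb from this thread: one backgrounder per core."""
    return total_cores


cores = 8
chosen = 6
print("documented max:", documented_max_backgrounders(cores))
print("per-core ceiling:", per_core_ceiling(cores))
# True when the chosen count exceeds the documented max but still
# leaves at least as many cores as backgrounders.
print(documented_max_backgrounders(cores) < chosen <= per_core_ceiling(cores))
```

    For an 8-core node this gives a documented maximum of 4 and a ceiling of 8, so 6 sits between the two.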


    If you use backgrounder.sort_jobs_by_run_time_history_observable_hours, consider that it might cause your CPU to spike more when you have a backlog of extracts to work through (assuming shorter extracts are more CPU-bound than long-running ones, which might be waiting for some DW to return results for a complex query).
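
    The effect can be pictured with a shortest-first ordering sketch. This is an illustration only, not Tableau's scheduler: the job names and run times are invented, and placing jobs with no history last is an assumption made for the example.

```python
from statistics import mean


def order_by_run_history(jobs, history_minutes):
    """Order jobs shortest-first by the mean of their past run times.

    Jobs with no recorded history sort last (an assumption made here).
    """
    def sort_key(job):
        runs = history_minutes.get(job)
        return mean(runs) if runs else float("inf")

    return sorted(jobs, key=sort_key)


# Hypothetical run history (minutes) within the observable window.
history = {
    "sales_extract": [42, 38],
    "hr_extract": [3, 5],
    "ops_extract": [12, 10, 14],
}
queue = ["sales_extract", "new_extract", "hr_extract", "ops_extract"]
print(order_by_run_history(queue, history))
```

    With a backlog, an ordering like this puts the short jobs on the backgrounders at the same time, which is where the extra CPU pressure would come from if those jobs are the CPU-bound ones.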


    This document was generated from the following discussion: Adding CPUs vs. adding an additional worker for Backgrounder services?