We are looking for advice on scaling an already large Tableau Server implementation that connects to multiple (20+) Redshift clusters. Our 7-node Tableau Server currently hosts multiple sites, each of which connects to 2-3 different Redshift clusters on average. The problem we are facing is that poorly designed queries or slow Redshift clusters frequently cause long extract refresh delays on Tableau Server. In other words, a single bad actor (one slow Redshift cluster) can clog the entire Tableau extract schedule queue.
We are looking for ways to isolate problematic Redshift clusters (or any other databases) so that they do not hoard background processes and delay extracts and subscriptions that would otherwise run normally. Serial schedules could serve as a band-aid, but they are not a dynamic, long-term solution. Ideally we would have a WLM (Workload Management) type of solution that lets us limit the number of background processes a specific Tableau site or schedule can consume at one time.
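To make the request concrete, here is a minimal Python sketch of the dispatch behavior we have in mind. It is purely illustrative and is not an existing Tableau Server setting or API; the function `pick_runnable` and the parameters `total_slots` and `per_site_cap` are hypothetical names made up for this example.

```python
from collections import Counter


def pick_runnable(queue, running, total_slots, per_site_cap):
    """Return the queued jobs that may start now, in queue order.

    queue:        list of (job_id, site) tuples waiting to run
    running:      list of site names, one entry per job already running
    total_slots:  total number of background processes available (assumed)
    per_site_cap: hypothetical limit on concurrent jobs for any one site
    """
    in_use = Counter(running)
    free = total_slots - len(running)
    started = []
    for job_id, site in queue:
        if free <= 0:
            break  # every background process is busy
        if in_use[site] >= per_site_cap:
            continue  # this site is at its cap; the job stays queued
        started.append((job_id, site))
        in_use[site] += 1
        free -= 1
    return started


if __name__ == "__main__":
    # Three jobs from a slow site are queued ahead of two other sites.
    waiting = [
        ("a1", "slow_site"),
        ("a2", "slow_site"),
        ("a3", "slow_site"),
        ("b1", "site_b"),
        ("c1", "site_c"),
    ]
    print(pick_runnable(waiting, running=[], total_slots=4, per_site_cap=2))
```

With a cap of 2 per site, the third job from the slow site waits while the jobs from the other two sites start immediately, instead of queuing behind it.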
Has anyone faced this issue? If so, how have you isolated or minimized the impact that one Redshift cluster has on schedule delays, other than by running serial schedules?