So far we've had no need for extract size limits. Our largest is about 1 GB, but the rest are smaller in comparison. I don't want to impose limits unless it seriously impacts our Server. Even then the issue is generally fixable by reminding users not to include columns they don't need or use, encouraging use of native connections, and steering them away from Custom SQL.
Not sure I understand the need for "prioritizing extract refreshes during off-peak hours". Extracts depend on how 'fresh' the data needs to be for a report and the variety of groups. Where I work, Tableau is available corporate-wide -- which means world-wide -- thus peak hours for us (Pacific Northwest) are night-time for other geographies, so how would one define 'off-peak hours'? By geography? By highest average Server use time? As you can see, this can vary quite a bit depending upon the company.
Is there a specific reason why you're worrying about these items? Given the limited information provided, they really shouldn't be a concern; if anything, simply monitor usage and make changes when the time comes.
It can be beneficial to run extracts during off-peak hours if you have a lot of them to process, because it can impact users if the Background tasks are running on the same hosts as all the other processes that are directly responsible for a good user experience (they are CPU-intensive). If you have daily extracts you want to run at night instead of during the day, just create a new Schedule of type Extract Refresh, for, say, 1 AM or what have you, then move the extract refresh Tasks to the new schedule. If you aren't the owner of those workbooks or data sources, you should let the owners know that you wish to change the time they run before actually making the change.
There's no way that I'm aware of to limit extract size. My tactic on preventing inordinate levels of resource consumption for extracts is to do a weekly check to see who published anything larger than 100MB or so, then look at their workbook and have them add filters and hide unused fields. That ends up saving quite a bit of space. But prevention is better than the cure, so try your best to get the word out on what you consider to be best practices when publishing to Server.
Thank you Matthew and Toby for your insight. I did some research and touched base with Tableau Software about my question, and this is what I found out. As you both have stated, extracts should be scheduled during off-peak hours to help performance during peak hours. However, I also found out that extracts should be prioritized based on frequency. For example, if an extract runs hourly then it should be priority 1, daily priority 2, weekly priority 3, and so on. If extracts are scheduled with the same frequency, then it is best to find out the user community's consumption of the Tableau view to determine priority. For example, if the Sales extract that populates the Sales Dashboard is accessed by 100 users, but the Inventory extract that populates the Inventory Dashboard is accessed by 50 users, then the Sales extract will get the higher priority. Please note, this practice is not etched in stone, but it is a foundation for this type of scenario; more details may be required based on the business need. If an extract takes a long time to run, it is best to give the faster-running extracts a higher priority so they can complete and free up more resources for the longer-running extracts.
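The prioritization rule described above (frequency tier first, audience size as the tie-breaker) can be sketched in a few lines. This is just an illustration of the arithmetic; the job names, viewer counts, and priority numbers are all hypothetical, and Tableau Server itself does not expose this as an API.

```python
# Sketch: order extract refreshes by frequency tier, breaking ties
# among same-frequency extracts by how many users view the dashboard.
# Lower priority number = runs first (Tableau's convention).
FREQUENCY_PRIORITY = {"hourly": 1, "daily": 2, "weekly": 3, "monthly": 4}

# Hypothetical extracts and viewer counts.
extracts = [
    {"name": "Sales",     "frequency": "daily",  "viewers": 100},
    {"name": "Inventory", "frequency": "daily",  "viewers": 50},
    {"name": "Tickets",   "frequency": "hourly", "viewers": 20},
]

# Sort: frequency tier first, then more viewers wins the tie.
ordered = sorted(
    extracts,
    key=lambda e: (FREQUENCY_PRIORITY[e["frequency"]], -e["viewers"]),
)

print([e["name"] for e in ordered])  # ['Tickets', 'Sales', 'Inventory']
```

The hourly Tickets extract comes first regardless of audience, and among the two daily extracts, Sales beats Inventory on viewer count, matching the example above.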
Hi Nikole Phillips. In our setup, I set up schedules so that all the extract refreshes finish before business hours; I have schedules from 4:30 and 5:30 in the morning through 8:00. The basic idea is to avoid backgrounder (BG) service consumption during business hours.
I want all the extract refreshes done before 9:00 AM (our business start hour).
Regarding the size limit, we set a standard limit of <3 GB for both data sources and workbooks.
You are right about extract refreshes. In a nutshell, schedule refreshes during off-peak hours and prioritize according to refresh frequency. Here's the long detailed explanation of general best practice:
Extract refreshes and subscription schedules will be executed in the following order:
- All tasks currently in process will complete first.
- Tasks with the highest priority (lowest number) will be taken next, regardless of how long they have been waiting. For example, a task with a priority of 49 will be executed before a task with a priority of 50, even if the task with a priority of 50 has been waiting longer.
- If all tasks have the same priority, tasks will be executed in the order they were queued; the task scheduled with the earliest time stamp will be executed first.
- When multiple tasks with the same priority are scheduled to run at the same time, they will be executed in the following order:
- All extract refreshes in the order that they were created or enabled.
- All email subscriptions in the order they were created or enabled.
- Tableau Server can only run as many tasks concurrently as there are backgrounder processes configured in that Tableau Server environment.
- Separate extract refreshes for the same data cannot run simultaneously.
Note: This list only covers extract refreshes and subscription schedules, and does not consider other tasks, such as reap extracts.
Imagine every refresh has the default priority of 50 (see the third rule above). Here's what the queue may look like:
- Hourly refresh job1, priority 50
- Hourly refresh job2, priority 50
- Daily refresh job3, priority 50
- Daily refresh job4, priority 50
- Weekly refresh job1, priority 50
- Daily refresh job5, priority 50
- Daily refresh job6, priority 50
- Monthly refresh job1, priority 50
- Daily refresh job7, priority 50
- Daily refresh job8, priority 50
- Hourly refresh job3, priority 50
So it may be more than an hour before backgrounder can get from hourly refresh job2 to job3.
So the best practice is to assign the most frequently refreshed extracts the highest priority (lowest number), e.g. 15-minute refresh = priority 1, hourly = priority 5, daily = priority 10, etc.
Bookmark this one, Tableau Server Admins; this is really useful! Time to tidy our extract refreshes!
One other method of scheduling extract refreshes that is useful to know is tabcmd runschedule.
With this command-line call, you can trigger the execution of a Tableau Server extract refresh schedule at whatever moment is optimal, say, at the end of an ETL cycle for your data. For example, we pull down incremental changes to our Salesforce data into a SQL Server database as often as possible (this turns out to be every 25 minutes or so), perform some custom transformations on it, and then, within the SQL Server Agent Job, we make a call to run the Tableau Server schedule responsible for refreshing the most critical and time-sensitive data extracts we have that are based on the Salesforce data.
Doing this allows us to avoid refreshing our data extracts at inopportune times, as would likely happen if we refreshed them on a simple time-based schedule.
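As a rough sketch, a post-ETL step could invoke tabcmd from a script like the one below. The server URL, service account, password file path, and schedule name are all placeholders, and error handling is deliberately minimal; treat it as a starting point, not a hardened implementation.

```python
# Sketch: trigger a Tableau Server extract-refresh schedule at the end
# of an ETL run by shelling out to tabcmd. All connection details here
# are placeholders.
import subprocess


def build_commands(schedule_name, server, user, password_file):
    """Return the tabcmd invocations for a login/runschedule/logout cycle."""
    return [
        ["tabcmd", "login", "-s", server, "-u", user,
         "--password-file", password_file],
        ["tabcmd", "runschedule", schedule_name],
        ["tabcmd", "logout"],
    ]


def run_schedule(schedule_name, server="https://tableau.example.com",
                 user="etl_svc", password_file="/secure/tabcmd.pass"):
    """Run the schedule; raises CalledProcessError if any tabcmd step fails."""
    for cmd in build_commands(schedule_name, server, user, password_file):
        subprocess.run(cmd, check=True)
```

Keeping the password in a file (rather than on the command line) avoids leaking it in the process list, and `check=True` makes a failed refresh fail the surrounding ETL job loudly rather than silently.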
Thank you everyone for your feedback. It is so useful
John, dang, thank you for the details, that is so helpful. Think you could saunter over to the technical writers and "accidentally" drop a copy of that on the desk of the person writing up the Admin. manual?
Absolutely useful information for a developer as well on how to slim down your data extracts.
I was wondering...
How do Tableau Server extracts behave when they come from different sites on the server (e.g. site1, site2), since each site will have its own set of extracts?