11 Replies Latest reply on Nov 5, 2015 3:28 PM by Aalok Jain Branched to a new discussion.

    Suggestions on Prioritizing Extract Refreshes

    Nikole Phillips

      Hello,

      Does anyone have any best practices on prioritizing extract refreshes during off-peak hours?  What about extract size limits?  Please let me know.  Thanks.

        • 1. Re: Suggestions on Prioritizing Extract Refreshes
          Toby Erkson

          So far we've had no need for extract size limits.  Our largest is about 1GB and the rest are smaller.  I don't want to impose limits unless extract size seriously impacts our Server.  Even then, the issue is generally fixable by reminding publishers not to include columns they don't need or use, to use native connections, and to shy away from Custom SQL.

           

          Not sure I understand the need for "prioritizing extract refreshes during off-peak hours".  Extract schedules depend on how 'fresh' the data needs to be for a report and on the variety of groups using it.  Where I work, Tableau is available corporate-wide -- which means world-wide -- so peak hours for us (Pacific Northwest) are night-time for other geographies.  How would one define 'off-peak hours'?  By geography?  By highest average Server use time?  As you can see, this can vary quite a bit depending upon the company.

           

          Is there a specific reason why you're worrying about these items?  Given the limited information provided, they really shouldn't be a concern; if anything, simply monitor usage and make changes when the time comes.

          • 2. Re: Suggestions on Prioritizing Extract Refreshes
            Matt Coles

            It can be beneficial to run extracts during off-peak hours if you have a lot of them to process, because Background tasks are CPU-intensive and can impact users if they run on the same hosts as the other processes that are directly responsible for a good user experience. If you have daily extracts you want to run at night instead of during the day, just create a new Schedule of type extract refresh for, say, 1AM, then move the extract refresh Tasks to the new schedule. If you aren't the owner of those workbooks or datasources, you should let the owners know that you wish to change the time that they run before actually making the change.

             

            There's no way that I'm aware of to limit extract size. My tactic on preventing inordinate levels of resource consumption for extracts is to do a weekly check to see who published anything larger than 100MB or so, then look at their workbook and have them add filters and hide unused fields. That ends up saving quite a bit of space. But prevention is better than the cure, so try your best to get the word out on what you consider to be best practices when publishing to Server.
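
            A minimal sketch of the weekly size check described above. It assumes you already have a list of published items with their owners and sizes from some export; the PublishedItem record and its field names are hypothetical, not a Tableau Server API.

```python
from dataclasses import dataclass


@dataclass
class PublishedItem:
    # Hypothetical record; populate it from whatever size report you have.
    owner: str
    name: str
    size_mb: float


def oversized(items, threshold_mb=100.0):
    """Return the items larger than threshold_mb, largest first."""
    big = [item for item in items if item.size_mb > threshold_mb]
    return sorted(big, key=lambda item: item.size_mb, reverse=True)


if __name__ == "__main__":
    sample = [
        PublishedItem("alice", "Sales Dashboard", 850.0),
        PublishedItem("bob", "Inventory", 40.0),
        PublishedItem("carol", "Web Traffic", 120.5),
    ]
    for item in oversized(sample):
        print(f"{item.owner}: {item.name} ({item.size_mb} MB)")
```

            The 100MB default mirrors the rough threshold mentioned in the post; adjust it to whatever suits your Server.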

            • 3. Re: Suggestions on Prioritizing Extract Refreshes
              Nikole Phillips

              Thank you Matthew and Toby for your insight.  I did some research and touched base with Tableau Software about my question, and this is what I found out.  As you both have stated, extracts should be scheduled during off-peak hours to help performance during peak hours.  However, I also found out that extracts should be prioritized based on frequency.  For example, if an extract runs hourly then it should be priority 1, daily priority 2, weekly priority 3, and so on.  If extracts are scheduled with the same frequency, then it is best to look at how heavily the user community consumes the corresponding Tableau views to determine priority.  For example, if the Sales extract that populates the Sales Dashboard is accessed by 100 users, but the Inventory extract that populates the Inventory Dashboard is accessed by 50 users, then the Sales extract will get a higher priority.  Please note, this practice is not etched in stone; it is a foundation for this type of scenario, and more details may be required based on the business need.  If an extract takes a long time to run, then it is best to give the faster-running extracts a higher priority so they can complete and free up resources for the longer-running extracts.
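
              The ranking rules above can be sketched as a sort key. This is only an illustration under stated assumptions: the Extract record, its field names, and the choice to use runtime as the last tie-breaker are hypothetical, and the result still has to be entered as schedule priorities by hand.

```python
from dataclasses import dataclass

# Lower rank = refreshed more often = should run first (assumed mapping).
FREQUENCY_RANK = {"hourly": 1, "daily": 2, "weekly": 3, "monthly": 4}


@dataclass
class Extract:
    # Hypothetical record describing one extract refresh.
    name: str
    frequency: str
    viewers: int
    runtime_minutes: float


def rank_extracts(extracts):
    """Order by frequency, then by most viewers, then by shortest runtime."""
    return sorted(
        extracts,
        key=lambda e: (FREQUENCY_RANK[e.frequency], -e.viewers, e.runtime_minutes),
    )


if __name__ == "__main__":
    ranked = rank_extracts([
        Extract("Inventory", "daily", 50, 12.0),
        Extract("Sales", "daily", 100, 20.0),
        Extract("Ops Monitor", "hourly", 10, 3.0),
    ])
    for position, extract in enumerate(ranked, start=1):
        print(position, extract.name)
```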

              • 5. Re: Suggestions on Prioritizing Extract Refreshes
                Sunil Tikar

                Hi Nikole Phillips,

                In our setup I created schedules so that all extract refreshes finish before business hours. I have schedules from 4:30 and 5:30 in the morning until 8:00. The basic idea is to avoid BG (backgrounder) service consumption during business hours.

                I want all extract refreshes done before 9:00 AM (the start of our business day).

                 

                Regarding size limits, we set a standard of under 3 GB for both data sources and workbooks.

                • 6. Re: Suggestions on Prioritizing Extract Refreshes

                  Nikole,

                   

                  You are right about extract refreshes. In a nutshell, schedule refreshes during off-peak hours and prioritize according to refresh frequency. Here's the long detailed explanation of general best practice:

                   

                  Extract refreshes and subscription schedules will be executed in the following order:

                  1. All tasks currently in process will complete first.
                  2. Tasks with the highest priority (lowest number) will be taken next, regardless of how long they have been waiting. For example, a task with a priority of 49 will be executed before a task with a priority of 50, even if the task with a priority of 50 has been waiting longer.
                  3. If all tasks have the same priority, tasks will be executed in the order they were queued; the task scheduled with the earliest time stamp will be executed first.
                  4. When multiple tasks with the same priority are scheduled to run at the same time, they will be executed in the following order:
                    • All extract refreshes in the order that they were created or enabled.
                    • All email subscriptions in the order they were created or enabled.
                    • Tableau Server can only run as many tasks concurrently as there are backgrounder processes configured in that Tableau Server environment.
                  5. Separate extract refreshes for the same data cannot run simultaneously.

                  Note: This list only covers extract refreshes and subscription schedules, and does not consider other tasks, such as reap extracts.
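
                  A rough sketch of rules 2 through 4 above, written as a sort over waiting tasks. It is not Tableau Server's actual scheduler: the Task record and its fields are hypothetical, and the sketch ignores tasks already in process (rule 1), the same-data restriction (rule 5), and the number of backgrounder processes.

```python
from dataclasses import dataclass

# Within one priority and queue time, extract refreshes precede subscriptions.
KIND_RANK = {"extract": 0, "subscription": 1}


@dataclass
class Task:
    # Hypothetical description of one waiting background task.
    name: str
    priority: int        # lower number = higher priority
    queued_at: int       # e.g. minutes since midnight
    kind: str            # "extract" or "subscription"
    created_order: int   # order in which the task was created or enabled


def execution_order(tasks):
    """Sort waiting tasks by priority, queue time, kind, then creation order."""
    return sorted(
        tasks,
        key=lambda t: (t.priority, t.queued_at, KIND_RANK[t.kind], t.created_order),
    )


if __name__ == "__main__":
    waiting = [
        Task("old daily extract", 50, 60, "extract", 1),
        Task("urgent extract", 49, 90, "extract", 2),
        Task("morning email", 50, 60, "subscription", 3),
    ]
    for task in execution_order(waiting):
        print(task.name)
```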

                   

                   

                  Imagine every refresh has the default priority of 50 (#3 above). Here’s what the queue may look like:

                   

                  Hourly refresh job1 priority 50
                  Hourly refresh job2 priority 50
                  Daily refresh job3 priority 50
                  Daily refresh job4 priority 50
                  Weekly refresh job1 priority 50
                  Daily refresh job5 priority 50
                  Daily refresh job6 priority 50
                  Monthly refresh job1 priority 50
                  Daily refresh job7 priority 50
                  Daily refresh job8 priority 50
                  Hourly refresh job3 priority 50

                  So it may be more than an hour before the backgrounder can get from hourly refresh job2 to hourly refresh job3.

                  So best practice is to assign the highest priority to the most frequently refreshed extracts, e.g. 15 min refresh = priority 1, hourly = priority 5, daily = priority 10, etc.
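
                  To illustrate the effect, here is a small sketch that re-sorts the example queue above using frequency-based priorities. The 15 min, hourly, and daily values come from the post; the weekly and monthly values are assumptions chosen only so that they sort after daily.

```python
# Priorities by refresh frequency; weekly and monthly are assumed values.
PRIORITY_BY_FREQUENCY = {
    "15min": 1,
    "hourly": 5,
    "daily": 10,
    "weekly": 20,
    "monthly": 30,
}

# The example queue, in the order the jobs were queued.
QUEUE = [
    ("Hourly refresh job1", "hourly"),
    ("Hourly refresh job2", "hourly"),
    ("Daily refresh job3", "daily"),
    ("Daily refresh job4", "daily"),
    ("Weekly refresh job1", "weekly"),
    ("Daily refresh job5", "daily"),
    ("Daily refresh job6", "daily"),
    ("Monthly refresh job1", "monthly"),
    ("Daily refresh job7", "daily"),
    ("Daily refresh job8", "daily"),
    ("Hourly refresh job3", "hourly"),
]


def reprioritize(queue):
    """Order jobs by frequency-based priority.

    sorted() is stable, so jobs sharing a priority keep their queue order.
    """
    return sorted(queue, key=lambda job: PRIORITY_BY_FREQUENCY[job[1]])


if __name__ == "__main__":
    for name, frequency in reprioritize(QUEUE):
        print(PRIORITY_BY_FREQUENCY[frequency], name)
```

                  With these priorities, hourly refresh job3 runs right after job2 instead of waiting behind the daily, weekly, and monthly jobs.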

                  • 7. Re: Suggestions on Prioritizing Extract Refreshes
                    Eric McDonald

                    Bookmark this one, Tableau Server Admins, this is really useful! Time to tidy our extract refreshes!

                    • 8. Re: Suggestions on Prioritizing Extract Refreshes
                      Matt Coles

                      One other method of scheduling extract refreshes that is useful to know is tabcmd runschedule.

                       

                      With this command-line call, you can trigger the execution of a Tableau Server extract refresh schedule at the moment it is optimal to refresh the extracts, say, at the end of an ETL cycle for your data. For example, we pull down incremental changes to our Salesforce data into a SQL Server database as often as possible (this turns out to be every 25 minutes or so) and perform some custom transformations on it. Then, within the SQL Server Agent Job, we make a call to run the Tableau Server schedule responsible for refreshing the most critical and time-sensitive data extracts we have that are based on the Salesforce data.

                       

                      Doing this allows us to avoid refreshing our data extracts at inopportune times, as would likely happen if we refreshed them on a simple time-based schedule.
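
                      A minimal sketch of calling tabcmd runschedule from a script at the end of an ETL job. It assumes tabcmd is installed and on the PATH and that a session has already been opened with tabcmd login; the schedule name "Salesforce Critical" is hypothetical, so substitute the name of your own schedule.

```python
import subprocess


def runschedule_command(schedule_name, tabcmd="tabcmd"):
    """Build the argument list for `tabcmd runschedule <schedule name>`."""
    return [tabcmd, "runschedule", schedule_name]


def trigger_schedule(schedule_name, tabcmd="tabcmd"):
    """Run the schedule; raises CalledProcessError if tabcmd exits non-zero."""
    subprocess.run(runschedule_command(schedule_name, tabcmd), check=True)


if __name__ == "__main__":
    # Hypothetical schedule name; only the command is printed here.
    print(" ".join(runschedule_command("Salesforce Critical")))
    # Call trigger_schedule("Salesforce Critical") once the ETL step succeeds.
```

                      Passing the arguments as a list avoids shell quoting problems when the schedule name contains spaces.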

                      • 9. Re: Suggestions on Prioritizing Extract Refreshes
                        Nikole Phillips

                        Thank you everyone for your feedback.  It is so useful.

                        • 10. Re: Suggestions on Prioritizing Extract Refreshes
                          Toby Erkson

                          John, dang, thank you for the details, that is so helpful.  Think you could saunter over to the technical writers and "accidentally" drop a copy of that on the desk of the person writing up the Admin. manual?

                          • 11. Re: Suggestions on Prioritizing Extract Refreshes
                            Aalok Jain

                            Absolutely useful information for a developer as well, on how to keep your data extracts lean.

                             

                            I was wondering...

                             

                            How do Tableau Server extracts behave when they come from different sites on the server, e.g. site1 and site2, since each site will have its own set of extracts?