16 Replies. Latest reply on Sep 14, 2016 7:27 AM by Toby Erkson

    Everyone wants their extracts to run first

    Phyllis Eanes

      Any suggestions on how to manage users who all select the 8:00 extract refresh schedule? Everyone wants their data to run first in the morning. Has anyone else encountered this?

        • 1. Re: Everyone wants their extracts to run first
          Jeff Strauss

          How many extracts are you talking about?


          There are a few strategies for handling this:


          1.  Can you open up more backgrounders to handle the refreshes?


          2. Can any of them run earlier than 8am?  Schedules can be triggered externally via tabcmd runschedule once the underlying source tables have been updated.


          3. Can you organize by mission critical / non-mission critical and split into two schedules?


          4. Leverage published datasources that the workbooks point at.  This permits one refresh of the published datasource extract, which many workbooks can then use without refreshing anything themselves.  It requires some upfront work to come up with a realistic, well-sized datasource that serves the needs of a specific subject area, but once you have it, it creates a lot of harmony in the org.
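
As a rough illustration of strategy 2, the last step of a data load could call a small wrapper like the one below. This is only a sketch: `tabcmd login` and `tabcmd runschedule` are real tabcmd commands, but the server URL, user name, password, and schedule name are placeholders, and the helper function names are made up for this example.

```python
import subprocess
from typing import Callable, List, Sequence


def build_runschedule_commands(server: str, user: str, password: str,
                               schedule: str) -> List[List[str]]:
    """Return the tabcmd invocations: log in, then run the named schedule."""
    return [
        ["tabcmd", "login", "-s", server, "-u", user, "-p", password],
        ["tabcmd", "runschedule", schedule],
    ]


def trigger_schedule(commands: Sequence[Sequence[str]],
                     runner: Callable[..., object] = subprocess.run) -> None:
    """Run each command in order; with subprocess.run, check=True stops on failure."""
    for cmd in commands:
        runner(list(cmd), check=True)


if __name__ == "__main__":
    # Placeholder values; replace with your own server, account, and schedule.
    cmds = build_runschedule_commands(
        "https://tableau.example.com", "etl_user", "secret", "Morning Extracts")
    # Dry run: print the commands instead of executing tabcmd.
    trigger_schedule(cmds, runner=lambda cmd, check: print(" ".join(cmd)))
```

Dropping the `runner` argument makes `trigger_schedule` execute the commands for real, so the schedule starts as soon as the load finishes rather than at a fixed clock time.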

          • 2. Re: Everyone wants their extracts to run first
            Phyllis Eanes

            We are going to add another backgrounder process; we are only using 1 at this time. We are also meeting to discuss priorities regarding which jobs should run first, with mission-critical ones ahead of the rest, as you indicated.

            We are waiting for our data warehouse to get updated and this makes it difficult to start any earlier.

            How many backgrounder processes are you running? We can go up to 8 I believe.





            • 3. Re: Everyone wants their extracts to run first
              Shawn Wallwork

              Is there something that happens at 8 a.m.? A big data dump? If not, why not start earlier? Do 15-minute schedules.



              • 4. Re: Everyone wants their extracts to run first
                Yuriy Fal

                Hi all,


                Regarding 3 (mission-critical vs. general), please don't forget about task prioritization.






                • 5. Re: Everyone wants their extracts to run first
                  Phyllis Eanes

                  Yes, a big batch of data becomes available at 8 a.m.





                  • 6. Re: Everyone wants their extracts to run first
                    Phyllis Eanes

                    Thanks, we are using prioritization and I think getting an extra backgrounder process will help.





                    • 7. Re: Everyone wants their extracts to run first
                      Jeff Strauss

                      Good point on the prioritization. In addition to having multiple schedules, we use the prioritization within each schedule to order the tasks.


                      One thing to note as you increase the number of backgrounders is that they are CPU intensive.  So if you are running a single-node cluster, be careful not to add too many backgrounders, or viz rendering will end up competing with them for resources, which is really bad.  One of Tableau's recommendations is to isolate backgrounders on their own node in a distributed cluster.  This is what we have done; right now we have 4 backgrounders running here and we could increase that if we wanted to.  See Tableau's Performance Tuning Examples.

                      • 8. Re: Everyone wants their extracts to run first
                        Toby Erkson

                        Good feedback.


                        The number of backgrounders (and other adjustable processes) depends on the number of cores you have.  Like Jeff said, it may not be a good idea to up the backgrounder count.  Move it up by one and see if that makes a difference with extracts as well as how reports are brought up / rendered.


                        You can manually set the Priority of individual Tasks.  You can display your Schedules, click on a specific Schedule to see the Workbooks / Data Sources that execute under that Schedule, then click the ellipsis for the particular Task and select Change Priority...


                        Next, a dialog box will appear where you can alter the Priority of the selected Task.


                        If you want to view all Tasks, simply click on Tasks, sort the list if desired, and change Priorities there.



                        Why would you do this?  To tweak who gets to go first when there is one schedule that many are using but some extracts are more important than others.  Like your situation.  You could have two individual schedules that are triggered at the same time, but if one has a higher priority then users will gravitate to that schedule once they find out its default priority is better, leaving you back where you started.

                        So you could give your 8AM schedule a default priority of 60 and then manually set the priority of each task that uses it based upon its importance -- yeah, good luck getting users to agree to that!
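
To make the effect of that scheme concrete, here is a small sketch of how tasks sharing one schedule would be ordered. Tableau priorities run from 1 to 100 and lower numbers run first; the task names, the specific priority values, and the alphabetical tie-break are assumptions for illustration only, not how the backgrounder actually resolves ties.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_PRIORITY = 60  # the schedule default suggested in the post above


@dataclass(frozen=True)
class Task:
    name: str                         # hypothetical workbook / data source name
    priority: int = DEFAULT_PRIORITY  # 1 runs first, 100 runs last


def run_order(tasks: List[Task]) -> List[str]:
    """Order tasks lowest priority number first; ties broken by name (assumed)."""
    return [t.name for t in sorted(tasks, key=lambda t: (t.priority, t.name))]


if __name__ == "__main__":
    tasks = [
        Task("Marketing adhoc"),    # left at the schedule default of 60
        Task("Finance daily", 10),  # manually bumped: mission critical
        Task("Logistics", 30),
    ]
    print(run_order(tasks))
```

Everything left at the default queues up behind the handful of tasks that were deliberately given a lower number.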

                        • 9. Re: Everyone wants their extracts to run first
                          Phyllis Eanes

                          Thanks to everyone – I’m meeting with our server DBA to discuss performance tuning – thanks for all of the great responses.





                          • 10. Re: Everyone wants their extracts to run first
                            Toby Erkson

                            Phyllis Eanes,

                            Let us know what you end up doing.

                            • 11. Re: Everyone wants their extracts to run first
                              Phyllis Eanes

                              Will do!





                              • 12. Re: Everyone wants their extracts to run first
                                Tom W

                                I dealt with this scenario a lot back in my days of Business Objects and a slow running data warehouse refresh process. Business Objects took all the heat for 'being late' when it was the long running ETL processes holding it back. It took a while to get that point across to the business users but once we did, we were able to get some budget allocated to speeding up our ETL.


                                I think your big dependency here is whether you can consolidate your extracts into shared, published datasources. If people are running effectively the same extracts over and over, you're wasting time. Break down what people are doing and implement published datasources if possible.

                                If you're able to implement published datasources, my next consideration would be the start time of 8AM. For long-running processes like the Data Warehouse load that Business Objects depended on, we would get the ETL process to raise a flag when certain parts of the DW were refreshed. That way Business Objects knew it could start refreshing certain cubes as soon as they were ready rather than at a fixed time of 8AM.

                                In Tableau, I could see this working by getting the ETL process to kick off a script on completion that uses tabcmd to refresh your published datasource. That way if your ETL process finishes early at 7:30, you've gained 30 minutes of extra processing time, and those minutes can be the difference between buying extra servers and not.
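
A minimal sketch of that flag-driven approach follows. It assumes the ETL job drops an empty `<table>.done` file into a directory when each part of the warehouse is loaded; that file convention, the dependency map, and the function names are all hypothetical. `tabcmd refreshextracts --datasource` is a real tabcmd command and needs an existing `tabcmd login` session.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Set


def flags_in(directory: Path) -> Set[str]:
    """Collect raised flags, e.g. 'orders.done' in the directory -> 'orders'."""
    return {p.stem for p in directory.glob("*.done")}


def ready_datasources(dependencies: Dict[str, Set[str]],
                      flags_present: Iterable[str]) -> List[str]:
    """Return data sources whose required ETL flags have all been raised."""
    present = set(flags_present)
    return sorted(name for name, needed in dependencies.items()
                  if needed <= present)


def refresh_command(datasource: str) -> List[str]:
    """Build the tabcmd call that refreshes one published data source."""
    return ["tabcmd", "refreshextracts", "--datasource", datasource]


if __name__ == "__main__":
    # Hypothetical mapping: published data source -> warehouse tables it needs.
    deps = {"Sales": {"orders", "customers"}, "Logistics": {"shipments"}}
    for name in ready_datasources(deps, ["orders", "shipments"]):
        print(" ".join(refresh_command(name)))
```

In practice a scheduler or the ETL tool itself would call `flags_in` on the flag directory, pass the result to `ready_datasources`, and execute each command, so a data source refreshes as soon as its own inputs are done instead of waiting for the whole load.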


                                Tech can only take you so far though. We had to put in a pretty big process change to prioritize 'standard' reports over adhoc reports and get the business to consolidate their reports into a suite of shared and standard reports which were prioritized higher than adhoc refreshes. It helped us a lot when we sat them in a room with the necessary data ("here's a list of your reports running at 8am") and made them fight it out among themselves to justify why this logistics report was more important than that marketing report. In the end, 1 of 2 things will happen - you will either have harmony among your BUs, with each giving the others the priority they need, or you will have a great business case for more server resources!

                                • 13. Re: Everyone wants their extracts to run first
                                  Brian Mooneyham

                                  If any/all of you are planning to come to TC16, we're planning to discuss this as one of our BIG topics!  I've found a way to optimize extracts/schedules/priorities.  Below is a link to our session.

                                  Learn | Tableau Conference 2016

                                  • 14. Re: Everyone wants their extracts to run first
                                    Toby Erkson

                                    Dude, I am so there!  Thanks
