14 Replies Latest reply on May 2, 2012 1:19 PM by Richard Leeke

    Rolling Means: What's the Best Method?

    Emma Whyte

      Hello,

       

      We have a table with approximately 1,500 records containing satisfaction survey data. We are trying to create rolling means for some of the survey questions which are answered on a numeric scale, we have succeeded in this but it takes Tableau a long time to run. More specifically it is taking the desktop client a long time to display the results or edit the graph; Tableau Server is barely taxed at all. This leads us to believe that the calculation is at fault and is being performed by the client rather than the server.

       

      The calculation we are currently using is:

       

      Window_Sum(sum([Q33]),-11,0)/Window_sum(count([Q33]),-11,0)

       

      This is the total score for a 12 month period / total number of responses for a 12 month period. We verified this against a manual calculation and the results are correct.

       

      We have had a look on the forum and found a post by Joe Mako for a Rolling Median and attempted to repurpose it. It involved first creating a Cartesian product of the data set against the dates from the data set http://community.tableau.com/thread/114411. Our version of that is as follows:

       

      IF FIRST()==0 THEN

      TOTAL(AVG(

      IF DATEDIFF ('month', [date], [compare date]) < [Moving Months]

      AND [date] <= [compare date] THEN

      [Q33_01]

      END))

      END

       

      This executes very quickly and produces something very similar, but the numbers are wrong. On a hunch we modified the custom SQL that creates the Cartesian product so that it ignores nulls in date and compare date. However the numbers we get are now the averages for each month rather than a rolling mean.

       

      What we are asking is how do you guys go about computing a rolling mean (moving average)?

        • 1. Re: Rolling Means: What's the Best Method?
          Richard Leeke

          Hard to comment on this without seeing the workbook (with dummy data as appropriate).

           

          My first guess would be that you might not have the partitioning set correctly for the TOTAL() - from what you have here it looks as if Compute Using should be set to [date].

          • 2. Re: Rolling Means: What's the Best Method?
            Emma Whyte

            Hi Richard,

             

            Thanks for your reply. Please find attached a packaged workbook and dummy data as requested.

             

            We have already tried changing the Compute Using to [date] and [compare date].  [date] provided us with a blank graph (no line) and [compare date] gives us the average as a line, not a rolling mean.

             

            Any other suggestions are very welcome!

             

            Emma

            • 3. Re: Rolling Means: What's the Best Method?
              Richard Leeke

              Can you post a packaged workbook showing where you have got to - it's always much easier to help people find their problems if we have something to start with rather than just a blank workbook and some data.

               

              If you can set up the dummy workbook with the self-join and the table calculation I'll take a look.

              • 4. Re: Rolling Means: What's the Best Method?
                Emma Whyte

                Hi Richard,

                 

                I've updated the packaged workbook as requested.

                 

                What we have noticed is that the more sheets we have in each workbook with rolling mean calculations the slower the workbook gets. We are trying to calculate rolling means in such a way that doesn't tax Tableau Desktop quite so much.

                 

                Thanks

                 

                Emma

                • 5. Re: Rolling Means: What's the Best Method?
                  Richard Leeke

                  Hi Emma

                   

                  So if I understand correctly, what you've shown in the updated packaged workbook is an example of how you are currently doing it - which gives the correct answer, but too slowly. Is that right? (I'm just trying to make sure I'm understanding the right issue.)

                   

                  Your original post said that you have around 1500 rows, which is about the same as your sample data. the sample is very quick - but I assume that is because the dummy data only has two question columns. I see that your examples of the different calculations reference columns [Q33] and [Q33_01]. I assume that you have large numbers of columns and that you are repeating the calculation for each column? Is that the structure? How many questions in the data source and how many are you including in the view? That's the only place I can see the multiplier that's making it too slow could be coming from. It might be that restructuring your data source may be another option.

                   

                  I was really hoping that you would put the example together with the cross product and the variant of Joe's calculation so I could take a look at why you are getting the wrong answer. I'm not going to spend the time reworking the data source into that format - but if you post a version in that format I'll certainly take a quick look for you.

                  • 6. Re: Rolling Means: What's the Best Method?
                    Emma Whyte

                    Hi Richard,

                     

                    Yes you are understanding correctly. The packaged workbook includes how we are currently calculating rolling means. But the workbook becomes unusable as it has slowed down so much.

                     

                    The sample includes all the rows but only a selected number of columns. I cannot give any more unfortunately. The original has 103 columns and we need rolling means on 24 of these.

                     

                    I can speak to our database administrator and see if we can get Joe's rolling median version we were trying in a packaged workbook.

                     

                    Thanks

                    • 7. Re: Rolling Means: What's the Best Method?
                      Richard Leeke

                      I'm very puzzled why that should slow down so much - 1500 rows is tiny, and you are aggregating it so the table calculations are working across a very small number of summary rows, so I really wouldn't expect it to be that slow.

                       

                      I might just try duplicating the columns you provided that many times and mock something up.

                       

                      How is the view structured when it is performing so badly? Do you have all 24 rolling means on the view? Other dimensions in play? Is it a line chart like your example, or what?

                       

                      Sorry to keep firing back more questions, but without actually seeing the problem it's almost impossible to speculate what's going on.

                      • 8. Re: Rolling Means: What's the Best Method?
                        Emma Whyte

                        Hi,

                         

                        Yes that is what is puzzling us as well. We don't have all 24 rolling means on a single view. We do a view for each question, giving us 24 sheets with rolling means on. We also have 24 sheets with trend lines on in the workbook as well. We are planning on testing the performance by doing a workbook with trend lines and a workbook with rolling means. We are also speaking to Tableau Support regarding performance of Tableau Desktop and hardware/OS issues.

                         

                        What we really want to get down to is: are we calculating rolling means in the best possible way to enhance the best performance from Tableau Desktop?

                         

                        No problem for all the questions, thank you for all your help!

                        • 9. Re: Rolling Means: What's the Best Method?
                          Jonathan Drummey

                          Just to chime in with some anecdotal evidence, I have noticed that there is a point in adding table calcs (and maybe even non-table calcs) to a workbook where the performance of all views in the workbook starts slowing down. Alex Kerin also mentioned seeing this.

                           

                          And I'm not sure, but there seems to be a point where adding more worksheets also affects performance, but the thing is when I'm adding more worksheets I'm usually creating additional calculations for those new worksheets and I'm not sure what has more influence on performance. I spent all day yesterday with a new workbook and new data set and at the beginning of the day I had instantaneous UI response and after several hours of work adding about 25 calculated fields and 30 worksheets the UI is taking a second or more to do things like duplicate a field. Redrawing a view is also taking more time as well.

                           

                          Jonathan

                          • 10. Re: Rolling Means: What's the Best Method?
                            Richard Leeke

                            What would be really helpful in order to work out where the time is going is to have a look at the Tableau log file for the time when you are refreshing the view. Tableau support are probably already looking at that for you - but if you wanted to post a log (or just the snippet covering the period from starting to refresh the view to completing the redraw) I'll take a look.

                             

                            I really don't think it is the individual calculation that is the problem - it's that there's a bit of time added for each of the sheets.

                             

                            Are they just individual sheets, or is there a dashboard with lots of mini-views on? I could see that taking a while. If not, I really don't understand why having more sheets and more calculations defined in the workbook would make that much difference to the speed. But it's a very good point that Jonathan raises - put like that it does sort-of fit with my own experience, though I hadn't thought about it like that. I'm going to experiment with breaking out a few sheets from one of the monster workbooks I use all the time (and which is getting really sluggish even for things which seem as if they ought to be quick) and see if that makes them go more quickly.

                             

                            It is interesting that when you tried Joe's approach that was fast. Did you actually try a workbook with all 24 instances of Joe's way? If so, was that still fast? If so, a) it would be good to take a look at the Tableau logs for that as well and work out where the difference in time is from the slow one and b) it's worth following through to work out why that one is giving the wrong answer.

                             

                            Try to answer as many of the questions that I've littered through this as you can and ideally post a log file for the slow correct way and a log file for the fast but wrong way. We'll see where that leaves us.

                            • 11. Re: Rolling Means: What's the Best Method?
                              Emma Whyte

                              Hi Richard,

                               

                              Unfortunately Tableau Support have never asked me for a log, and i'm not sure how to get one for the time when i'm refreshing the view? I can see logs held in \My Documents\My Tableau Repository\Logs but none for when I open up a workbook? There are error logs, but not relating to this workbook.

                               

                              The response i've had from Tableau Support echo's Jonathans' suggestion. Support have suggested not having a workbook with more than 30 sheets in. Currently ours has 72. We are going to try splitting the workbook in to smaller workbooks to improve performance.

                               

                              We did not try a workbook with 24 instances of Joe's calculation. I think if we did it would slow performance.

                               

                              Can you suggest how I can post a log file for you?

                               

                              Thanks

                              • 12. Re: Rolling Means: What's the Best Method?
                                Richard Leeke

                                The log files I was referring to are just the files log.txt, log2,txt etc which you have seen in the Logs folder.  If you open a single instance of Tableau I think it always goes to log.txt; if you have multiple copies running at the same time they each write to a separate file.

                                 

                                So the easiest way to do it is to close all copies of Tableau, then open your workbook, step through a few of the sheets that are slow and then close Tableau. Then check which of the log files has been recently updated (it should just be log.txt, I think).

                                 

                                The log file doesn't contain any of your actual data, and I don't think there's anything sensitive in there - but it does contain things like queries and also file and directory names. So you might just want to have a look through it to make sure there's nothing in there that you don't want to expose publicly. Let me know if you'd prefer not to post it on the forum and we'll sort out a way for you to send it to me directly, if you prefer.

                                 

                                The log files do record everything you've done for the last few days - so you can limit the size and also avoid having anything to do with other workbooks by deleting (or renaming) the log file before you open the workbook. That way it will just create a new file which will be limited to the one session.

                                1 of 1 people found this helpful
                                • 13. Re: Rolling Means: What's the Best Method?
                                  Emma Whyte

                                  Hi Richard,


                                  Sorry for the late reply, other deadlines interrupted this project.

                                   

                                  After looking more closely at the workbook and speaking with Tableau Support we managed to solve our question with the following methods:

                                   

                                  1) We split the workbook up in to 3. Tableau Support suggested that 30 sheets be the maximum for most workbooks (depending on data size, calculations etc).

                                   

                                  2) We found that the person creating the original workbook had coloured the lines using Measure Names. We removed all instances of Measure Names in the workbook which improved performance drastically! The workbook was somehow filtering by Measure Names (with all dimensions selected) and using that to colour the trend lines.

                                   

                                  3) We have moved our installation of Tableau Professional on to a Server running Windows WebServer 2008, with more access to RAM, processor power etc.

                                   

                                  4) The Rolling Mean calculation we were using was correct. It is now performing much quicker since the changes above. We are also using a Tableau Server data extract with the Rolling Mean calculations already loaded, saving computing power.

                                   

                                  Thank you very much for your help and input. And if you ever need a rolling mean calculation, feel free to use the one above!

                                  • 14. Re: Rolling Means: What's the Best Method?
                                    Richard Leeke

                                    Thanks for posting back the explanation - it's really helpful when people do that so that someone else finding this thread in 6 months isn't left thinking the issue was never resolved.

                                     

                                    All makes sense and good suggestions.