13 Replies Latest reply on Nov 27, 2013 2:39 AM by Bora Beran

    Replicating Excel CORREL functions in Tableau and better leveraging LOOKUP in Tableau

    colum.mccoole

      I've been trying to see how much functionality which I'm used to doing within excel is portable across to Tableau, so that I can bring a whole solution within Tableau.

       

      The structure of the data is very basic (as per the DataRaw) tab in the attached excel ... a date, Type (Series or Benchmark), Field (description) and Value. I'm not sure whether my approach is the correct one or not, but I've used various calculated fields to split this Value into either Benchmark or Series and then align both time-series (in this basic example) in order to generate various correlation calculations.

       

      While I can get the full series correlation working (yellow cell in screenshot) ... anything involving a subset of that timeframe is not working (where I am trying to control the 'WINDOW_*' using start and end optional settings.

       

      I have been trying to follow best practices in terms of going into the edit table calculation dialogue 'advance' and setting calculations to run along WeekEnding.  In my interim calculations to get WoW absolute and percent changes, I used ascending sort against WeekEnding in order to get the right calculated data aligning ... complicated partly by the fact that I am showing the table here ordered by newest to oldest.

       

      When editing a table calc for the correlation calculation against setting it to calculate against 'WeekEnding', it doesn't seem to matter the sort order in the 'advanced' tab ... for the main calculated field or component ones.  Is this normal?

       

      correl_question.PNG.png

      The above is a basic example of a single correlation analysis.  In the attached *.twbx (and screenshot below), on the continued theme of moving full functionality from excel, I'm also trying to see the best approach to building out a heatmap that takes the latest correlation data for lots of Benchmark/Series pairs and perhaps augments this with some data on max/min values and dates which would be generated within Tableau (rather than imported as meta data from outside).  I have read on the forums in terms of certain limitations on generating a rank on data output from Table Calculations ... but unclear as to whether there is a work-around for this.

      matrix2_commented.PNG.png

      Finally .. if anyone has pointers on the regression/correlation panel / slope graph ideas I elaborate on inside this twbx, that would be ideal.

       

      Thanks,

      Colum

        • 2. Re: Replicating Excel CORREL functions in Tableau and better leveraging LOOKUP in Tableau
          Scott Tennican

          Hi Column,

           

          I usually break down statistical computations into a stack of simple calcs to allow me to catch errors.

          I've attached your workbook with an additional worksheet with a table calc that gives you the correct rolling 52 week correlation.

          I used what I consider an easier formula for correlation:

          correlationFormula.jpg

          Notice that I've stacked six very simple table calcs to get the final correlation in corrSB.

           

          Scott

          1 of 1 people found this helpful
          • 3. Re: Re: Replicating Excel CORREL functions in Tableau and better leveraging LOOKUP in Tableau
            colum.mccoole

            Scott,

            Thanks for pointing out this alternative formula, which seems to work better.  Jonathan, thanks for bringing this to the attention of the right person.

            I still can't get my head around why this previous formula might not work.  If it works for the whole data-series, then surely it must function for a windowed portion.

            old_52_rolling.PNG.png

            Even using the modified formula, I'm still having trouble scaling this calculation across multiple benchmark/series combinations. In the base calculation, I was explicitly filtering on a single benchmark and single series.

            bmark_series_simple.PNG.png

            Ideally, I want to be able to generate a matrix of rolling correlations for various benchmark/series pairs over time. In this example, it would be a 3 x 3 matrix, giving 9 different correlation measures per date.

             

            multi_series1.PNG.png

            I tried the following ... but once again the Table Calcs have defeated me!  Or perhaps my approach is wrong ... but I was essentially looking to bring the relevant series side by side. Perhaps in an ideal world, this would get done back at the data source, but as per my first entry, I'm trying to see how an analysis can be built out assuming a single 'Value' field as a starting point.

             

            Here, I'm trying to derive a new series Benchmark_Fill, which is meant to fill-down values from benchmark based on being a member of a group ... and aligning these with the relevant series data ... so a requisite Corr52Roll_New_Fill can be calculated. This would hopefully then allow me to get towards the matrix in the 'Matrix/Ranking' tab I'm after ... and might get me closer to being able to display historical correlations for each of the 9 measures above in a panel chart.

             

            multi_series2.PNG.png

             

            Anyway, thank you both again for your efforts.

            Colum

            • 4. Re: Re: Replicating Excel CORREL functions in Tableau and better leveraging LOOKUP in Tableau
              Scott Tennican

              Hi Colum,

               

              I may be unclear on your goal perhaps because I'm confusing it with your intermediate goals.

              It sounds like your most important goal is a 3x3 matrix of correlation coefficients for each WeekEnding date.

              Is that right? And, you want it color coded by something?

              If so, which correlations do you need in the matrix?

              And, by what measure would you like to color code?

              I'm not sure if this is possible. But, I'll give it a shot once I'm sure what you need.

              My approach would probably be to simplify the problem to it's basics and then scale up.

              For example, a benchmark plus two series 10 weeks long computing 4 week rolling correlation into 2x2 matrices.

               

              In general, we are trying to evolve Tableau to make simple computations effortless as they are when you turn on the summary statistics card, reference lines, trend lines or forecasting. While we aim to compliment rather than replace statistical packages, it would be better if displaying simple bi-variate statistics did not require table calcs.

               

              Scott

              • 5. Re: Re: Replicating Excel CORREL functions in Tableau and better leveraging LOOKUP in Tableau
                colum.mccoole

                Hi Scott,

                Thanks for persisting with this.

                I just used the 3 x 3 matrix (3 benchmarks against 3 related series) as an arbitrary example.

                In fact, I have >100 series and let's say each series is associated with one of 10 given benchmarks ... which altogether form the block of data ... data, Field (series and benchmark names), Type (to distinguish a benchmark from a series) and Value. You were able to help me get a basic case working ... ie. 1 benchmark v. 1 series on a 52 wk rolling basis.

                 

                Ultimately, I would like to be able to calculate all correlations at any given date (looking back) for a large combination of benchmarks/series ... and by extension (given that I am calculating the correlations), be able to display these calculated correlations in a panel chart .. so that I can how the correlation has evolved for all pairs over time.

                 

                To have the matrix in some sort of heatmap ... green for high and red for low would be a bonus ... but even to be able to sort by highest to lowest in the matrix would be enough.

                 

                Thanks again,

                Colum

                • 6. Re: Re: Re: Replicating Excel CORREL functions in Tableau and better leveraging LOOKUP in Tableau
                  Scott Tennican

                  Hi Colum,

                   

                  So, it seems that you would like to assemble the correlation coefficients between multiple different series into one Column which can then be sorted, color encoded or structured as a matrix by group and field categorical values associated with particular rows.

                  In other words, you'd like the table of data on which your Viz is based to be replaced by a differently dimensioned table of bi-variate summary statistics of some subset of the possible pairs of series in your Viz.

                  However, Tableau has limited flexibility with regard to this "result set" table for a particular worksheet.

                  You can add another column to the table using a calculated field. But, you can't calculate new rows (except by forecasting).

                  A calculated column is always the same length as other columns in the table and its rows are still associated with rows in other columns depending on the values of the rows of categorical columns.

                  So, you can add a new correlation Column for each pair of series. But, you really want all these values in a single Column with another couple categorical Columns whose values distinguish which rows are the correlations for which pair of series.

                   

                  I think this is impossible. But, Jonathan Drummey can sometimes make Tableau do what seems impossible.

                  Do you see a way forward here, Jonathan?

                   

                  Scott

                  • 7. Re: Re: Re: Replicating Excel CORREL functions in Tableau and better leveraging LOOKUP in Tableau
                    Jonathan Drummey

                    Thanks for the compliment, Scott! I'll try to live up to it!

                     

                    Colum,

                     

                    Based on your data I think most, maybe all of what you want done is accomplishable. But first, a caveat: I know a lot about table calcs, and very little about statistics, so it's hard for me to validate results of statistics table calculations. Even "make this number come out as X" can be hard when we're working with nested, windowed table calcs, so I'd recommend we set up a webex, my email is jonathan (dot) drummey (at) gmail.

                     

                    I can see at least one problem in the original _CorrelCoeff_Chg_52_Wk_alt calc: The addressing is set to 0,51, which means "go from the current address to 51 addresses ahead" (where an address is a "row in the partition"). I'm presuming that you're wanting to go retrospectively, which means it should be -51,0 for each of the calcs. I'm not sure about how the addressing is done for the rest of the calcs used in that calc (that's one place where we'd work together).

                     

                    For the multi-set correlation, I couldn't work out from your initial table how the calcs should work.

                     

                    In the Multi-set Correl 2, your drawing was enough for me to go on for that. The way you initially built the Benchmark_Fill calc, it's set up to use a Table(Down) configuration where there is only one partition, and then the calc is attempting to do it's own partitioning. This is way more work and tricky to maintain because the results of a table calc with Table (Down) addressing are relative to the pill arrangement. Also, in the table the nested Compute Using of the Benchmark_Chg_Pct inside the Benchmark_Fill doesn't match the WeekEnding Compute Using of the Benchmark_Chg_Pct calc that is in the view. While this is a very powerful feature of Tableau (the same calc can have a different addressing & partitioning arrangement for each instance in a view), I'm thinking in this case it would probably cause the Benchmark_Fill calc to be receiving incorrect results. (This is one of the really challenging bits of table calcs, we use a dialog box to set up some really complex calculations so we can't see what's going on except in our mind's eye.)

                     

                    In the "Multi-set Correl jtd" worksheet I created a "Benchmark_Fill jtd" calc that correctly propagates the Benchmark_Chg_Pct to the other rows. Then I created a "Corr52Roll_New_Fill jtd calc" that uses the -51,0 addressing and references the new Benchmark_Fill calc, and set that up with what I think is the correct addressing.

                     

                    Panel charts are certainly possible, the exact technique will depend on the data.

                     

                    The easiest way to do a slope chart is to set up a discrete date on Columns and then pick two dates, I set this up in the "Simple Slope" in the attached.

                     

                    Hope this helps!

                     

                    Jonathan

                    • 8. Re: Re: Re: Replicating Excel CORREL functions in Tableau and better leveraging LOOKUP in Tableau
                      Scott Tennican

                      On the other hand, I know quite a bit about statistics and much less about Table Calcs.

                      So, I'd be happy to help check your numbers once you have them in the right spot.

                      I checked my previous solution in R like this:

                       

                      dat = read.csv( "dataSB.csv", header=TRUE, sep="," )

                      S = dat[,"S"]

                      B = dat[,"B"]

                       

                      for ( i in 1:5 ) {

                          range = i:(i+51)

                          n = length(range)

                          rS = S[range]

                          rB = B[range]

                          meanS = mean(rS)

                          meanB = mean(rB)

                          sdS = sd(rS)

                          sdB = sd(rB)

                          sumSB = sum(rS*rB)

                          corrSB = (sumSB - n*meanS*meanB)/((n-1)*sdS*sdB)

                          print(corrSB)

                          print(cor(rS,rB))

                      }

                      • 9. Re: Re: Re: Re: Replicating Excel CORREL functions in Tableau and better leveraging LOOKUP in Tableau
                        colum.mccoole

                        Hi Jonathan,

                        Thanks for this ... (and thanks Scott for the R stuff as a cross-check ... I'm doing it long-hand in excel ... but granted, R is certainly a more efficient means).

                        Your Benchmark_Fill_jtd calculated field has helped me get over a bump and I think I have resolved various issues. I reverted back to using the 'advanced' tab in 'edit table calc' dialogue box and assigned along WeekEnding ascending for the main formula and its two component Chg_Pct components and along Field for the Benchmark_Fill_jtd.

                         

                        Corr52Roll_New_Fill_jtd.PNG.png

                        You referred to _CorrelCoeff_Chg_52_Wk_alt calc earlier ... this was known to be broken .. I should have mentioned that. What was working for me were the four right-most fields in corr52WkFixed. I think where I became confused with the (0,51) versus (-51,0) start/ends used in my various formulas arose from the fact that I was working with a table in descending order (by WeekEnding) and to get the Chg_Pct fields working initially ... I was playing around with ascending/descending settings in the advanced tab. Therefore some of my calculations are a mixture of descending/ascending on component calculated fields, which I think was screwing up the calculation when it came to setting start/end offsets in the finished calculation. The Corr52Roll_New_Fill_jtd above is a cleaner approach, where all component fields are ascending.

                         

                        Matrix New contains the rolling 52wk correlation by series, by date. I have referred to your recent 'The Next N Table Calculations' to see how I can effectively filter out the parts I don't want ... without affecting the table calculation (ie. post table calc filter)... but I can't seem to get what I want ... ideally I want to be able to filter to a particular date (still having all dates in the background for the calculation) and get rid of the benchmark (which needs to remain in the background for the calc) and also be able to sort the correlations ... I tried the discrete pill version of the correlation calc field, but that doesn't appear to work in this case. Is there any way to rank these calculations within a given date?  I guess the problem here is that the Group needs to remain ... as this is part of how the Benchmark gets assigned to each given series ... upon which the correlation calculation is built.

                        sort_filter_matrix.PNG.png

                        I also managed to get the panel chart broadly working ... although again, not without its faults.

                        Ideally I would like to be able to order the panel by the highest to lowest latest (or current) correlation observations ... which would force any blank panes to the bottom. I can't figure out how to do this, without impact the correlation calculations.

                        panel_new.PNG.png

                        Finally, thanks for the simple slope chart solution ... however, I am finding that the WeekEnding filter you have here is not working for the Corr52Roll_New_fill jtd field ... since it's filtering out the dates before the table calculation, I think, when I need it to do so after ... I tried some of your tricks like LOOKUP(MIN([Weekending]),0) and max version and TOTAL(MIN([WeekEnding])) and max version .. but neither appear to work ... what I have in Slope New are the right values ... I just need a way (post the table calcs) to suppress all values rendering other than the start and end dates.  It seems tricky to achieve this, but you gents are clearly more accomplished than me in this department.

                         

                        I attach the latest version with udpates.

                        Colum

                        • 10. Re: Re: Re: Re: Re: Replicating Excel CORREL functions in Tableau and better leveraging LOOKUP in Tableau
                          Jonathan Drummey

                          Hi Colum,

                           

                          See the attached. I used table calc filters to hide data without changing results for the Matrix New jtd and Slope New jtd worksheets. One tip - If a table calc appears to be changing results, it probably means that the compute using of the table calc is not correctly set, or you're getting densification behavior (stuff Tableau does to try to help us out, like padding out a crosstab view) that you didn't expect.

                           

                          We can't sort a dimension by a table calc, but what we can do is use Tableau's default alphanumeric sorting. I put a copy of the Corr52Roll_New_Field jtd calc as a discrete on Rows in the Matrix New jtd worksheet to get a sort, then turned off Show Header.

                           

                          I didn't tackle the Panel chart, when I looked at it the first thing that struck me is that the compute usings of the Corr52Roll_New_Field jtd calc don't appear to be correct. I'll have some time to look at it this weekend or earlier next week. I'm not sure about the sorting or getting rid of the blanks, what you're doing is table calcs on top of table calcs and options are more limited than when we're dealing with dimensions and regular aggregates.

                           

                          Jonathan

                          • 11. Re: Re: Re: Re: Re: Replicating Excel CORREL functions in Tableau and better leveraging LOOKUP in Tableau
                            colum.mccoole

                            Jonathan,

                            Thanks for your help with the last few things.

                            I was able to fix the panel by adding an IF statement to preclude it from calculating a correlation with less than 52 weeks of data. I fixed the blank panel issue by filtering out for nulls. This was unset ... but I ticked, applied, then unticked for it to take effect. I'm reasonably confident the rolling correlation is generating the right numbers and have checked against excel.

                            panel_fixed.png

                            For the slope chart, I took away the weekending filter, as a problem was arising where it was offering dates in the weekending filter that potentially within less than 52 weeks of the base date ... and so calculating an incorrect correlation. The IF THEN X ELSE NULL applied to the Corr52Roll_New_Fill jtd field will just plot a blank for any dates without at least 52 weeks of data, which is fine.

                             

                            Thanks also for the discrete/alphanumeric sort workaround.  I feel like I've reached base camp of the table-calculation expedition ... thanks once again.

                            Colum

                            • 13. Re: Replicating Excel CORREL functions in Tableau and better leveraging LOOKUP in Tableau
                              Bora Beran

                              I wrote this a while back and it seems very relevant to your use case so I thought I should share.

                               

                              Very similar layout and I am calculating correlation coefficient and slope of the regression line for each cell in the grid using table calcs.

                               

                              Trellis Charts with Tableau 8

                              trellischarttableau1.png