9 Replies Latest reply on Nov 18, 2015 8:21 AM by Bora Beran

    Using calculations 'on the left' - LoD

    Mark Fraser

      This discussion started among the ambassadors but we thought it better for the discussion to be open and available to everyone...

       

       

      Why isn't this allowed?      I.e. Why aren't functions allowed 'on the left'?

      {FIXED YEAR([DATE]) : SUM([Sales])}

       

      but I can create this -

      [YEAR]:      YEAR([DATE])

       

      and then this -

      {FIXED [YEAR] : SUM([Sales])}

       

      Which works in exactly the same way, same output... but its 2 steps, 2 calculations?!

       

       

      PS. I'm also interested if / how to use a hierarchy dimension with LoD (i.e. making the LoD dynamic with the hierarchy)... should this be a separate thread?

       

      PPS. In LoD do the left and right 'sides' have names?

       

      Cheers

      Mark

        • 1. Re: Using calculations 'on the left' - LoD
          Simon Runc

          ....I think the answer is...'you just can't'!

           

          The exact 'technical' reason is probably a question for Bora Beran (or Jonathan who, from previous posts, has probably looked at the SQL sent when this LoD is sent, with a nested 2nd calculation).

           

          My guess would be, that (in it's current form) the FIXED LoD needs to have the 'Calculated Dimension', YEAR([Order Date]) in your example, run before it can execute the LoD. So by having the YEAR([Order Date]) in a separate calc, it knows to run that (create the dimension) and then use that in the LoD...And related to that, the way 'Row Level' Calculations, can be 'Optimised' in the TDE...basically meaning that Tableau creates a 'materialised' version of the result in the TDE (so its as if this calculation was written as part of a SQLview), but that FIXED LoD's aren't (even though the results are returned at row level)...again I seem to remember Bora hinting that, in future versions, FIXED could get 'materialised', which would be great (as currently I use them sparingly due to performance - or use them in PoC work, but get them 'programmed' into the database view for final version)....Not sure how they would get around the 'context' filter thing (meaning they would have to get calculated on the fly, if they had any context filter dependencies)...that's why they have smart guys, like Bora, to do the head-scratching on that!!


          I would like to see it change to allow 'Calculated Dimensions', and from Bora's presentation on LoD at TC15 (and the TDT) he did seem to be hinting that what we have are 1st Gen LoD, and so will be extended over time.

           

          On the naming convention...I've not heard of one, but would be good to have a consistent way of referring to them.

          • 2. Re: Using calculations 'on the left' - LoD
            Bora Beran

            This is not a limitation on the query generation side. Referring to YEAR(Date) through a calculated field in the data pane would yield the same results as using it directly as dimensionality in the LOD expression. It is just that when we were working on the first version of LOD, we thought that there is great value in the feature so we shouldn't delay it and instead add these incrementally. Supporting expressions in LOD dimensionality is in our todo list in addition to many other improvements.

             

            Would you not be able to rely on exclude or include in your scenario that involves hierarchies? We added those explicitly for drill down like scenarios.

             

            We call the left side : dimensionality expression

            And the right side : aggregate expression

             

            I hope this helps.

             

            Bora

            • 3. Re: Using calculations 'on the left' - LoD
              Rody Zakovich

              Hey Simon,

               

              So Tableau doesn't "Always" generate a sub-query for LoD expressions.

               

              We have the idea that only INCLUDE/EXCLUDE LoDs are vizLoD aware (when using them in the canvas [They are not vizLoD aware when used in things like SETS, because SETS are computed earlier in the Pipeline]. However, FIXED is also vizLoD aware, in regards to the query Tableau generates.

               

              From what I can see analyzing the Queries being sent back to the DB (And discussions with Jonathan), Tableau always tries to find the most optimal query to send back to the DB by looking at the vizLoD and the Calculation. If the vizLoD is at the same level as the LoD Calc, Tableau won't generate a sub-query. For example

               

              11-12-2015 8-53-35 AM.png

               

              Here my vizLoD is [State], and the LoD of my Calc is also [State]. So instead of sending back a nested query, Tableau simply sends back (Simple Form)

               

              SELECT [State], SUM([Sales])

              FROM [Table]

              GROUP BY [State]

               

              This is why, if the situation is right, Tableau generates the same query for FIXED LoDs and INCLUDE/EXCLUDE LoDs that compute at the same level (When no filters are at play and the vizLoD is the same). An example of this can be found in this thread.

               

              Re: Trying to get a COUNTD across a rolling span of months

               

              Bora, if I have miss-stepped here, PLEASE correct me!

               

              Thanks,

              Rody

              1 of 1 people found this helpful
              • 4. Re: Using calculations 'on the left' - LoD
                Simon Runc

                Thanks for that Rody...Yes I do, now, recall the example Jonathan gave, where he has looked the SQL being sent for that situation...clever Tableau (can't seem to find the post...to many order of operations/LoD threads!!)

                • 5. Re: Using calculations 'on the left' - LoD
                  Rody Zakovich

                  Hey Simon.

                   

                  I was thinking bout this some more, and wanted to throw out some ideas. I'm hoping Bora can clear some stuff up for me.

                   

                  I imagine there are a lot of issues with Materializing FIXED LoDs in the .tde file.

                   

                  1. You could potentially run into problems with aggregations happening at a Coarser level of detail than you actually want.

                   

                  For example

                   

                  11-12-2015 4-57-36 PM.png

                   

                  Here Tableau isn't actually joining the LoD query back to the datasource Record Level, rather it is generating a single sub-query of two temp tables. Something like

                   

                  SELECT [t0].[Customer Segment] AS [Customer Segment]

                    SUM(CAST([t1].[__measure__0] as BIGINT)) AS [sum:Calculation_5671112170217127:ok]

                  FROM (

                    SELECT [Superstore].[Customer Segment] AS [Customer Segment]

                      ([Superstore].[Customer Name]) AS [Customer Name])

                      [Superstore].[Customer Name]) AS [$temp2_Customer Name],

                      [Superstore].[Customer Segment] AS [$temp1_Customer Segment]

                    FROM [dbo].[Superstore] [Superstore]

                    GROUP BY [Superstore]. [Customer Name])

                      [Superstore].[Customer Segment]

                  ) [t0]

                    INNER JOIN (

                    SELECT SUM(CAST(([Superstore].[Sales]) as BIGINT)) AS [__measure__0],

                      ([[Superstore].[Customer Name]) AS [Customer Name])

                      [Superstore].[Customer Name]) AS [$temp3_Customer Name]

                    FROM [dbo].[Superstore] [Superstore]

                    GROUP BY [Superstore].[Customer Name])

                  ) [t1] ON (([t0].[Customer Name]) = [t1].[Customer Name])) OR (([t0].[Customer Name]) IS NULL) AND ([t1].[Customer Name]) IS NULL)))

                  GROUP BY [t0]. [Customer Segment]

                   

                  Tableau is returning this at the Level of Customer Name and Segment, ""Grouped by Customer Name and Segment"". If this was instead materialized record level, then our aggregations would be happening across every duplicate instance of Customer Name within a Customer Segment, because Tableau would just do

                   

                  SELECT SUM([Materialized Fixed LoD]), [Customer Segment]

                  FROM [Table]

                  GROUP BY [Customer Segment]

                   

                  You can see the big problem with this.

                   

                  2. When you optimize an extract, Tableau sends back row level calcs as a part of the Refresh Query. So, using the current method, Tableau would need to send back all of the FIXED LoDs as a part of the Refresh Query. That could certainly hammer you DB for quite some time. And what would it do for Incremental Extracts? My guess is that Tableau would need to create another layer to the Extract Optimization process that first runs the Refresh Query (without fixed LoDs) and the executes the FIXED LoDs and inserts/updates the values in the .tde file. This would still certainly slow down the Extract Refresh times.

                   

                  3. You still run into the problem of higher level filters (Like CONTEXT). Even if you did materialize the FIXED LoDs, Tableau would still need to re-calculate it when there is a CONTEXT Filter at play because a CONTEXT Filter can change the value at a record level. So it wouldn't be able to use the materialized values.

                   

                  I'm sure there are tons more potential issues with this, but this is just some that I can think of.

                   

                  Just my thoughts

                   

                  Rody

                  • 6. Re: Using calculations 'on the left' - LoD
                    Simon Runc

                    hi Rody,

                     

                    ...that's why they have smart guys, like Bora, to do the head-scratching on that!!

                     

                    Yes you raise some very good points there...especially the ones regarding 'pushing' the work to server in the TDE refresh/creation (as part of 'optimisation'). This has caught us out a few times! Especially when the 'view' we are generating is built from a 'Stored Procedure'. A recent example, is a measure we often get in data-sets is 'Wastage' (for those not immersed in the ‘exciting’ world of perishable retail!…’Waste’ is the value of stock thrown away, or reduced, due to it going out of life, or being damaged). This comes in as a positive value, but for some Vizes I want a negative...So create [Wastage (-ve)] as

                     

                    [Wastage]*-1   //this is pushed down in the optimisation to be generated by the query when the .tde refreshes

                     

                    ...all seems very straight forward...however the view we build to feed our Tableau model is built from a Stored Procedure (it does loads of other stuff!)...we started noticing the Refresh was taking a lot longer than expected...On investigation, Tableau had let the SP run, and then created a Temp-table to run our Wastage*-1 against...If we had just written this into the SP, we would barley have noticed a difference. This is how we work now, during POC phase I create all the calcs I need as part of the investigation/build, we then look at the Row Level ones and how they are being sent to SQL (in the build/refresh) and our DBA writes them into the view, so we no longer have them as calculated fields in Tableau.

                     

                    This is a small example for a very simple row level calc, so yes suddenly pushing complex LoD calcs down to the Database, during rebuild would need to be done very carefully and transparently! - You might even want a different syntax for those LoD calcs you want 'optimized' and those you want to keep 'calculated on the fly'. However that, in itself, is adding another level of complexity/explanation/understanding (a lot of people, just want to load their data and get Vizzing...and rightly so!)

                     

                    FIXED LoD like {MAX[Date]} have become so useful for any kind of YoY, QtD (and, to be fair, with no 'dimensionality expression' it's very fast even on large data)...but there have been some

                    eg.

                    INT(RIGHT(str({MIN(IIF([Year] = {MAX([Year])}-1,INT(str([Year])+str([Week No])),NULL))}),2))

                    //this picks up what is the first week of data from the previous year (so YoY calculations only looks at comparative weeks) - btw if anyone can think of a 'less ugly' version of this formula...don't be shy!


                    which, in Production Version, we'd get written into the view (as we do with 'MAX of Date')...but many don't have that luxury (and in PoC can be hard to know how fast/efficient the model will be,or if we are working Live with a client).

                     

                    In short, you would want the option to choose if an LoD was 'pushed' onto the server in optimisation...not withstanding your other, very good, points on how reality vs perception (on how these are calculated) can differ!! and yes totally agree on point 3, once a LoD is affected by any context filter 'materialization' would not be possible.

                    • 7. Re: Using calculations 'on the left' - LoD
                      Rody Zakovich

                      Simon Runc wrote:

                       

                      INT(RIGHT(str({MIN(IIF([Year] = {MAX([Year])}-1,INT(str([Year])+str([Week No])),NULL))}),2))

                      //this picks up what is the first week of data from the previous year (so YoY calculations only looks at comparative weeks) - btw if anyone can think of a 'less ugly' version of this formula...don't be shy!

                       

                      If you want to branch this to a new thread/question I'd be happy to weigh in on some alternatives

                       

                      Regards,

                      Rody

                      • 8. Re: Using calculations 'on the left' - LoD
                        Bora Beran

                        Hi Rody,

                        Yes. We try to consolidate queries when possible to improve performance. Though there was a point in time where we didn't have this  optimization and each LOD calc was a separate query.

                         

                        Thanks,

                         

                        Bora

                        1 of 1 people found this helpful
                        • 9. Re: Using calculations 'on the left' - LoD
                          Bora Beran

                          We need to teach TDE the concept of LOD for materialized calcs. That is precisely why we don't materialize now since TDE wouldn't know that they are aggregated to some level and without that it would skew the results of subsequent aggregations.

                          1 of 1 people found this helpful