5 Replies Latest reply on Aug 10, 2016 7:01 AM by Mark Fraser

    LOD Calculation built into SQL

    brian sams

      I am currently working on a Tableau project

        and have become stuck. I have only 14 million
        rows and a couple of fixed LOD calculations (and yes, it has to be a string
        for one of the dimensions in the fixed calculation).  The basic data set
        includes Customer, Period, Plant, Item, Demand, Forecast.  I am
        calculating an Item-Plant level forecast accuracy, so I had to use a FIXED LOD at
        the Item-Plant level or else it would return a customer-level forecast accuracy. I
        have seen where people said there are some workarounds, and they also
        refer back to writing the LOD calculation in SQL.  I would prefer writing it
        back in SQL; however, I have not found the correct way to do it without
        replicating the Item-level detail. Can anyone help? Simon Runc

        • 1. Re: LOD Calculation built into SQL
          Tom W

          Hi Brian,

          Please provide a Tableau Packaged Workbook including sample data.

           

          Secondly, I'm not sure what the problem is. Can you not get a calculation working because the data type is a string, or is it something else? You also seem to ask about writing the LOD in SQL, but we have nothing to reference in terms of what the LOD you're writing is.

          • 2. Re: LOD Calculation built into SQL
            Mark Fraser

            Brian

             

            You have tagged one of the smartest people I know, but... there is a lot going on in your question.

            While it will make perfect sense to you, it may be difficult for others to follow without at least some example.

             

            Is there a chance that you could provide some kind of sample, screenshot, anything to help unpick it?

            Could you even make the outline slightly simpler? (for those of us who aren't Simon )

             

            Anything you can do with an LoD you can certainly do in SQL; an LoD is effectively a sub-select in SQL, put better here >> http://syntelli.com/blog/tableau-level-of-detail-expressions-introduction-and-include/

            If you could provide anything to help us help you, it would be appreciated.

             

            Cheers

            Mark

            • 3. Re: LOD Calculation built into SQL
              brian sams

              I tagged Simon since we were already working on it. I have two LOD statements in my workbook, { FIXED [Item Location],[Period]: SUM([Actual Demand]) } and { FIXED [Item Location],[Period]: SUM([Forecast]) }. These are used to compute a forecast accuracy at the item level. Since I am working with around 15 million rows, filtering on any dimension takes over a minute. I wanted to see if anyone could figure out how to write the SQL for what Tableau is doing. Essentially, I was able to find out from Simon that the sub-query in the join statement would provide the same results (below). This is true; however, it replicates the aggregated values on every row, and since I am using a SUM and not a MIN or MAX it does not work for me. Is there a way to write the query so it will not replicate on every line?

               

              From

              [dbo].[vDemandPlanTableauTest]

               

               

              INNER JOIN

               

               

                (SELECT [vDemandPlanTableauTest].[Period] AS [Period],

                [vDemandPlanTableauTest].[Level1] AS [Item-Plant],

                SUM([vDemandPlanTableauTest].[ActualDemand]) AS [Actual Demand LOD],

                SUM([vDemandPlanTableauTest].[Forecast]) AS [Forecast LOD]

                FROM [vDemandPlanTableauTest] GROUP BY [Level1],

                [Period]) LODTABLE

               

               

                ON [vDemandPlanTableauTest].[Level1] = [LODTABLE].[Item-Plant]

                AND [vDemandPlanTableauTest].[Period] = [LODTABLE].[Period]
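
              For reference, one alternative sketch, assuming the database supports window functions (the bracketed names suggest SQL Server, which has supported them since 2005): the same per-group totals can be attached to each row without an explicit self-join. Note that, like the join version, this still repeats the group total on every row, so summing it again in Tableau will still inflate it.

                SELECT [Period],
                [Level1] AS [Item-Plant],
                [ActualDemand],
                [Forecast],
                -- total demand/forecast for this Item-Plant/Period, repeated on each row
                SUM([ActualDemand]) OVER (PARTITION BY [Level1], [Period]) AS [Actual Demand LOD],
                SUM([Forecast]) OVER (PARTITION BY [Level1], [Period]) AS [Forecast LOD]
                FROM [dbo].[vDemandPlanTableauTest]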

              • 4. Re: LOD Calculation built into SQL
                Simon Runc

                Thanks Tom/Mark (...and Mark you are too kind, or don't get out enough).

                 

                Brian actually emailed me (directly) on this problem (so in the following response I have extra knowledge from our conversation, not super-human powers of deduction/mind-reading), and I asked him to post it up here (once we'd worked through it) as I thought it was an interesting problem that brings up some interesting topics which others might benefit from. I've also mentioned (as Brian points out) writing LoDs back into the SQL view a few times (and never really expanded on what I meant, and wanted to highlight some pitfalls). Caveat: this might read more like a blog!

                 

                So although I've (anecdotally) noticed a big improvement in the speed of FIXED LoDs since 9.3, there is still a performance cost to using them. In my day-to-day job I use LoDs (a lot) in the PoC work, but try and get any that I can* written back into the data-feed (typically in SQL) for the final production version to improve performance (both for the end user, and our 'overworked' Tableau server)

                *when I say "that I can"...I mean FIXED LoDs that aren't dependent on a parameter selection, or a context (or higher) level filter...as the LoD evaluation can change in these situations (the end user can alter either of these) and so the LoD needs to be re-evaluated on the fly

                 

                I've put together an example using Superstore, and the creation of a "First Order Date" and "Total Spend" by customer. Written as FIXED LoDs these are as below (as the grain of the data is OrderNumber/Item, if I want to do some customer segmentation and create 'real' (row-level) dimensions I need to use LoDs):

                 

                [Original Order Date LoD]

                {FIXED [Customer Name]: MIN([Order Date])}

                 

                and

                [Total Customer Spend LoD]

                {FIXED [Customer Name]: SUM([Sales])}

                 

                However I can (using CustomSQL) write these calculations back into the original view, as materialised columns (which would make the performance much better). So first I used the Legacy Connector for Excel (to bring up the Custom SQL option...for SQL DBs this is always just there)

                 

                I then enter in the following to the Custom SQL window

                 

                SELECT [Orders$].[Row ID] AS [Row ID],

                [Orders$].[Order Priority] AS [Order Priority],

                [Orders$].[Discount] AS [Discount],

                [Orders$].[Unit Price] AS [Unit Price],

                [Orders$].[Shipping Cost] AS [Shipping Cost],

                [Orders$].[Customer ID] AS [Customer ID],

                [Orders$].[Customer Name] AS [Customer Name],

                [Orders$].[Ship Mode] AS [Ship Mode],

                [Orders$].[Customer Segment] AS [Customer Segment],

                [Orders$].[Product Category] AS [Product Category],

                [Orders$].[Product Sub-Category] AS [Product Sub-Category],

                [Orders$].[Product Container] AS [Product Container],

                [Orders$].[Product Name] AS [Product Name],

                [Orders$].[Product Base Margin] AS [Product Base Margin],

                [Orders$].[Region] AS [Region],

                [Orders$].[State or Province] AS [State or Province],

                [Orders$].[City] AS [City],

                [Orders$].[Postal Code] AS [Postal Code],

                [Orders$].[Order Date] AS [Order Date],

                [Orders$].[Ship Date] AS [Ship Date],

                [Orders$].[Profit] AS [Profit],

                [Orders$].[Quantity ordered new] AS [Quantity ordered new],

                [Orders$].[Sales] AS [Sales],

                [Orders$].[Order ID] AS [Order ID],

                [Customer Level Table].[Original Order Date SQL],

                [Customer Level Table].[Customer Spend]

                FROM [Orders$]

                 

                INNER JOIN

                 

                (SELECT [Orders$].[Customer Name] AS [Customer Name],

                MIN([Orders$].[Order Date]) AS [Original Order Date SQL],

                SUM([Orders$].[Sales]) AS [Customer Spend]

                FROM [Orders$] GROUP BY [Customer Name])

                [Customer Level Table]

                 

                ON [Orders$].[Customer Name] = [Customer Level Table].[Customer Name]

                 

                So the [Customer Level Table] is a sub-query (very similar to what Tableau does for LoDs) creating aggregated fields at the [Customer Name] level, and then joining back to the original table at this level ([Customer Name]).

                 

                The 'SQL and LOD - appear to be doing the same' tab in the attached shows that both the LoD and SQL versions appear to be doing the same thing.

                 

                However...if I remove Order Date from the viz (shown in the 'SQL and LOD - SQL wrong' tab)...the SQL version of my Total Customer Spend is now wrong.

                This is because, unlike LoDs, the SQL version puts that Total Customer Spend value against every row for each [Customer Name]...and so when SUMmed it inflates the value (multiplied by the number of rows for each customer). The [Original Order Date] is fine, as it is created with a non-additive aggregation (MIN), whereas the other uses an additive aggregation (SUM). So, as you can see in the final column, I've changed the aggregation to MIN and all is fine again (although Grand Totals won't work properly).
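
                A tiny worked example of that inflation (table and numbers hypothetical): suppose one customer has three order rows and a total spend of 500, so the joined [Customer Spend] column carries 500 on each of those three rows.

                SELECT [Customer Name],
                SUM([Customer Spend]) AS [Additive],     -- 500 x 3 = 1500: inflated
                MIN([Customer Spend]) AS [Non-additive]  -- 500: correct
                FROM [JoinedOrders]  -- hypothetical name for the joined result above
                GROUP BY [Customer Name]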

                 

                So although this technique is very useful, it needs to be used carefully. The SQL-generated aggregated calculations are really row-level dimensions, so I can do things like...

                 

                IF [Total Customer Spend SQL] < 1000 THEN 'Bronze'

                ELSEIF [Total Customer Spend SQL] < 2000 THEN 'Silver'

                ELSE 'Gold'

                END

                 

                This was an issue Brian faced when implementing this...and he came up with a pragmatic (let's call it) solution of dividing the [Demand] returned at Plant/Period level by the number of periods (COUNTD), so that it was correct at the levels he had in his viz. The below shows an example (using Total Sales) of what he did
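
                A sketch of that idea as a Tableau calculation, using the Total Sales analogy (field name hypothetical): divide the replicated total by the number of rows it is repeated over, so re-summing across a customer's rows lands back on the original total.

                // [Total Customer Spend SQL] is the customer total, replicated on every row;
                // SUM over N rows gives N x total, so dividing by the row count restores it
                SUM([Total Customer Spend SQL]) / COUNT([Total Customer Spend SQL])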

                 

                 

                So there are ways round this (it's quite an imaginative solution, but very specific, so I wouldn't advise it generally!!...very clever though!)

                 

                Although it wouldn't help Brian (he's really after a performance boost; 14 million+ rows, and Plant and Period are both strings!), I also wanted to show a further solution using blending. My blog on this subject gives more detail on why this is a good solution: The Importance of Granularity (…to Blend or not to Blend?) | The Data Animators

                In short, we have 2 grains of data (the table at OrderNumber/Item level, and our LoD calculations at Customer Level), which is what blending is made for.

                 

                So in a second version I created 2 data sources. The first one is just the Orders table (pulled directly in from Excel, so no Custom SQL) and the second one is just the sub-query part (using Custom SQL)

                 

                SELECT [Orders$].[Customer Name] AS [Customer Name],

                MIN([Orders$].[Order Date]) AS [Original Order Date SQL],

                SUM([Orders$].[Sales]) AS [Customer Spend]

                FROM [Orders$] GROUP BY [Customer Name]

                 

                They are then blended on [Customer Name], and the below shows that this works just like our LoD

                and if I remove Order Date, it's still all good!!

                 

                Not surprising, as blending like this is how we used to do it pre-LoD!

                 

                Hope people find this useful.

                • 5. Re: LOD Calculation built into SQL
                  Mark Fraser

                  brian sams wrote:

                   

                  I tagged Simon since we were already working on it.

                  Brian, my apologies... I didn't know!

                  Recently the 'ping' functionality has been a little abused (by some other forum members) and I mixed you up with that...

                  Hope you got what you need... if you're working with Simon, you're in very good hands.

                   

                  Cheers

                  Mark