14 Replies Latest reply on Feb 21, 2017 4:57 AM by Simon Runc

# LOD expression for 'difference from overall average'

Hello folks,

I'm having trouble understanding this expression that compares sales from one year to all other years.

(solution 03 in the attached workbook):

AVG(Sales) - ATTR({ EXCLUDE [Order Date] : AVG(IF YEAR([Order Date]) = [Choose Year] THEN [Sales] END) })

What is the purpose of the ATTR here?

And I know EXCLUDE has the purpose of excluding the current year for comparison if that is the one being selected by the user in the parameter control, but how exactly is it being performed (at a database level, maybe)?

Thank you so much,

Esther

• ###### 1. Re: LOD expression for 'difference from overall average'

hi Esther,

So I'm going to assume that there is a good reason why the formula needs to be like this and so will only (attempt) to describe what each element is doing.

So the first part is easy...that is just the Average of Sales, where the average is whatever the Level of Detail of the Viz (in your Example Years)

For the next part let's tackle it inside out!...so first the

IF YEAR([Order Date]) = [Choose Year] THEN [Sales] END

so this is doing a Row Level calculation where only the [Sales], where Year = [Chosen Year] is returned...and as there is no ELSE, the returned value for any other year is NULL (this not zero, as that would affect the average). Like the below example

That is then wrapped in an AVG, so we get the Average of that column...where NULLs are ignored.

So as we have Year in the VizLoD (viz Level of Detail)...we have this

As the value for 2011, 2012, 2013 are NULL (if you look at my Excel example 2010 and 2012 AVG is NULL. So the Exclude, means that the Year is excluded, when the calculation is run...so with the EXCLUDE it now returns

The final bit is the ATTR...this is there as you can't mix aggregate calculations and non-aggregate calculations, and as the first term is an aggregate (AVG) the second part also needs to be. As (with the current VizLoD) the value is the same for every year, so we can use MIN, MAX, AVG, or ATTR (which is the aggregation for returning a single value...btw this is the aggregation that returns the dreaded * when there are more than one value and you've asked it to ATTR).

So that's about it. For this kind of thing it's often useful to break down the parts, put them on the canvas (btw you can just grab and drop elements from a calculated field...gif below), which helps you see what each part is doing

Hope that helps and makes sense.

6 of 6 people found this helpful
• ###### 2. Re: LOD expression for 'difference from overall average'

What is the purpose of the ATTR here?

Its a generic aggregation here.  Tableau uses it here since Exclude calculations are calculating at a higher level than the view -- it could just as easily be min/max, but Tableau defaults to ATTR().  ATTR() itself is a check for MIN() and MAX() -- if they are equal, the value is displayed, if not, a NULL will result.

And I know EXCLUDE has the purpose of excluding the current year for comparison if that is the one being selected by the user in the parameter control, but how exactly is it being performed (at a database level, maybe)?

Yes, LODs are computed at the database level rather than locally, with all kinds of fun cross joins and other stuff happening behind the scenes when you drag and drop.  This is a pretty comprehensive resource on how they work:  What’s new in Tableau 9.0? Part 2 – Level of Detail Expressions | Bora Beran

4 of 4 people found this helpful
• ###### 3. Re: LOD expression for 'difference from overall average'

Thank you so much for the thorough explanation. I particularly liked how you broke down the LOD expression as this is one aspect I haven't fully comprehend about Tableau.

And that bit about dragging parts of a calculated field to the view is a MAJOR tip! Thank you!!

The only thing I'm not yet grasping it fully is the EXCLUDE use. So, we only have Order Date in the view and we are excluding that same dimension, which would left us with a "blank" dimensionality, no? Why does the replication of the date selected in the "Choose Year" parameter occur?

Much appreciated!

Esther

• ###### 4. Re: LOD expression for 'difference from overall average'

Although I have read and reread that particularly insightful article about LOD expressions, I'm still not being able to fully grasp it... Maybe I have yet so much practice to get done to get some kind of intuition about how LODs really work.

Esther

• ###### 5. Re: LOD expression for 'difference from overall average'

...so on the Exclude (which, conceptually can be hard to grasp)...it's not that it's Excluding the current year, as that it is excluding the Order Date (at whatever level we have in the Viz, Year in this example) from the calculation...Tableau is all about the VizLoD.

So if I just bring SUM([sales]) into the canvas...I get

If I now bring the Year into the VizLoD, I get this...

So the VizLoD has affected how this Aggregated (SUM) calculation is calculated.

So if we only look at the AVG(IF YEAR([Order Date]) = [Choose Year] THEN [Sales] END) part (notice it's wrapped in an AVG Aggregation), and I have no other Dims in the VizLoD I get...

but if I now bring year in...this happens (I've also brought in AVG([Sales]) so you can see what's happening

So we use the EXCLUDE to say, EXCLUDE the Order Date Level of Detail (Year in this case) from the Calculation. So it's like we don't have Year in the VizLoD. If we did the AVG([Sales]) - AVG(IF YEAR([Order Date]) = [Choose Year] THEN [Sales] END) we'd get this.

So 2011, 2012, 2013 can't do the subtraction as the AVG(IF YEAR([Order Date]) = [Choose Year] THEN [Sales] END) is NULL for those years, but by excluding Order Date (Year in our case) from the clauclation all years have the same value (the AVG of the 2010 Sales) in order to perform the calculation.

Hope that helps...In short EXCLUDE is Excluding the Order Date in the Level of Detail, and NOT excluding any particular year. The formula IF YEAR([Order Date]) = [Choose Year] THEN [Sales] END is only populated for 2010 data anyway (so we've hard coded the year we want to compare into the formula)

You might also find this useful...I (attempt!) to show what the different calculation types act in Tableau (and especially looking at how the VizLoD affects things)

and Thanks Matt Lutton for adding the links (and clearer explanation!)...good to hear from you, as always.

2 of 2 people found this helpful
• ###### 6. Re: LOD expression for 'difference from overall average'

Thank you for the screenshots, Simon Runc!

I've replicated them in my worksheet to better see the differences

Now the one thing that is still not hitting me is this declaration:

So 2011, 2012, 2013 can't do the subtraction as the AVG(IF YEAR([Order Date]) = [Choose Year] THEN [Sales] END) is NULL for those years, but by excluding Order Date (Year in our case) from the calculation all years have the same value (the AVG of the 2010 Sales) in order to perform the calculation.

How does Tableau know that it has to replicate the chosen year for all other years ? I mean... How is the calculation matching the level of detail for year=2010?

I'm sure you are explaining it well, the problem relies in me... I promise this is the last explaining effort I ask from you.

Thank you a thousand times.

Esther

• ###### 7. Re: LOD expression for 'difference from overall average'

Maybe "Can't do the calculation" was unhelpful. What I meant is that it can't do the right calculation.

The 2010 is hard-coded into the formula (OK we've used a parameter here, but imagine it says IF YEAR([Order Date] = 2010 THEN [Sales] END...

So this means that only the Rows that are of year 2010 have any Sales values in, and the others years have nothing in (see the image below)

However we want the AVG of this column applied to every row...so we need to tell Tableau to exclude the Year. So in my Excel, the 124 number is now applied to every year, so it can now be taken away from the Sales for each year.

Also as only year is excluded...if you add another dimension in (say Category) it will get included...so try that and see if it makes sense as to what is happening!...then add Category to the EXCLUDE terms and see what that does!

In Short....Tableau works off Set-Theory, like a database, and not Cell-based like Excel, so how we think about calculations is very different (if that means nothing, let me know and I can go into a bit more detail).

2 of 2 people found this helpful
• ###### 8. Re: LOD expression for 'difference from overall average'

Simon Runc wrote:

In Short....Tableau works off Set-Theory, like a database, and not Cell-based like Excel, so how we think about calculations is very different (if that means nothing, let me know and I can go into a bit more detail).

I think I'll appreciate that, but only if it doesn't take much of your time.

Because what I'm really struggling to understand is: if EXCLUDE makes Tableau ignore the date, then how does he know he has to replicate the date he is ignoring! How can he 'see' it?

I must be lacking some neuron flexibility to understand this Or I just haven't quite figured out the way Tableau works...

Thank you so much for your patience, Simon Runc Once again.

• ###### 9. Re: LOD expression for 'difference from overall average'

No problem...this is good for me! I do a bit of training (only a bit as is becoming increasingly clear!!) and come across people with very different background (SQL, Excel, other BI...) so adding to ways I can explain things helps me improve!

So i'm going to try a different tactic!...EXCLUDE and INCLUDE are, conceptually, one of the most complicated aspects in Tableau (and I think the confusion of the word EXCLUDE and the fact we are EXCLUDING years doesn't help). The EXCLUDE in an LoD is referring to the dimension, not an actual year. FIXED LoD (which are cousins for INCLUDE/EXCLUDE) are far easier to understand, and from there should help in understanding EXCLUDE/INCLUDE LoDs....

Lets start with why Tableau introduced LoDs, and what problem they solve...

Imagine I have this...

and I want to colour the Categories as Good or Bad depending on if they are > or < 800K.

I can create the following formula...

IF SUM([Sales]) > 800000 THEN 'Good' ELSE 'Bad' END

and I get this...

However, if I now want to see how customers are purchasing into either good or bad categories (say i want to see the COUNTD of customers who have bought into Good or Bad Categories)...I need to remove Category from the Row Shelf...but this happens

why? The SUM([Sales]) is now calculated over a different Level of Detail (all the data), and in this Viz the value (\$2.3M) is greater than 800,000. Tableau doesn't know that I want this to be calculated at Category level (how would it?!!)....also notice I'm unable to add my Good/Bad field to the dimension. An an aggregate calculation the returned value is dependent on the Viz LoD...drag in SubCat, and they are all Bad, as no SubCat SUM([Sales]) > 800000

So in order to use the Good/Bad dimension as I wish, I always need Category (and only Category) in the VizLoD. This is what i refer to (in the Quora post) as "On-Canvas" calculation, as the result of the calculation is dependent on what is in the Viz. Before Tableau 9.0, we'd have to use Table Calcs and all kinds of tricks to do this kind of thing

So Tableau created "Off-Canvas" LoDs, where we can specify (in the formula) the Level of Detail to run the calculation at.

So if I now change my field to...

IF {FIXED [Category]: SUM([Sales])} > 800000 THEN 'Good' ELSE 'Bad' END

With Category in the VizLoD...it's the same

but when I remove Category...it doesn't change how it's calculated (I can now assess Good vs Bad without needing Category in the VizLoD!)

So this calculation is what I refer to as "Off Canvas"...and I can add this field to the dimension pane (as the results, Good or Bad, are returned to every row, at the Category (as this is what level we specified) level...so all the rows with Technology as the Category will always be "Good" (like a real dimension)

So LoDs are all about defining how a calculation should be run, in the context of the VizLoD

INCLUDE/EXCLUDE are a halfway house...where the formula will run as an AGGREGATE (on canvas) but also either INCLUDE or EXCLUDE any specified dimensions...

So let's look at your problem as a FIXED...

I've created the following calculation for 2012 sales

SUM(IF YEAR([Order Date]) = 2012 THEN [Sales] END)

So like before, the Sales for 2012 column only contains the sales for 2012...but as an aggregate these are only on the rows which are of the year 2012...so I want to tell Tableau that I want to return the Sales for 2012, but for every row (ie. every date)...so I wrap it in a FIXED (with no dimension)

{FIXED: SUM(IF YEAR([Order Date]) = 2012 THEN [Sales] END)}

So I've said take the SUM of all the rows for Sales 2012, and then return this value to every row in the data (as I've not specified a dimension). I can now do the SUM([Sales]) - SUM([Sales for 2012 - FIXED]) where the SUM of Sales (as an Aggregate, and so on canvas) takes advantage of the fact that having year in the VizLoD means wet get a value per year, and the FIXED (which is off-canvas) returns the same value regardless of the VizLoD

If I remove Year from the VizLoD...the on-canvas calculation changes, but the off-canvas one doesn't so I get

So hopefully that makes sense?

Once you have that concept, the EXCLUDE is just another way of doing the same, but rather than FIX the calculation Level of Detail, you let it use the VizLoD but you tell it what VizLoD to exclude (or include for INCLUDE)...

When we are EXCLUDING [Order Date], although the data is by day, that entire dimension is EXCLUDED (so doesn't matter if it's there by Day, Week, Year...

So back to my first example...say I want to have SubCategory and Category in my VizLoD, but want to colour the Good/Bad based on the Category total...I'd use the formula

IF {EXCLUDE [Sub-Category]: SUM([Sales])} > 800000 THEN 'Good' ELSE 'Bad' END

So I'm saying...run the SUM of Sales as per the VizLoD (Category/SubCategory) but Exclude the SubCategory in the aggregation of that calculation

...Hope that helps...and if so I'll tackle the Cell vs Set-theory mental models!!

3 of 3 people found this helpful
• ###### 10. Re: LOD expression for 'difference from overall average'

Hello Simon Runc,

Thank you -so much- for the close examination of this LoD's matter. I actually have some SQL (academic) background and in terms of BI I've only explored a data mart using Excel (using dimensional modeling), but strange as it may sound I'm finding it very hard to understand some concepts of Tableau.

I'm almost, almost there!

A few questions:

1. According to this article, having {FIXED : aggregate expression} isn't supposed to be the same that just having the aggregate expression? (Bora Beran says "specifying no dimensions using fixed keyword e.g. {fixed : MIN([Order Date])} gives a single result for the entire table. Shorthand for {fixed : MIN([Order Date])} is {MIN([Order Date])}. "

2. He also says "With include keyword calculation will always either have same or finer level of detail than the sheet which requires aggregation to match sheet’s level of detail. For fixed it can be coarser or finer depending on the calculation and dimensions in the sheet." But in this case, the dimensionality is missing ({FIXED: SUM(IF YEAR([Order Date]) = 2012 THEN [Sales] END)} and the value of 2012 still got replicated along the table. Shouldn't the replication occur only when the calculation is of coarser level of detail than the viz?

I think I'm almost getting it... I'll just have to reread your explanation a few more times to see if I can wrap my head around it!

Have a good day.

Esther

• ###### 11. Re: LOD expression for 'difference from overall average'

Glad it's starting to help...so SQL knowledge...in which case for every-time I refer to LoD, think GroupBy!

I think the confusion is coming from the SUM(IF YEAR([Order Date]) = 2012 THEN [Sales] END) so this does have dimensionality...it's just that it returns NULL for all other years than 2012, but it only equals 484,247 for the Year = 2012.

SELECT SUM(IF YEAR([Order Date]) = 2012 THEN [Sales] END) as 'Sales for 2012'

GROUP BY YEAR([Order Date])

returns

YEARSales for 2012
2012484,247
2013NULL
2014NULL
2015NULL

SELECT SUM([Sales]) as 'Sales'

GROUP BY YEAR([Order Date])

Returns

YEARSales
2012484,247
2013470,533
2014608,474
2015733,947

So we need to remove the Year dimension from the Sales for 2012 (using the EXCLUDE...think of this as an EXCLUDE from Group By, where everything you add to the Tableau canvas, in terms of dimensions, adds that to the GROUP_BY

YEARSalesSales 2012Sales 2012 EXCLUDE YEAR from GROUP BY (LoD)
2012484,247484,247484,247
2013470,533NULL484,247
2014608,474NULL484,247
2015733,947NULL484,247

So Sales is on the GROUP_BY on Year as it's just a regular Aggregate, so we need to remove the YEAR dimensionality from the Sales 2012 figure so the subtraction can be performed (As we want) against each year.

Tableau is very much like a SQL engine...except the results are returned in Render Instructions rather than a table.

Any Dimension you add to the canvas (Row, Column, Colour, Shape...) creates a GROUP_BY (as an aside....Filters are WHERE clauses)...and so FIXED, INCLUDE, EXCLUDE are Sub-Queries allowing us to run calculations at a different level (grain) to the canvas.

• ###### 12. Re: LOD expression for 'difference from overall average'

...or this Excel might help. I've created a pivot table and tried to do the subtraction on both the non-Exclude and Exclude version.

1 of 1 people found this helpful
• ###### 13. Re: LOD expression for 'difference from overall average'

Well well, it actually helped thinking in terms of SQL

Thank you a lot, Simon Runc!

I hope I can count on more verbose explanations like yours.

Esther

• ###### 14. Re: LOD expression for 'difference from overall average'

Excellent...definitely deserves a

yes, as Tableau thinks like SQL, thinking of it in these terms is the easiest

FIXED means FIX the GROUP_BY (to the ones defined in the calculation) - regardless of the GROUP_BY that get added, by adding to the Canvas Level of Detail

EXCLUDE means, use all the GROUP_BY that are added, by adding to the Canvas Level of Detail, but EXCLUDE the ones defined in the calculation

INCLUDE means, use all the GROUP_BY that are added, by adding to the Canvas Level of Detail, but also INCLUDE the ones defined in the calculation (even if they are not on the Canvas)

Funny how all the above can be summarised to the above 3 lines!!...I'm adding these 3 lines to my Tableau Training Deck, so thanks for challenging me...explaining things to someone else is the best way (I find) to clarify my own thoughts/understanding!