12 Replies Latest reply on Nov 19, 2015 12:22 PM by Shawn Wallwork

# Calculate total % based on occurence of event

This is such a great forum and I've found so many great tips here! Unfortunately I'm stuck with the following problem:

I have two data sources: 1. yearly shipment amounts for a list of firms and 2. inspection dates for a list of firms. For each year, I want to calculate the total % of shipments from firms that have been inspected within the last three years. It's easy enough to calculate the total % of shipments that come from firms that were inspected in a given year, but I can't figure out how to lag or lead the date information in a way that lets me display the desired result. For example, for 2014, I want to know what % of total shipments came from firms that were inspected in either 2014, 2013, or 2012.

I've attached some example data here in two tabs of the same Excel workbook. I've read about lots of different date calculations and explored using a CROSS JOIN to link the data, but no luck so far.

• ###### 1. Re: Calculate total % based on occurence of event

Here's an example with some level of detail expressions. I think they can be simplified a bit but I don't have time to do that right now.

• ###### 2. Re: Calculate total % based on occurence of event

Join both sheets on [Firm ID]

Take [shipment] as: sum([Shipments])/countd([Insp ID])

Ratio will be: max(if [Year]<=year([Inspection Date])+2 and [Year]>=year([Inspection Date]) then 1 else 0 end)*[shipment]/window_sum([shipment])

compute using table down

• ###### 3. Re: Calculate total % based on occurence of event

Thanks for weighing in Dan! These calculations get me close, but not the whole way there . . .

An inner join on Firm ID will kick out (A) shipments from firms that were never inspected and (B) firms that were inspected but for which there are no shipments. (A) causes a problem with this specific calculation, but (B) has implications that affect other calculations not mentioned here.

Is there a way to blend or join the data that doesn't exclude the shipments from firms that were never inspected?

• ###### 4. Re: Calculate total % based on occurence of event

Hi Nasrin,

See my response to Dan above re: joining on Firm ID - the same challenge would affect your suggestion.

I also don't really understand the logic behind these formulas - why would I want to calculate an average of shipments per inspection?

• ###### 5. Re: Calculate total % based on occurence of event

Good catch. You could use a full outer join if your data is on a database that supports it (not Tableau's Excel connector). A left join will cover just (A).

My guess is you may want a separate connection to just the firms table for (B), but I could be wrong.

Nasrin is essentially using AVG as a way to remove the replication of the shipments caused by the cross-product with the inspections. In general I try to use MAX or MIN instead of AVG in level of detail expressions since Tableau will generate better queries, but in this case since you want the SUM of the shipments across the firms, it saves one level of detail. Pretty clever!

Dan

• ###### 6. Re: Calculate total % based on occurence of event

Hi Dan,

You wrote:

In general I try to use MAX or MIN instead of AVG in level of detail expressions since Tableau will generate better queries...

Can you explain more about what Tableau is doing here?

Thanks!

Jonathan

• ###### 7. Re: Calculate total % based on occurence of event

Jonathan -

The scenario is that you have a field that gets replicated due to a join, so you change the level of detail to where it is not replicated. For Tableau internally that says take the replicated data, then group it to a different level of detail. You want Tableau to realize that part of the join can be culled (removed) so it needs to notice that the aggregate can be computed without that level of detail being included. For MAX and MIN it is obvious that if all the values are the same, the MAX or MIN is just the value. For AVG it's obvious too, but sometimes we decompose AVG into SUM/COUNT and it's not obvious for either of those.

I hope that makes sense. The easiest thing is just to get in the habit of using MAX or MIN. And also don't mix & match - I've seen workbooks that used MAX one place and MIN the other on the same measure. We won't necessarily realize these are the same!

Dan

• ###### 8. Re: Calculate total % based on occurence of event

Excellent explanation Dan, it makes perfect sense. Makes me wonder if we don't need another function that is just a MAX() with a different name, maybe VALUE(). This would keep folks from getting confused with interchangeable MIN/MAX/AVG. Just a thought.

--Shawn

• ###### 9. Re: Calculate total % based on occurence of event

Shawn - We have ATTR() which has the right semantics for this scenario. But unfortunately we don't yet optimize that well enough for me to recommend using it. It gets broken into MAX() and MIN() and we don't recognize they are the same.

Dan

• ###### 10. Re: Calculate total % based on occurence of event

Thanks for the explanation, Dan, that makes sense to me as well. I was going to suggest ATTR() to Shawn as well, but your point is well taken in terms of optimization. That leads to another question, though - when we use an EXCLUDE LOD in a view Tableau's default is to wrap the LOD in ATTR(), is there a performance impact for that?

For record-level fields I've found that using MIN() or MAX() instead of ATTR() on a (not an LOD) can get up to 2x performance improvement. What I'll do when working with unfamiliar data is to use ATTR() until I'm confident of the results, and then go back to improve performance by switching the ATTR()'s to MIN()'s.

Jonathan

• ###### 11. Re: Calculate total % based on occurence of event

The answer to a performance question is always "it depends". You really need to test things to be sure. In general, ATTR() gets turned into a separate MIN() and MAX(). In 9.0 we would not have merged those into a single query, but we will usually merge them in 9.1. However, we still don't recognize they are running on a single row and must be equivalent so there is no need to compute both of them. In general, you should see much less perf impact in 9.1. There are further improvements coming in future versions.

The ATTR() that appears on an {EXCLUDE added to the worksheet I think does not get translated to a separate MIN() and MAX() so doesn't have a performance impact.

Dan

• ###### 12. Re: Calculate total % based on occurence of event

Yeah I suppose that is an option since all the rows are the same value. Never really thought of using ATTR() in this fashion.

--Shawn