1 2 Previous Next 18 Replies Latest reply on Aug 17, 2013 7:10 AM by Shawn Wallwork

# Calculate first occurrence only?

I have a large repository that stores a list of jobs and category value.   Each job+category value can be recorded more than once on any given day.   I want to know how I can show just a daily, weekly, or monthly total for the first occurrence of any given job+category value.

I this possible?

Example:   The highlighted items below are the only ones I want to show.  They are the first occurrences for the combo of Job+Category.

Excel sample for this:

• ###### 1. Re: Calculate first occurrence only?

Hi Mark,

You should be able to do this is with a table calculation, can you re-post your workbook making sure it's a twbx file? (Sending a Packaged Workbook | Tableau Software )

-Tracy

1 of 1 people found this helpful
• ###### 2. Re: Calculate first occurrence only?

Thank you Tracy.  I have added the correct workbook.

• ###### 3. Re: Re: Calculate first occurrence only?

Mark this will get you started. It's not exactly the results you're looking for because Tableau isn't evaluating first the same way you are asking it to. But you can probably tweak the "compute using" of the table calc to get what you want. I just haven't had enough Java to see the first logic you're trying to apply. The basic calc is:

IF FIRST()=0 THEN SUM( [Number of Records] ) END

Cheers,

--Shawn

1 of 1 people found this helpful
• ###### 4. Re: Calculate first occurrence only?

Having only a few weeks experience with Tableau your statement "tweak the "compute using" of the table calc" and having to use Java scares me.   Anyone have any other hints they can toss my way?

• ###### 5. Re: Calculate first occurrence only?

haha that makes sense!

• ###### 6. Re: Calculate first occurrence only?

Thanks for you sample Shawn.  Using your workbook as-is I noticed it was not showing correct values.

I highlighted the cells that should show, missing, and don't show from the default view.  Before I can understand which "compute using" choice I should use I need to know what exactly your Measure is doing?

• ###### 7. Re: Calculate first occurrence only?

HA! Thx Johan for the assist. I was so coffee-deficient I didn't make the connection. I'm just not the table calc guy (though I'm working on it, and have quit being table calc adverse) Joe Mako Jonathan Drummey or even richard.leeke.0 (if he's awake) are much better at bending these magical tools to their will.

If none of them (or others I haven't called out because the list is long) answer by tomorrow then I'll give it another go.

Cheers,

--Shawn

• ###### 8. Re: Re: Calculate first occurrence only?

I know Joe is working on a solution, I'd already come up with an alternative route. The attached uses a calculated field to test whether the current date is the earliest for the Job & CV Code, the formula is MIN(DATETRUNC('month',[OPEN_TIME])) == TOTAL(MIN(DATETRUNC('month',[OPEN_TIME]))). This has an advanced Compute Using on OPEN_TIME.

One thing that happens is unwanted padding/densification. When we set the Compute Using of a table calc to include a date, for certain pill arrangements (particularly where the date field(s) are on Rows and Columns) Tableau will automatically start padding out the data, which can lead to unexpected (and incorrect) results for table calculations that depend on a certain ordering/row count, like SIZE() and FIRST(). You can see this in the attached, the "see 19 marks" tab shows the original data. The "workout" tab shows the view with the calc, now there are 48 marks as Tableau pads out the Job & CV Code for every combination of Year & Month of OPEN_TIME.

The formula for the Flag calculation handles the unwanted padding by looking at the values of the data instead, and by leaving Nulls (which are returned for the padded data) out in the filter on the Flag calculation, we can remove those extra rows/columns from the display.

The alternative method that Joe is writing up uses the pill arrangement and some custom fields to avoid Tableau padding in the first place, so the FIRST() calc can return correct results. In practice, I'd probably use Joe's solution, since it prevent unnecessary computations.

Jonathan

• ###### 9. Re: Re: Re: Calculate first occurrence only?

Shawn, I think you got very close to a solution, if you switched the lists in your Advanced compute using dialog (partitioning on both CV Code & Job, and addressing on Year & Month), I think it would be reasonable to expect that Tableau should provide the result desired. Sadly that is not the case.

---- Begin Background Details:

We have to deal with data densification. Data densification is when Tableau adds additional marks to complete a domain of possible dimension values or combinations.

Because there is no option to directly turn off data densification, it is not an option we can directly control like stacking marks, we can use awareness of all the factors and exceptions (too many for me to list in this comment, and some do not apply to this situation) that cause data densification, to create the desired view, using what I consider to be workarounds.

I have been petitioning Tableau for years to give us an option to turn off data densification, and I plan to use this use case, because it makes a great example.

---- End Background Details

One of a few routes to get the result requested is in the attached workbook.

I did the following:

1. created calc fields for "Month" and "Year" like:

DATENAME('month',[OPEN_TIME])

DATENAME('year',[OPEN_TIME])

2. created a table calc to test for first record in partition, "Is First":

FIRST()==0

3. placed "Year" and "Month" pills on the Columns shelf as blue Attribute pills (Discrete Measures so they do not impact partitioning, addressing or densification)

4. placed "CV CODE" and "JOB" as Discrete Dimension on their respective shelves (these will partition the table calc)

5. placed "OPEN_TIME" on the Detail landmark, so it is on the Marks card as a DateTrunc/Value Month (in the pills context menu, we see "Month May 2011") (this will make it a dimension for use in addressing, and it will not cause densification because it is on just the Marks card, it can be either discrete or continuous)

6. placed "Is First" on the Color landmark (could be on any shelf, but the color let me easily see what is happening in this case), on drop we see the effect of densification, we go from 19 to 41 marks.

7. to turn off the densification, and get the desired computation, set the Compute using for the Is First pill to "OPEN_TIME" (it should now be colored correctly)

8. move the Is First pill from the Marks card to the Filter shelf (to retain config), and set it to keep only when True.

9. finally, place SUM(Number of Records) on the Label/Text shelf

Please let me know if you have any questions. If you would like to dive deep and explore the factors and exceptions for data densification that currently know of (there are likely some that I do not yet know of), you or anyone are welcome to contact me, my email is my profile.

• ###### 10. Re: Re: Re: Re: Calculate first occurrence only?

Joe thanks for this. Here's what I don't understand.....

And of course we've all encountered the dreaded *. When I began recreating your viz from scratch I got to this point:

And then to this point:

It isn't until you drop the Open_Time pill on the LOD that all the headers get sorted out. What is going on through this progression? For instance why does T "know" to separate June and July, but not May, June? I'm trying to understand the way Tableau "thinks". Thanks,

--Shawn

• ###### 11. Re: Re: Re: Calculate first occurrence only?

Densification is not easy to track and predict, there are multiple ways to turn it off or on, or we can filter the densified marks away to make it seem like it is off. In my other comment in this thread, I turned off densification, while Jonathan left densification on and filtered out the densified marks.

Either way clearly works, the point where filtering out densified marks is less ideal than turning off data densification is when the data densification creates 100s of thousands or marks.

The downside of turning off densification is that it requires awareness of the factors that turn it off and on, so without this awareness, it can be easy to accidentally turn it back on by changing the pills or their arrangement.

Here are two common situations applicable to this thread (there are others) when Data Densification happens:

- A Discrete Dimension date data type pill is on the Rows OR Columns shelf, and is addressing a table calculation (with no other fields addressing as well)

- Discrete Dimension pills are on both the Rows AND Columns shelf, and at least one of them is addressing a table calculation

Here is my go-to route for turning Data Densification off in these situations:

- Move the Discrete Dimension pills that are used for addressing to the Marks card, and use Discrete Measure pills in their place on the Rows/Columns shelves if you still need them to make cells and panes.

• ###### 12. Re: Re: Re: Calculate first occurrence only?

I think Tableau documentation leaves a lot of room for improvement and could do better describing what ATTR() is and what it does.

I think of ATTR(<field>) as shorthand for:

IF MIN(<field>) = MAX(<field>) THEN MIN(<field>) ELSE "*" END

ATTR() is performing an aggregation, like SUM(), MIN(), MAX(), they all roll up, or aggregate data to a level of detail. This level of detail is set by the combination of Dimensions pills. Dimension pills on the Rows/Columns/Marks/Pages all impact the level of detail. Dimension pills on the Filter do not impact the level of detail. Each Marks card can have a different level of detail when it has different Dimension pills.

In your first image, CV Code is the only dimension pill, and if we look at the MIN and MAX value of Month for each CV Code, we can see that they are different, therefore the ATTR of Month will return "*"

In your second image, CV Code and Job are both Dimension pills. MIN and MAX of Month are the same for some combinations of CV Code-Job so that Month is returned and displayed for those, and "*" for combinations where there are multiple Months for the combination of CV Code-Job.

When we place MONTH(Open Time) on the Marks card as a dimension, the level of detail becomes:

the combination of MONTH(Open Time) and CV Code and Job

at this level of detail, the ATTR() of Month will not be an "*" because the MAX of Month = the MIN of Month for each mark

This gets to my core of my perspective on Tableau: Awareness of Dimensions and Measures.

I believe with awareness of Dimensions and Measures nearly everything in Tableau can derived and understood. Without this awareness, Tableau can be a confusing and frustrating application because it will seem to behave in unexpected ways, producing results that sometimes seem to be right, but in other times, like when the data changes, return incorrect results.

• ###### 13. Re: Re: Re: Calculate first occurrence only?

Thanks Joe, nicely explained. I especially like this bit of thinking:

I think of ATTR(<field>) as shorthand for:

IF MIN(<field>) = MAX(<field>) THEN MIN(<field>) ELSE "*" END

Makes sense. Also:

I believe with awareness of Dimensions and Measures nearly everything in Tableau can be derived and understood. Without this awareness, Tableau can be a confusing and frustrating application because it will seem to behave in unexpected ways, producing results that sometimes seem to be right, but in other times, like when the data changes, return incorrect results.

And it's threads like this that will help us all raise our awareness. Thanks again.

--Shawn

• ###### 14. Re: Re: Re: Re: Calculate first occurrence only?

Hi Shawn,

I thought I'd take a swing at your question to try to practice explaining this. I don't see this view as Tableau "knows" to separate May and June, it's about the data, dimensions, and measures, and I'll try to walk through how Tableau does this. In this particular view, there are headers generated by dimensions (CV Code and Job) and headers generated by measures (ATTR(Year) and ATTR(Month)). The values of those measures are determined by the dimensions in the view.

This is the critical bit to understand... To put this another way, even though we're seeing Month and Year, they don't exist as independent values anymore because they aren't dimensions anymore, we've wrapped them in an aggregation and turned them into measures. (These are not the droidsMonth and Year you are looking for). The values of ATTR(Year) and ATTR(Month) are dependent on the level of detail created by the dimensions in the view.

This is roughly the computation that Tableau does for ATTR:

IF ISNULL(foo) THEN

Null

ELSEIF MIN(foo) == MAX(foo) THEN

foo

ELSE

*

END

This alternative layout can help explain what ATTR(Month is doing). I put the dimensions on the left, and the aggregations on the right, along with a COUNTD(Month) to show how many months were being returned. Anytime there is more than one value of Month, ATTR(Month) is going to return an *:

What's happening is that in those first views, Tableau is aggregating (using ATTR) Year and Month for the distinct values of CV Code and Job (the dimensions). There's only one year being returned, 2013, so that is returned in the ATTR(Year). There's only one value of month for CV Code CV02 and Job J100, so that returns a value for July for ATTR(Month). There's only one value of Month for CV Code CV02 and Job J103, so that returns June for ATTR(Month).  For CV Code CV00 and Job J100, there are multiple months (June and July), so ATTR(Month) returns * for that (and every other combination of CV Code and Job that has multiple months).

When you put the datetrunc version of MONTH(OPEN_TIME) in the view, that's adding the dimension to the view. With that, now there's a fine enough granularity that there's only one month per CV Code/Job/MONTH(OPEN_TIME), and ATTR(Month) always returns the month and not *:

The "trick" here is to look at the pill and understand whether it's a dimension or measure, and then to have enough knowledge of the underlying data to know its granularity and relationships to be able to decipher what Tableau has done. So, for example, in a one to many hierarchy such as the Product hierarchy in Superstore Sales we can bring out ATTR(Department) for a Category and it will work fine:

But if we use ATTR(Category) for Department then we get a bunch of *:

Does that help?

Jonathan

2 of 2 people found this helpful
1 2 Previous Next