Sorry for not being clear enough.
As you can see from the above screenshot the grand total of Unique Client # is 985, which is correct. There were 985 unique clients for the entire data set however when you look at the Running Sum of Unique Client # the last value is 994. Tableau is doing the right thing here, simply summing up each week but I'm trying to get Tableau to do a distinct running sum at / up to each specific week. Is that possible?
Also, I should note that I'm using an Excel here as an example this will be connected to a Tableau Extract.
Hopefully that makes sense.
Thanks anyway. Here's hoping to some table calc gurus seeing this post :)
The goal is to show for each week a total of the distinct ClientIDs that includes all weeks leading up to that week. A RUNNING_SUM(COUNTD([ClientID])) doesn’t work (as seen in the Running Sum of CountD Fails) worksheet because the COUNTD() is evaluated for each week , so it double-counts 9 extra records by ClientID, as seen in the Grand Total at the bottom that shows 985 unique records, while the Running Sum comes up with 994 values.
One workaround for some situations with COUNTD is to bring the dimension of interest (ClientID) into the view and then use COUNT instead. In order to prevent overlapping records, then we use some tricks with table calculations. I created the following calculations:
Week of Date – when doing table calculations with dates, Tableau likes to pad rows and that can get messy, so I made my own calcs:
Week of Date String
DATENAME('month',[Week of Date]) + " " + STR(DAY([Week of Date])) + ", " + STR(YEAR([Week of Date]))
Now here are the other calcs:
WC Count – this is a count of records per week, with the formula:
IF FIRST()==0 THEN
The Compute Using is an Advanced… Compute Using set to ClientID, sorted by Week of Date/Min/Ascending.
WC Count for Running Sum – this generates the count of distinct ClientIDs up to the present week:
The Compute Using is an Advanced… Compute Using with Client ID and Week of Date in the Compute Using window, sorted by Week of Date/Min/Ascending, and At the level Client ID. The last part is particularly necessary to set up the calc to return 985 instead of 994 as the maximum value.
WM WC Count – the WC Count for Running Sum calc returns a set of values for each Client ID/Week of Date combination, in some cases it’s s an earlier value. Within a given week, the largest value of WC Count for Running Sum is the proper value.
IF FIRST()==0 THEN
WINDOW_MAX([WC Count for Running Sum])
The Compute Using is an Advanced… Compute Using set to ClientID, sorted by Week of Date/Min/Ascending, with the nested Compute Using for WC Count for Running Sum set to the above.
Hide Empty Rows – This is a workaround due to a problem I hadn’t encountered before (more details on that below):
NOT ISNULL([WM WC Count])
This returns True for the unneeded rows in WM WC Count.
Now, to create the view:
Put Week of Date String on the Rows Shelf, sorted by Week of Date/Min/Ascending.
Put ClientID on Rows.
Drag WC Count into the view, and set the Compute Using as above.
Then do the same for WM WC Count, setting the nested Compute Using’s as above.
Drag Hide Empty Rows onto the Filter Shelf, and select True.
Uncheck Show Header for the ClientID pill on the Rows Shelf.
See the attached for details, the Working View shows the result.
The view as above works and can be seen in the Working View worksheet, however given table calculations like this I’m used to being able to not have ClientID on the Rows Shelf and can put it on the Level of Detail Shelf, the view works fine, and we don’t need anything like the Hide Empty Rows filter. However, on Tableau 7.0.6 when I do this, Tableau goes from instant redraws to taking 40+ seconds to Compute the View Layout, and when it does finally draw the view the WC Count for Running Sum Calc is broken. It is returning the same value as the WC Count calc, and therefore the WM WC Count calc is broken as well, since it’s returning the WM Count for Running Sum Calc. You can see this in the ClientID on LoD Fails worksheet.
I can’t remember using At the Level in this sort of view, so I’m into unknown territory here. Usually when Tableau suddenly starts taking longer to redraw a view, there’s some sort of padding going on, so I’m wondering if there’s some sort of interaction between how At the Level works with where the field (ClientID) is in the view. Maybe Ross Bunker can help?
Thanks for your reply, seems like you have done a fair bit of work on this! To get to the end it takes a lot of steps in between and with this being only a small sample of data and the compute the view layout I'm a bit worried it might loose the 'rapidness' of the dashboard.
I'm thinking of how I can solve this with transformed data rather than a direct connection to the database. Any ideas on how this can be done? I guess some SQL programming will come in handy here.
If you're dealing with less than 10's of thousands of rows, this should be fairly quick, and the calcs might be able to be optimized further. There are always N*X calculations, though, where N is the number of dates.
I'm not exactly sure how you'd do this in SQL, the way I'd think of approaching it would be to start out with the dates, then join to a subquery that would do a COUNTD(ClientID) where the date would be from the first date to the current date.
Hi Prashant and Jonathan,
Thanks for bringing this to my attention. Sorry it's taken me a couple days to reply. This was an interesting case.
Short answer is, i have a solution that doesn't suffer from the problem you found. The caveat here is that rolling up count distinct is really hard. Tableau does nothing to help make it easier. This is a great use case for us to look into for how we can improve the user experience for complex calculations.
I've attached an workbook. It has six sheets. The first three illustrate the solution you provided (1,2) and the problem you found with it (3). The next three show an alternate solution to the original count distinct question, which doesn't have the problem you found. In the middle sheets I filtered to an interesting subset of dates to make it easier to see. Oct 25, 2009 has a second instance of ClientID 17-562, so the distinct count is one less than the non-distinct count.
On to the explanations. First, the problem you found with your solution.
Basically, the issue is twofold. When you move ClientID to the LOD shelf (or Columns for that matter), the calc you were using breaks. This is shown clearly in the third sheet (Non-Working View)_ in the attached workbook. From the sheet, you can see that two 'unexpected' things are happening. First, we are adding rows to cover the ClientID/Week of Date String (hereafter WoDS) combinations that didn't exist. Second, we are partitioning on WoDS (that is the running count is starting over every time the WoDS changes). I put the "ClientID + Week of Date" set on the 'Non-Working View' after moving the ClientID so you can see how it restarts. You can also see the extra rows added to the end of the viz (not in proper week order).
Why are these happening? This turns out to be very subtle, and there is a good reason for this behavior, although I'm not entirely convinced its the right thing. The fact that I built it to work this way, but was completely baffled as to why until i looked deeply into the code is a clue that it's not really expected behavior. I'll leave the gory-gory details for the end of the post. Here i will just say that when you use At the level, we partition things in a different way by default that gives the behavior you want. However, when you then move the 'at the level' field (in this case ClientID), to the LOD shelf, we change the behavior of partitioning (see below for more on why). The same logic causes us to add the extra rows.
So, I fixed this by using a different way to calculate the distinct count. I used a formula like this:
IF (ATTR([ClientID]) == LOOKUP(ATTR([ClientID]), -1)) THEN
PREVIOUS_VALUE(0) + 1
This increments the count whenever the ClientID changes. The important thing for this is sorting. All rows with the same ClientID need to be together. Furthermore, it also needs to be in the correct week order in order to have the count increase as the date increases. That makes this tricky because you can't use Advanced... ClientID/WoDS sort by min(WoD). That doesn't honor keeping ClientIDs together. Instead you need to leave the sort as automatic, but make sure that you sort ClientID and WoDS on the sheet both by MIN(WoD) ASC. Table calculations pick up the sort on the sheet by default (the 'Automatic' setting). You can see this is correct because on the fourth sheet (Fixed View (w/ ClientID)), the first entry for October 25, 2009, ClientID 17-562 (out of strict ClientID order because it first apears in October 18, 2009, so when sorting by MIN(WoD) it appears before the other ClientIDs), has a Running Distinct Count of 21. In the table, the previous value is 23 and the next is 24, so you can see how the table calc ordered this differently.
So, this calc is more direct, though its not trivial. As i said, this is a great use case for us to work on improving the experience.
This ends the 'how do i do this' section. Read on for the gory details on Partitioning.
Partitioning and 'At the level':
Your calculation for running distinct used 'At the level' set to Client ID to compute your running distinct. It was running along ClientID,WoDS. We treat a collection of ordering fields in the 'advanced' dialog as a hierarchy. (I'm questioning that behavior these days).
For hierarchies, 'At the level' ClientID tells the table calc engine to actually treat WoDS as partitioning, but because it is treated as in a hierarchy with ClientID, it is partitioned on 'position' rather than 'value'. To understand what that means, consider the following data values:
Now, if you run a table calc along this Date hierarchy (that is along, Year, Qtr, Month), and you say 'At the level' Quarter, we want to partition on Month so that things like next/previous move to the next/previous quarter. (think difference from previous quarter). But if we strictly partition on Month we'd be in trouble because 2005,Q1,Jan and 2005,Q2,Apr would be in separate partitions (Jan != Apr). Instead, when we partition a hierarchy we partition all the first children (Jan,Apr) together, and all the second children (Feb,May).
So, your calc works because you are doing a count that is restarting for each time a client id appears in a new date. With First() = 0 along WoDS, you are picking up the count for the first occurrence of each client id, and that's how you are getting your unique count. That in and of itself is fine.
However, there is a downside to using hierarchical partitioning with 'At the level'. Weird things happen when you try to put levels of a hierarchy on different shelves. Normally they are nested on the same shelf. To compensate for these strange cases, table calculations try to account for this by changing partitioning behavior. Specifically, if you put the ClientID on a different shelf from WoDS, we switch the partitioning from by position to by value. This totally breaks your calculation because you want to count across all the dates. Again, this behavior was originally put in place for cube hierarchies, to fix some strange cases. I'm thinking we need to revisit this now that table calcs are in wider usage.
RunningDistinctTotal.twbx.zip 121.6 KB
Thanks for the explanation and the fix! I think I'm starting to get a handle on what you mean by partitioning by position vs. partitioning by value. Are there any other posts you can point me to, or other examples you could share (when you get a chance)?
I have been attempting to do a running total of distinct customers across months. I ran into the same problem as the original poster did. I tried Jonathan Drummey solution as well as Ross's. I was able to get the correct answer using Jonathan's solution, but could not accomplish it using Ross's. Not sure what I am doing wrong. What I need to be able to do is create a line chart showing the running sum of distinct customers across months. I thought I would be able to do this after get the right numbers using JD solution, but I could not get that to work on a line chart. I also tried creating a line chart in the Tableau workbook Ross posted using the Fixed View tab, but that didn't work either. Is it possible to create a line chart with a Running Sum of distinct values over time? I've attached an 8.0 packaged workbook with what I came up. Any help or feedback would be appreciated. Sorry for hijacking the post. I could start a new post but I thought this was relevant to the original post. Thanks
disctinct_customers.twbx 108.6 KB
I have exactly the same issue as Lynda - will this be improved in Tableau9? Or has anyone come up with a solution that makes this work as a running countd line graph across weeks/months? We have a KPI of distinct customers across year, and I really want to be able to show this year vs year in i line graph
I took a different approach and, using a custom SQL statement, added the First_Contact_Date to each record. This field holds the date of the first record for each customer. I then created a calculated field IsFirstContact where Contact_Date = First_Contact_Date and added this as a filter to the worksheet that is intended to count unique customers.
Create another calculated field that is the sum of the IsFirstContact field and you can use that in a running total calc for your line chart.
3 of 3 people found this helpful
I did something like this.
I haven't double checked the solution because my narcissistic instincts tell me it works.
I'm also lazy.
Thank you, briliant solution!
This is the dataset iam taking from the workbook you have attached.For example , on week of date Dec 28 2008 my running total is 112 but when i filter the week of date for "Dec 28 2008 iam not getting the correct value in running sum as you see in the other screenshot.
Do you have any workaround for this, if yes then please let me know.
1 of 1 people found this helpful
Hi Abdus Abdus,
You could be using a Table Calc Filter like this:
Compute using Cell.
For a Date range filter make it Continuous (Green Pill).
Hope this could help.
PS If you're on a Tableau version 9.0+
you could use an approach with LOD calcs
as described in another thread below: