1 2 Previous Next 18 Replies Latest reply on Nov 12, 2016 11:15 AM by George Qiao

    Running Distinct Sum or Total using COUNTD

    Prashant Raju

      Hi all,

       

      I know this has been discussed in previous threads but I'm finding it difficult to replicate solutions to solve my problem.

       

      I'm able to use the COUNTD([ClientID]) function to easily calculate the distinct amount of clients seen for a specific week which works perfectly. However, I'm also trying to represent on the same chart a running total (using table calculations) but Tableau doesn't seem to follow the same distinct count and simply sums up the distinct amount of clients. So at the moment my data looks like this in Tableau (using one city as an example):

       

      CityWeekUnique ClientsUnique Clients Running Total
      Melbourne06/12/200911
      Melbourne28/02/201045
      Melbourne23/05/201016
      Melbourne15/08/201028

       

      However, by 15/08/2010 there has only been 4 of the same client. Is there a way to make Tableau perform a running distinct sum or total using a calculated item which uses COUNTD?

       

      I have attached a packaged workbook which includes a sample set of data and where I have got to so far. It would be good that this would work with date quick filters. Any help would be appreciated!

       

      Thanks in advance!

        • 1. Re: Running Distinct Sum or Total using COUNTD
          Prashant Raju

          Hi Shawn,

           

          Sorry for not being clear enough.

           

          Screen Shot 2012-07-25 at 8.36.34 PM.png

          As you can see from the above screenshot the grand total of Unique Client # is 985, which is correct. There were 985 unique clients for the entire data set however when you look at the Running Sum of Unique Client # the last value is 994. Tableau is doing the right thing here, simply summing up each week but I'm trying to get Tableau to do a distinct running sum at / up to each specific week. Is that possible?

           

          Also, I should note that I'm using an Excel here as an example this will be connected to a Tableau Extract.

           

          Hopefully that makes sense.

          • 2. Re: Running Distinct Sum or Total using COUNTD
            Prashant Raju

            Hi Shawn

             

            Thanks anyway. Here's hoping to some table calc gurus seeing this post :)

            • 3. Re: Running Distinct Sum or Total using COUNTD
              Jonathan Drummey

              The goal is to show for each week a total of the distinct ClientIDs that includes all weeks leading up to that week. A RUNNING_SUM(COUNTD([ClientID])) doesn’t work (as seen in the Running Sum of CountD Fails) worksheet because the COUNTD() is evaluated for each week , so it double-counts 9 extra records by ClientID, as seen in the Grand Total at the bottom that shows 985 unique records, while the Running Sum comes up with 994 values.

                                  

              One workaround for some situations with COUNTD is to bring the dimension of interest (ClientID) into the view and then use COUNT instead. In order to prevent overlapping records, then we use some tricks with table calculations. I created the following calculations:

               

              Week of Date – when doing table calculations with dates, Tableau likes to pad rows and that can get messy, so I made my own calcs:

              (DATETRUNC('week',[Date])

               

              Week of Date String

              DATENAME('month',[Week of Date]) + " " + STR(DAY([Week of Date])) + ", " + STR(YEAR([Week of Date]))

              Now here are the other calcs:

               

              WC Count – this is a count of records per week, with the formula:

              IF FIRST()==0 THEN

                  WINDOW_COUNT(COUNT([ClientID]))

              END

              The Compute Using is an Advanced… Compute Using set to ClientID, sorted by Week of Date/Min/Ascending.

                                                                                                                                                                      

              WC Count for Running Sum – this generates the count of distinct ClientIDs up to the present week:

              WINDOW_COUNT(COUNT([ClientID]),FIRST(),0)

              The Compute Using is an Advanced… Compute Using with Client ID and Week of Date in the Compute Using window, sorted by Week of Date/Min/Ascending, and At the level Client ID. The last part is particularly necessary to set up the calc to return 985 instead of 994 as the maximum value.

               

              WM WC Count – the WC Count for Running Sum calc returns a set of values for each Client ID/Week of Date combination, in some cases it’s s an earlier value. Within a given week, the largest value of WC Count for Running Sum is the proper value.

              IF FIRST()==0 THEN

                  WINDOW_MAX([WC Count for Running Sum])

              END

              The Compute Using is an Advanced… Compute Using set to ClientID, sorted by Week of Date/Min/Ascending, with the nested Compute Using for WC Count for Running Sum set to the above.

               

              Hide Empty Rows – This is a workaround due to a problem I hadn’t encountered before (more details on that below):

              NOT ISNULL([WM WC Count])

              This returns True for the unneeded rows in WM WC Count.

               

              Now, to create the view:

               

              Put Week of Date String on the Rows Shelf, sorted by Week of Date/Min/Ascending.

              Put ClientID on Rows.

              Drag WC Count into the view, and set the Compute Using as above.

              Then do the same for WM WC Count, setting the nested Compute Using’s as above.

              Drag Hide Empty Rows onto the Filter Shelf, and select True.

              Uncheck Show Header for the ClientID pill on the Rows Shelf.

               

              See the attached for details, the Working View shows the result.

               

              A Problem

              The view as above works and can be seen in the Working View worksheet, however given table calculations like this I’m used to being able to not have ClientID on the Rows Shelf and can put it on the Level of Detail Shelf, the view works fine, and we don’t need anything like the Hide Empty Rows filter. However, on Tableau 7.0.6 when I do this, Tableau goes from instant redraws to taking 40+ seconds to Compute the View Layout, and when it does finally draw the view the WC Count for Running Sum Calc is broken. It is returning the same value as the WC Count calc, and therefore the WM WC Count calc is broken as well, since it’s returning the WM Count for Running Sum Calc. You can see this in the ClientID on LoD Fails worksheet.

               

              I can’t remember using At the Level in this sort of view, so I’m into unknown territory here. Usually when Tableau suddenly starts taking longer to redraw a view, there’s some sort of padding going on, so I’m wondering if there’s some sort of interaction between how At the Level works with where the field (ClientID) is in the view. Maybe Ross Bunker can help?

              • 4. Re: Running Distinct Sum or Total using COUNTD
                Prashant Raju

                Hi Jonathan,

                 

                Thanks for your reply, seems like you have done a fair bit of work on this! To get to the end it takes a lot of steps in between and with this being only a small sample of data and the compute the view layout I'm a bit worried it might loose the 'rapidness' of the dashboard.

                 

                I'm thinking of how I can solve this with transformed data rather than a direct connection to the database. Any ideas on how this can be done? I guess some SQL programming will come in handy here.

                 

                Thanks again,
                Prashant

                • 5. Re: Running Distinct Sum or Total using COUNTD
                  Jonathan Drummey

                  If you're dealing with less than 10's of thousands of rows, this should be fairly quick, and the calcs might be able to be optimized further. There are always N*X calculations, though, where N is the number of dates.

                   

                  I'm not exactly sure how you'd do this in SQL, the way I'd think of approaching it would be to start out with the dates, then join to a subquery that would do a COUNTD(ClientID) where the date would be from the first date to the current date.

                   

                  Jonathan

                  • 6. Re: Running Distinct Sum or Total using COUNTD
                    Ross Bunker

                    Hi Prashant and Jonathan,

                    Thanks for bringing this to my attention.  Sorry it's taken me a couple days to reply.  This was an interesting case.

                     

                    Short answer is, i have a solution that doesn't suffer from the problem you found.  The caveat here is that rolling up count distinct is really hard.  Tableau does nothing to help make it easier.  This is a great use case for us to look into for how we can improve the user experience for complex calculations.

                     

                    I've attached an workbook.  It has six sheets.  The first three illustrate the solution you provided (1,2) and the problem you found with it (3).  The next three show an alternate solution to the original count distinct question, which doesn't have the problem you found.  In the middle sheets I filtered to an interesting subset of dates to make it easier to see.  Oct 25, 2009 has a second instance of ClientID 17-562, so the distinct count is one less than the non-distinct count.

                     

                    On to the explanations.  First, the problem you found with your solution.

                    Basically, the issue is twofold.  When you move ClientID to the LOD shelf (or Columns for that matter), the calc you were using breaks.  This is shown clearly in the third sheet (Non-Working View)_ in the attached workbook. From the sheet, you can see that two 'unexpected' things are happening.  First, we are adding rows to cover the ClientID/Week of Date String (hereafter WoDS) combinations that didn't exist.  Second, we are partitioning on WoDS (that is the running count is starting over every time the WoDS changes). I put the "ClientID + Week of Date" set on the 'Non-Working View' after moving the ClientID so you can see how it restarts.  You can also see the extra rows added to the end of the viz (not in proper week order).

                     

                    Why are these happening?  This turns out to be very subtle, and there is a good reason for this behavior, although I'm not entirely convinced its the right thing.  The fact that I built it to work this way, but was completely baffled as to why until i looked deeply into the code is a clue that it's not really expected behavior.  I'll leave the gory-gory details for the end of the post.  Here i will just say that when you use At the level, we partition things in a different way by default that gives the behavior you want.  However, when you then move the 'at the level' field (in this case ClientID), to the LOD shelf, we change the behavior of partitioning (see below for more on why).  The same logic causes us to add the extra rows.

                     

                    So, I fixed this by using a different way to calculate the distinct count.  I used a formula like this:

                    IF (ATTR([ClientID]) == LOOKUP(ATTR([ClientID]), -1)) THEN

                      PREVIOUS_VALUE(0)

                    ELSE

                      PREVIOUS_VALUE(0) + 1

                    END

                     

                    This increments the count whenever the ClientID changes.  The important thing for this is sorting.  All rows with the same ClientID need to be together.  Furthermore, it also needs to be in the correct week order in order to have the count increase as the date increases.  That makes this tricky because you can't use Advanced... ClientID/WoDS sort by min(WoD).  That doesn't honor keeping ClientIDs together.  Instead you need to leave the sort as automatic, but make sure that you sort ClientID and WoDS on the sheet both by MIN(WoD) ASC.  Table calculations pick up the sort on the sheet by default (the 'Automatic' setting).  You can see this is correct because on the fourth sheet (Fixed View (w/ ClientID)), the first entry for October 25, 2009, ClientID 17-562 (out of strict ClientID order because it first apears in October 18, 2009, so when sorting by MIN(WoD) it appears before the other ClientIDs), has a Running Distinct Count of 21.  In the table, the previous value is 23 and the next is 24, so you can see how the table calc ordered this differently.

                     

                    So, this calc is more direct, though its not trivial.  As i said, this is a great use case for us to work on improving the experience.

                     

                    This ends the 'how do i do this' section.  Read on for the gory details on Partitioning.

                     

                    Partitioning and 'At the level':

                    Your calculation for running distinct used 'At the level' set to Client ID to compute your running distinct.  It was running along ClientID,WoDS.  We treat a collection of ordering fields in the 'advanced' dialog as a hierarchy.  (I'm questioning that behavior these days).

                     

                    For hierarchies, 'At the level' ClientID tells the table calc engine to actually treat WoDS as partitioning, but because it is treated as in a hierarchy with ClientID, it is partitioned on 'position' rather than 'value'.  To understand what that means, consider the following data values:

                    2005,Q1,Jan

                    2005,Q1,Feb

                    2005,Q1,Mar

                    2005,Q2,Apr

                    2005,Q2,May

                    2005,Q2,Jun

                     

                    Now, if you run a table calc along this Date hierarchy (that is along, Year, Qtr, Month), and you say 'At the level' Quarter, we want to partition on Month so that things like next/previous move to the next/previous quarter.  (think difference from previous quarter).  But if we strictly partition on Month we'd be in trouble because 2005,Q1,Jan and 2005,Q2,Apr would be in separate partitions (Jan != Apr).  Instead, when we partition a hierarchy we partition all the first children (Jan,Apr) together, and all the second children (Feb,May).

                     

                    So, your calc works because you are doing a count that is restarting for each time a client id appears in a new date.  With First() = 0 along WoDS, you are picking up the count for the first occurrence of each client id, and that's how you are getting your unique count.  That in and of itself is fine.

                     

                    However, there is a downside to using hierarchical partitioning with 'At the level'.  Weird things happen when you try to put levels of a hierarchy on different shelves.  Normally they are nested on the same shelf.  To compensate for these strange cases, table calculations try to account for this by changing partitioning behavior.  Specifically, if you put the ClientID on a different shelf from WoDS, we switch the partitioning from by position to by value.  This totally breaks your calculation because you want to count across all the dates.  Again, this behavior was originally put in place for cube hierarchies, to fix some strange cases.  I'm thinking we need to revisit this now that table calcs are in wider usage.

                     

                    :)ross

                    1 of 1 people found this helpful
                    • 7. Re: Running Distinct Sum or Total using COUNTD
                      Jonathan Drummey

                      Hi Ross,

                       

                      Thanks for the explanation and the fix! I think I'm starting to get a handle on what you mean by partitioning by position vs. partitioning by value. Are there any other posts you can point me to, or other examples you could share (when you get a chance)?

                       

                      Jonathan

                      • 8. Re: Running Distinct Sum or Total using COUNTD
                        lynda.kroeger

                        I have been attempting to do a running total of distinct customers across months.  I ran into the same problem as the original poster did.  I tried Jonathan Drummey solution as well as Ross's.  I was able to get the correct answer using Jonathan's solution, but could not accomplish it using Ross's.  Not sure what I am doing wrong.  What I need to be able to do is create a line chart showing the running sum of distinct customers across months.  I thought I would be able to do this after get the right numbers using JD solution, but I could not  get that to work on a line chart.  I also tried creating a line chart in the Tableau workbook Ross posted using the Fixed View tab, but that didn't work either.  Is it possible to create a line chart with a Running Sum of distinct values over time?  I've attached an   8.0 packaged workbook with what I came up.  Any help or feedback would be appreciated.  Sorry for hijacking the post. I could start a new post but I thought this was relevant to the original post.  Thanks

                        • 9. Re: Running Distinct Sum or Total using COUNTD
                          Fred Arve Fahre

                          I have exactly the same issue as Lynda - will this be improved in Tableau9? Or has anyone come up with a solution that makes this work as a running countd line graph across weeks/months? We have a KPI of distinct customers across year, and I really want to be able to show this year vs year in i line graph

                          • 10. Re: Running Distinct Sum or Total using COUNTD
                            Michael Strait

                            I took a different approach and, using a custom SQL statement, added the First_Contact_Date to each record. This field holds the date of the first record for each customer. I then created a calculated field IsFirstContact where Contact_Date = First_Contact_Date and added this as a filter to the worksheet that is intended to count unique customers.

                            Create another calculated field that is the sum of the IsFirstContact field and you can use that in a running total calc for your line chart.

                            • 11. Re: Running Distinct Sum or Total using COUNTD
                              Andrée Roos

                              Hi.

                               

                              I did something like this.

                               

                              RunningSumOfCountDCustomerNames.png

                              I haven't double checked the solution because my narcissistic instincts tell me it works.

                              I'm also lazy.

                              1 of 1 people found this helpful
                              • 12. Re: Running Distinct Sum or Total using COUNTD
                                Liubov Penyugalova

                                Thank you, briliant solution!

                                • 13. Re: Running Distinct Sum or Total using COUNTD
                                  Abdus Abdus

                                  Hi Jonathan,

                                  This is the dataset iam taking from the workbook you have attached.For example , on week of date Dec 28 2008 my running total is 112 but when i filter the week of date for "Dec 28 2008 iam not getting the correct value in running sum as you see in the other screenshot.

                                   

                                  Do you have any workaround for this, if yes then please let me know.

                                   

                                  Thanks

                                  • 14. Re: Running Distinct Sum or Total using COUNTD
                                    Yuriy Fal

                                    Hi Abdus Abdus,

                                     

                                    You could be using a Table Calc Filter like this:

                                     

                                    LOOKUP(ATTR([Date]),0)

                                     

                                    Compute using Cell.

                                    For a Date range filter make it Continuous (Green Pill).

                                     

                                    Hope this could help.

                                     

                                    Yours,

                                    Yuri

                                     

                                    PS If you're on a Tableau version 9.0+

                                    you could use an approach with LOD calcs

                                    as described in another thread below:

                                    Re: Running total(Accumulate) distinct number by day

                                    1 of 1 people found this helpful
                                    1 2 Previous Next