8 Replies Latest reply on May 3, 2012 10:12 AM by Jonathan Drummey

    calculate count distinct according to time

      I have a table stures the users that installed my product and the date when he installed., the table contains data like

       

      user name     |     install date     |

      ---------------------------------------------------

      user1             |     Jan 1            |

      user2             |     Jan 1            |

      user3             |     Jan 1            |

      user3             |     Jan 1            |

      user4             |     Feb 1           |

      user3             |     Feb 1           !

      user2             |     Feb 1           |

       

      In Tableau I want to show how many users have installed my product every day.

      I want to draw a line chat which shows how many users have installed my product. The column is date and row is the number of users that have installed my product. For example, on Jan 1, the report should display that 3 users have installed the my product; on Feb 1, the report should show 4 users have installd.

       

      I tried to use countd, but I can only show the distinct users every day. I cannot accumulate them.

        • 1. Re: calculate count distinct according to time
          Alex Kerin

          This should be straightforward, but I cannot get my head around it.

           

          I can get part of the way there with: lookup(attr([Install]),first())

           

          but then cannot think of a way to sum the values (you can't partition a table calc by another). Maybe Richard Leeke or someone else can help.

           

          I tried using a filter to get rid of the other appearances of the id after the first date, but to no avail either.

          • 2. Re: calculate count distinct according to time

            There's a solution in

            http://community.tableau.com/message/168597#168597

             

            I have downloaded the twbx file. But I don't know how he did it. I tried to create the calculated fields as he did. But I got a mass in my result. I am trying to check what I missed

            • 3. Re: calculate count distinct according to time
              Jonathan Drummey

              Hi Chandler,

               

              Attached is an implementation of Joe Mako's calcs using a variation of Alex's sample data that includes some duplicate records to make sure things work right.

               

              Instead of trying to use a count distinct that would have to be compared, what Joe set up is:

               

              1. Identify the earliest month for each user in the data set - that's the Min Month calc, for each of those return 1 no matter how many records there might be, return 0 for every other month/everyone else. This effectively gets the "distinct" part of the count distinct.

              2. Sum up the Min Month for each month - that's the Min Month calc, so now we have an accurate count of new users in the month.

              3. Create a running sum Min Month - that's the Running Sum calc.

               

              Since you're wanting a count over months, I took the Install (date) field and used the DATETRUNC function to create a Month of Install field. This makes setting the Compute Using for the table calculations easier.

               

              I'm guessing that the problem you are experiencing in your results is in the addressing and partitioning of the (nested) table calculations. They have to be exact or else the calculation fails. In this case, there are three calculations to be set when you go to the Edit Table Calculation on the Running Sum field, by clicking on the Calculated Field combo box: Running Sum, Month Sum and Min Month.

               

              screenshot1.jpg

               

              Starting with the lowest level, Min Month needs the Compute Using set to Month of Install. This calculates using a method I hadn't seen before, I'm going to have to add it to my toolbox. Using TOTAL is very cool because it doesn't care about sorting in the partition, which is something that can drive me batty in nested table calcs.

               

              Then the Month Sum needs the Compute using set to ID. This simply sums up the # of Min Months for each month.

               

              Finally, the Running Sum needs an Advanced... compute using with *both* Month of Install and ID in the Compute Using, with the sort set to Month of Install/Min/Ascending. This then calculates across all combinations of Month of Install/ID, in the correct order.

               

              Jonathan

              • 4. Re: calculate count distinct according to time

                Attached would be my approach. Let me know if it helps.

                 

                Basic idea being to just put the "Number of Records" measure on the rows shelf, "install date" on the columns shelf.

                • 5. Re: calculate count distinct according to time
                  Alex Kerin

                  NICE! That total(min is the key that I had missed.

                  • 6. Re: calculate count distinct according to time
                    Alex Kerin

                    The issue is that once someone has installed it, another entry for them at a later date should not be counted. Hence the complexity with the Table Calcs.

                    • 7. Re: calculate count distinct according to time

                      Oh, yes, you're right. I was reading it as installs per day regardless of which user installs. So then why wouldn't COUNTD ([user name]) be sufficent?

                      • 8. Re: calculate count distinct according to time
                        Jonathan Drummey

                        Hi Derek,

                         

                        The goal as I understand it was to count the distinct (new) installs for each day, and since the data doesn't flag which rows belong to a new install or not, regardless of user, we have to generate that from the data. In the sample data that Chandler initially posted, a COUNTD(username) would return 3 on Jan 1 and 3 on Feb 1, and the goal is to show 3 on Jan 1 and 4 for Feb 1 (the 3 from Jan 1 and the single new user on Feb 1).

                         

                        Jonathan