3 Replies Latest reply on Feb 13, 2014 3:48 PM by Jonathan Drummey

    Counting Active IDs/Users

    Geoff Fawcett

      This is similar to the thread "Count In-Flight Events", but with enough of a twist that I can't figure it out (actually, I never figured THAT thread out in the first place... so maybe I'm just stuck conceptually here).


      Sample raw data looks something like this:


      Records with enddt=2099-12-31 are still active (as of query time).


      Goal is to get a month-by-month tally of "active users as of end of month".


      However, in this case, there are many historical start dates we don't need to see in results.  In sample data above, I don't need to see 2010 in my result dimension -- but I need to count the IDs that have old start dates that are still active (i.e.   ID A123).


      So, Calculating from above data, results are (attached Excel has the logic for counting in columns to right of original data.  Basically, an ID is active in a given month when:   startdt <= date_dim-month AND enddt > date_dim-month)


      ACTIVE ID Count43



      In reality, the backend data is around 50 million rows.  Historical start dates go back about 10 years

      But I only need to calculate and display IDs in the last 24 months. The source query already excludes records that are inactive prior to time I care about (enddt < 1/1/2011). That shrinks the result set down to about 40 million rows.


      I can't figure out how to have an independent time dimension (that functions as my columns), and get Tableau to run through all those rows and calculate whether each one is active across the months displayed. I tried creating a two-row dimension and using the padding technique, but couldn't figure out how to get the math to work.


      I've attached a packaged workbook with sampled data as well.


      Glad for any help provided ! Thanks.



        • 1. Re: Counting Active IDs/Users
          Joshua Milligan



          From a conceptual standpoint it is possible to solve this in Tableau.  I've attached a workbook that makes a start at it and at least demonstrates how you can calculate a date field that increments with the domain padded date so you can use it to calculate active/inactive.


          From a practical standpoint, I think you are going to want to consider ways to structure the data in the back-end to make it quite a bit easier and more efficient to use in Tableau.  Even in your example, with a couple hundred thousand records, I got a running out of memory warning when using domain padding and table calculations (notice that I filtered to a couple dozen records).  When you mention millions of records...


          ...remember all of this domain padding and table calculation is being done in memory in the cache.




          Would it be possible to write a simple script that would simply calculate the number of active IDs for each day and store that in a table?



          Here's a start to a Tableau option:




          You'll notice that I was able to use the start date, which doesn't include every date.  So, I checked the  "Show Missing Dates" option on the field.  Then, I had to create a new Date calculated field that incremented as the calculation moved across the table so I could compare that to the End Date to determine active status.


          The next step would be to add the active values for each column (I could use WINDOW_COUNT or WINDOW_SUM for that).  That would be the active count.  Then I could rearrange the view, hide things I didn't want, and I'd call it a day...


          ...except for the volume of data you have.  I just don't see this as a viable option.


          Hopefully, you can look at other options, including storing pre-calculated values in a table.



          For reference, here are some recent threads I am aware of that deal with domain padding:

          How can I show only the most recent value?

          distinct count of devices by version per day when devices do not have entry in table everyday

          Count the number of leave days taken by weekday (note some alternatives in the thread)




          1 of 1 people found this helpful
          • 2. Re: Counting Active IDs/Users
            Geoff Fawcett

            Thanks for the example. I'll have to study that to really get it.


            As for the data volume, I agree with you --- I think I need to let Teradata do the heavy lifting on this one.


            Thanks again,


            • 3. Re: Counting Active IDs/Users
              Jonathan Drummey

              To supplement what Joshua wrote, here's a link to a compendium of threads on the topic, with various ideas on how to do this in the database: http://community.tableau.com/thread/121505