10 Replies Latest reply on Sep 6, 2018 12:58 PM by Alex Kerin

    Distinct active users in prior 90 days for a given week

    Alex Kerin

      OK, using the Superstore as an example.

       

      Week of Order DateActive Customers in the prior 90 days
      August 26th 201885
      August 19th 2018107
      August 12th 201876
      ...
      ....

       

      An active customer is someone who has ordered anything (has an Order ID associated with their name) in the prior 90 days from the end of the Week of Order Date. So this is a rolling window of 'active' customers. Now, a customer of course should still appear in that week's count even if they did not order in that Week, but had ordered at some period less than 91 days from that week. This rolling count concept has been covered many times before - I even participated in the early discussions using pure window calcs when Joe Mako and Jonathan Drummey taught us about data densification and more. Then LOD calcs in combination with running_sum window calcs came along. Unfortunately I had taken a turn to middle-management and managed to miss all of the LOD learning period and cannot work out how to modify those for my needs..

       

      Hoping someone can help.

       

      Edit: this is my 90 day active calc that is not a running window:

       

      if {fixed [Customer Name]:countd([Order ID])}>0

          and {fixed [Customer Name]:min(DATEDIFF('day',[Order Date],today()))}<=90

          then "Active Customer" end

        • 2. Re: Distinct active users in prior 90 days for a given week
          Alex Kerin

          Appreciated, that's one of the threads I was part of and it predates LOD apart from the latter answers, which I don't understand or I feel don't match my exact needs.

          • 3. Re: Distinct active users in prior 90 days for a given week
            Jonathan Drummey

            Hi Alex,

             

            The way LOD expressions are implemented in Tableau is as aggregations (often computed as subqueries) at the given level of detail. The key point here is that each record is only counted once - records are grouped based on the level of detail and then aggregated. Whereas something like a moving count distinct requires counting records up to N times, where N is number of marks (number of weeks in your case).

             

            In the aforementioned running count distinct here and there people have written that they've solved the moving count distinct problem with LOD expressions but they haven't (except for one case described below in method c); all the moving count distinct LOD "solutions" I've seen have actually answered different questions - either aggregating LODs over a different window (so still counting each record only once) or creating a running count distinct (which is also still counting each record only once). So part of the challenge here is simply defining the particular computation that one is trying to make and when we've created a solution identifying what class of questions that solution actually answers.

             

            Now another way to look at this is from a data structure perspective:

             

            1) The original raw data is at the grain of transactions (orders).

            3) The desired viz is at the grain of the given date period (week in this case) where that date period is inclusive of all transactions in a larger period (90 days in this case) for each higher level entity (customer in this case).

            2) In-between what would be "easiest" is a data set that had all the customers (from the past 90 days) we want to count for each date period (week), i.e. replicating the raw data as necessary.

             

            Another complication can be when the data is sparse and not every date period (week) exists in the data, so somewhere between 2) and 3) we'd need to do some padding or data densification.

             

            So the methods the community has come up with to do this are:

             

            a) Do step 2 (and maybe step 3) in SQL. My understanding is that some DBs support the necessary window functions to simplify or skip step 2, other options are to do cross joins in SQL. @Richard Leeke wrote about the cross join solution.

             

            b) Do step 2 (and maybe step 3) in a data preparation tool such as Tableau Prep or Alteryx. This is my preferred method nowadays for ease & speed of use vs. SQL. I recently did a demo of this in Tableau Prep, I can dig it up if need be.

             

            c) @Yuri Fal came up with a method where you can write N LODs (one for each of the N times) inside a calculation and then return the right one. So this trades off the data replication of step B for lots of calculations. This might work for 7 days in a week or 12 months in a year but will eventually cause performance issues, I'm thinking 52 weeks wouldn't work. Also this solution requires any padding of the data for sparse data to be done outside of Tableau because Tableau's data densification is only performed after LODs are computed.

             

            d) Load all (or most of) the data from step 1 into a viz in Tableau, densify the data if necessary in Tableau (or do the padding outside of Tableau), and use table calculations to skip step 2. This is what @Joe Mako came up with. The issue with this solution is that the viz is at the level of detail of the date period + customer and the combination of those could be prohibitively large for the desired performance of the viz.

             

            Jonathan

            • 4. Re: Distinct active users in prior 90 days for a given week
              Yuriy Fal

              Hi friends,

               

              As for the methods a) or b), there are times when

              a Range Join with the Calendar table could help.

               

              Alex, in your particular case it would be

              a weekly calendar, and a 13-week date range.

               

              Since it would be a non-equi join,

              the HyPer is here to help (hence version 10.5+).

               

              Please find the attached.

               

              Yours,

              Yuri

               

               

              PS  The method c) would require 13*2+1 == 27 LOD-based calculations :-)

              • 5. Re: Distinct active users in prior 90 days for a given week
                Alex Kerin

                Jonathan, always a pleasure to read your replies. I don't believe data densification will be needed - there will always be plenty of orders in any given week. Prep is out as this will be an ongoing process of reporting (until I can schedule Prep jobs, preferably on Server..). It's interesting that almost 6 years on from our forays into this, that there still isn't a concise, expedient way to achieve this. I'm on  MYSQL, so possible that a window solution may work. I also have a SQL expert to hand, so maybe even custom SQL..

                 

                 

                I will look at Yuri's solution as well. Thanks, Alex.

                • 6. Re: Distinct active users in prior 90 days for a given week
                  Alex Kerin

                  Thank you Yuri - I like this. Given the likely rapid expansion of our customers, I don't know if it will be sustainable in the future. However, for now, perhaps it works. Could you give an example of the LOD calcs (perhaps for a three week period)?

                  • 7. Re: Distinct active users in prior 90 days for a given week
                    Yuriy Fal

                    Hi Alex,

                     

                    Please find the attached a 3-week version

                    (both X-Join and LOD-based ways to compare).

                     

                    Yours,

                    Yuri

                    2 of 2 people found this helpful
                    • 8. Re: Distinct active users in prior 90 days for a given week
                      Jonathan Drummey

                      Thanks for jumping in, Yuri!

                       

                      Alex - Dunno if you knew, you can now schedule Prep through Windows Task Scheduler, see New in Tableau Prep: Now in seven languages, plus command line flow scheduling | Tableau Software for an intro. I've had a couple of conversations this summer with the devs around moving count distinct and the larger question of "the raw data is in structure X and we need structure Y for our analytics".  I know the devs' vision is to have it "just work".

                       

                      Jonathan

                      • 9. Re: Distinct active users in prior 90 days for a given week
                        Alex Kerin

                        There you go - nice. Next step Server I guess.

                        • 10. Re: Distinct active users in prior 90 days for a given week
                          Alex Kerin

                          Yuri - great job, thank you - I also appreciate the naming convention of your calculated fields.