9 Replies Latest reply on Apr 6, 2015 6:26 AM by Sanjay Reddy

    Table issues

    Sanjay Reddy

      I have an issue with Tableau doing a running total of active members. I am attaching a sample workbook with sample data. My question is about data padding and figuring out how can it be done in tableau, without creating extra rows when pulling in the data. I am trying to introduce a calendar table to my original data. I have a Data table with id,start date,end date (lets assume they all start at first of every month and end of a month). I want to introduce a date table so i have a start date table and end date table (I know i could have joined just using one table that has start date column and end date column, but then i run into the problem creating a single row for each id for each month its active. Which create a lot of rows, as i am dealing with a large database).


      My question now is, I want to join the data table with a start table like data.startdate = startdate and same with end date. But when i do this I only seem to get either the start date or end date, but not exactly when are active between both dates.


      If you have more questions regarding this please let me know. I know this might sound confusing, but since i am dealing with large dataset its hard to use a method like joining a separate calendar all together because like i mentioned it creates many more rows. Like going from 20 million records to 120 million records. So finally to make it simple, I want to know how to introduce a calendar table into my data without creating additional rows and I need to show exactly when they are active between. Thanks.


      Edit: Sorry everyone forgot to post the result i was getting and what i was expecting. Updated my workbook. Anyhelp would be appreciated.

        • 1. Re: Counting active members
          Jonathan Drummey

          Hi Sanjay,


          Here's a link to a number of different solutions: Utilization, Queues, Census, Throughput for Blocks, Floors, Units, Rooms, Offices, etc. - What do you do?


          See the attached, I put together a couple of options, both based on the same idea. I took your raw Data sheet and used Custom SQL to UNION it to itself to create a "tall" table, then made a couple of calculated fields. One is a date field with the appropriate Start/End Date, the second is a counter with values +1 (for a Start Date) or -1 (for an end date). Then the # of active users is just a running sum on the counter:



          However, if we graph that as a line chart then it gets a little messy:




          One way to get something that has more vertical lines in it is to use domain padding, which we can do by turning on Show Missing Values for the Date dimension:



          Note that these lines might look vertical, but they aren't. We'd need to do another round of duplication of the data to draw exact lines, there are links about stepped charts in Tableau in my link above. Also, if you try filtering on the Date dimension it'll be messy because the Filter will be on the raw data, not the padded data from Show Missing Values. There are ways around that with further calculations.


          Another way is to use a scaffold source and a data blend. I created a long list of days, then used that as a primary source with the unioned data as the secondary. Then the same running_sum can work,but now it's computed based on the dimensionality of all the days. However, it still has the same problems with respect to filtering the dates, which might not be an issue for you.



          • 2. Re: Counting active members
            Sanjay Reddy

            Hi Jonathan,


            Thanks for your reply regarding my question. I was glued to that link you sent me. But i still can't figure out, how to count without using domain padding. Because i need count of number of people active rather than a line. I even tried  union sql from one of the forums i believed you answered. Looking at the simple counter below and excel data we can see its not right. As I would be looking with monthly granularity it should show two ppl still active on 1/31 and two ppl for 3/31. If i use the missing values i would end up getting all the dates like 1/1,1/2.1/3 but i am just looking for monthly.


            simple counter.PNG Excel Data.PNG


            This is exactly what i am looking for, i was able to get this using the links you had from the discussion. But i want this in just a textview rather a line. This is where I am having issues.

            Domain Padding.PNG



            So my question is kind of confusing (I haven't done a great job explaining it either, as i don't think its easy to explain).

            I will try again hopefully I can do a better job. So I can get the results you have using your way of custom sql, but i need the actual running count just as text even for the missing dates.


            I don't think its possible in tableau without giving the data or doing a join where each member id has a record for when it started and ended. I am attaching another tableau workbook to show you how i did it. But doing that method has created a lot of extra records because of join. But its not possible in a real world database situation which has millions of records (going from like 20 million to 120 million) . Before this we even used date parameters inside tableau to get results, but we wanted to use calendar that's where things are stuck.


            If you look at the workbook below, you would understand my problem when you view its data and how it just increases massively. The workbook below is exactly what i am looking for, without the exponential record increase. Hopefully i did a better job in explaining the problem. Again thank you for taking time to look into this.

            • 3. Re: Counting active members
              Jonathan Drummey

              The data in the workbook you just attached has more records than necessary, it's padded out in the data source and yes, you're getting 6x the number of records (even higher if customers are around longer).


              What domain padding (turning on Show Missing Values) enables is for the padding to be done in Tableau, so all that's necessary is 2x the records so we can count the start and end***. Then any necessary padding in Tableau is based on the level of the date dimension in the view, so at the month level across a year there are only 12 marks (not 20M+ records).


              To set up the text table, I would have had to copy out and restructure your data from your latest workbook, so this example is from the workbook I just posted. Here's how to build a month by month view text table:


              1. Do the query to UNION the data sources together.

              2. Build the calculated fields for the Date and Counter.

              3. Put MY(Date) on columns.

              4. Put SUM(Counter) on the Text Shelf. Here's the view so far:


              5. Change SUM(Counter) into a Running Sum and set the Compute Using to the Date.

              6. Click on the MY(Date) pill and turn on Show Missing Values:



              *** However, this is actually less performant than it could be. There are several ways to avoid having to deal with 20M*2 records in Tableau (that get ultimately get aggregated and potentially padded to one mark per month) and instead only have a data source that has 24 records per year. For example, you can aggregate to the level of month before you do the UNION, for example here's a pseudocoded query using two UNIONs and a couple of left-joins.


              SELECT Month.Month

                  ,COUNT(Starts.MemberID) AS startCount

                  ,"Start" AS Copy

              FROM Months

              LEFT JOIN myData on myData.Month of Start Date = Months.Month AS Starts

              UNION ALL

              SELECT Month.Month

                  ,COUNT(Ends.MemberID) AS encCount

                  ,"End" AS Copy

              FROM Months

              LEFT JOIN myData on myData.Month of End Date = Months.Month AS Ends


              That returns 24 records per year to Tableau and wouldn't require any domain padding, just a running sum calculation that is something like RUNNING_SUM(ZN([startCount])-ZN([endCount])). If you needed more dimensions for filtering, then they could be added to the padding.



              • 4. Re: Counting active members
                Sanjay Reddy

                Hi Jonathan,


                Really appreciate you taking the time to give me further advice on this. But again I have to go back (Sorry if i am missing something). If we take look at the result vs the data I have. I am again not getting the right result. Running count.PNG Excel Data.PNG


                For the month of January I should get 2 active members according to my excel, but I only get 1 active member. I understand what is happening here for Member ID 1 its doing 1-1 = 0, so we end with only a 1 active member as a result from Member ID 2. How would i go about changing this to show 2 members active for January. I want it to show as active for that particular month even it ended that month. Screenshot of what i would like the result to be. Thanks again for the help.


                needed result.PNG

                • 5. Re: Counting active members
                  Jonathan Drummey

                  I can't tell if you're missing something without seeing what you are doing, can you share a packaged workbook?


                  I've got an idea about the counting, I'll have to test it out first.



                  • 6. Re: Counting active members
                    Sanjay Reddy

                    Hi Jonathan (I probably confused you a lot with my explanations of how i wanted the result). The final result of the screen shot is just a mockup of what I am expecting. I do have different workbooks, as i have been trying out different scenarios and trying to see which one would work.


                    Current_Sample_New.twbx is what i did at first, but then realized it can't work in real world.


                    Sample_Calender.twbx is where i joined a separate start table and end table to the data to only get 1 row for each id.


                    Sample_Calendar jtd.twbx is the one you sent me using custom sql using union to create 2 rows for each record. I was comparing the results you have from there to my expected results.


                    Excel file has data with different types of dates tables (If needed).


                    Hopefully I made it better to understand now. Again thanks for looking in to this.

                    • 7. Re: Counting active members
                      Jonathan Drummey

                      I wanted to try to reflect what I'm seeing:


                      1) There are 20+ million raw records, so a goal is to not end up with any more records than absolutely necessary.

                      2) The ultimate goal is to end up with a view that counts any activity in the month, so the month of start date and month of end date both count, along with all the months in between. Another way to describe this is that instead of counting month boundaries, the goal is to count the months. It's essentially the Fencepost problem: http://betterexplained.com/articles/learning-how-to-count-avoiding-the-fencepost-problem/.


                      I'll do #2 first. Another way of describing this is that for the running total we count users who ended this month in this current month (since they had activity) and then subtract them from the *following* month (since they had no activity). In the UNIONed data source, I created a new Month calculated field with the following formula:


                      DATE(CASE [copy]

                          WHEN 1 THEN DATETRUNC('month',[Start Date])

                          //adding a month to the end date to only subtract after

                          WHEN 2 THEN DATEADD('month',1,DATETRUNC('month',[End Date]))



                      Then when putting that in a view using MY(Month) as the dimension and turning on Show Missing Values I get the same total as your desired example:



                      This is the "domain padding 2" worksheet in the attached.


                      For #1, my suggestion is to (if possible) pre-aggregate to the month level before the data gets to Tableau, as I showed in the prior pseudo-query. You could even do the offset of the ending month that I did in the Tableau calculated field in the data source, so there would be even less manipulation in Tableau. So at worst (most) you'd have 2*records, at best you could have as few as 24 per year.



                      1 of 1 people found this helpful
                      • 8. Re: Counting active members
                        Sanjay Reddy

                        Hi Jonathan, thanks a lot for that taking time to let me know of the count issues (that link really helped me better understand). I was trying to figure out how can i possibly do this with just one row of record for each member, but it doesn't look like that's possible. So I think your option of using union is the best for this scenario.


                        I still have one more question (if you don't mind). Attaching a cleaned up workbook, I just want to understand if my workbook meets my own requirements. I wanted to change some filters to better meet my requirements.

                        When i used the range filter of months, it doesn't display the missing values which it does without the filter. Again thanks for the help.


                        edit: sorry i think i attached the wrong workbook. Updated it now.

                        • 9. Re: Counting active members
                          Sanjay Reddy

                          Hi Jonathan (Sorry to bother you this much). I just wanted to get your opinion regarding my scenario. We really want to introduce a date calendar in our workbook. Do you think since we have 20+ million records it's just better to do the custom sql with UNION ALL and then count active members. If I do this i seem to be getting duplicates getting counted too, is there anyway to change that?  I tried to use member count, but it doesn't seem to be working. Thanks.