6 Replies Latest reply on Sep 16, 2013 2:54 PM by Patrick A Van Der Hyde

    Data Summarization not working

    Swathi C

      Hi,

       

      I've been struggling with this problem for a few days now. I want to calculate the #Net Paying Customers for each Line of Business (eg: books, ebooks). I've created a sample excel workbook as follows:

       

      Monthuser_idbook_revebook_rev
      4/1/201314-1
      4/1/2013210
      5/1/2013120
      5/1/20132-21

       

      I want to calculate the net paying customers for each line of business for each month and for both the months put together: Basically, i want to calculate the total revenue for each LOB for the time period I'm interested in and then count(#customers with >0 total revenue) - count(#customers with <0 total revenue). More detail is available in the SQL queries at the bottom of this post.

       

      Output should be:

                          Books_net_paying_customers     Ebooks_net_paying_customers

      April, 2013               2                                                    -1

      May 2013                0                                                     1

      April + May 2013     0                                                      0

       

      Let me explain my April + May numbers better.

       

      The total book revenue for User ID 1 is 6 during this period. Total Book revenue for User ID 2 is -1. I summed up the revenues in April and May here. As a result, the #Net paying customers = 1 (since User_id 1 has >0 revenue and we don't care about customers with revenue = 0). Similarly, User ID 1 has -1 net revenue in April + May. User ID 2 has +1 net revenue in April + May. As a result, #Net Paying customers = 1 (paying customer) - 1 (refund customer) = 0.

       

      Any help on this would be appreciated!! I've attached my current workbook and my efforts on this so far can be seen here.

       

      [

      The month level view can be calculated as follows:

       

      select Month,books_paying_customers - books_refund_customers, ebooks_paying_customers - ebooks_refund_customers

      from (

      select Month, count(distinct case when bookrev > 0 then user_id end) as books_paying_customers,

           count(distinct case when bookrev < 0 then user_id end) as books_refund_customers,

           count(distinct case when ebookrev > 0 then user_id end) as ebooks_paying_customers,

           count(distinct case when ebookrev < 0 then user_id end) as ebooks_refund_customers

          from

          (

               select Month, user_id, sum(book_rev) as bookrev, sum(ebook_rev) as ebookrev

               from table

               group by 1,2

           ) a

           group by 1

      ) b

       

      The view for April + May can be calculated as follows:

       

      select books_paying_customers - books_refund_customers, ebooks_paying_customers - ebooks_refund_customers

      from (

      select count(distinct case when bookrev > 0 then user_id end) as books_paying_customers,

           count(distinct case when bookrev < 0 then user_id end) as books_refund_customers,

           count(distinct case when ebookrev > 0 then user_id end) as ebooks_paying_customers,

           count(distinct case when ebookrev < 0 then user_id end) as ebooks_refund_customers

          from

          (

               select user_id, sum(book_rev) as bookrev, sum(ebook_rev) as ebookrev

               from table

               group by 1

           ) a

      ) b

       

       

       

      ]

       

      (Added packaged workbook and edited my post for more details)

        • 1. Re: Data Summarization not working
          Joshua Milligan

          Swathi,

           

          I'm not sure if I follow the expected output.  Shouldn't book_rev for April + May be 1, not 0?  user_id 1 showed revenue for both months.

           

          I think you are looking for a couple of calculated fields, something like:

           

          COUNTD(IF [book_rev] > 0 THEN user_id END) - COUNTD(IF [book_rev] < 0 THEN user_id END)

          and

          COUNTD(IF [ebook_rev] > 0 THEN user_id END) - COUNTD(IF [ebook_rev] < 0 THEN user_id END)

           

          I've attached a workbook that might help the conversation get started... but if I've not understood the issue, you will want to follow Shawn's advice and post a packaged (twbx) workbook -- not a twb.

           

          Regards,

          Joshua

          • 2. Re: Re: Data Summarization not working
            Swathi C

            Hi Joshua,

            Thanks for having a look at my post! I've attached my packaged workbook to this post. Let me know if this doesn't work for you.

             

            Let me explain my April + May numbers better.

             

            The total book revenue for User ID 1 is 6 during this period. Total Book revenue for User ID 2 is -1. I summed up the revenues in April and May here. As a result, the #Net paying customers = 1 (since User_id 1 has >0 revenue). Similarly, User ID 1 has -1 net revenue in April + May. User ID 2 has +1 net revenue in April + May. As a result, #Net Paying customers = 1 (paying customer) - 1 (refund customer) = 0.

             

            Does this help you?

            • 3. Re: Re: Re: Data Summarization not working
              Joshua Milligan

              Swathi,

               

              That makes sense now.  I worked on this a bit and came up with a solution using sets (to determine whether the user is net paying/refund overall) and a custom grand total trick (see the blog posts starting with Part 1: Customizing Grand Totals – Part 1 | Drawing with Numbers).

               

              I haven't included a lot of documentation, but I'd be happy to provide any explanation as needed!

               

              Regards,

              Joshua

              1 of 1 people found this helpful
              • 4. Re: Re: Re: Data Summarization not working
                Swathi C

                Hi Joshua,

                 

                That sounds great!! However, I use Tableau 7 and it looks like you use Tableau 8. Is it possible to do the same thing in Tableau 7? If you're not sure, can you help me provide more explanation as to what you did here?

                 

                Also, just to clarify: user_id cannot be a field in the tableau table. If your solution doesn't need to do this and can work in Tableau 7, it will definitely work for me! If it doesn't, I can still go back to my stakeholders and explain when and why this can't be done. This is helpful in any case.

                 

                Thanks,
                Swathi

                • 5. Re: Re: Re: Data Summarization not working
                  Joshua Milligan

                  Ah!  Well... how soon until you upgrade?

                   

                  I don't believe my approach will be possible in 7 as sets weren't really robust until 8.  I'll have to give it some more thought to see if there is a good approach for 7.  I had originally started down a table calculation rout and may return to that.

                   

                  Another possibility that might be much simpler would be to create two views.  One by month and then another for the combined months.  Then combine the two views in a dashboard.  I'll try to get to that soon as well.  But in the meantime, maybe there are some others in the community who'd like to take a swing at it?

                   

                  Regards,

                  Joshua

                  • 6. Re: Data Summarization not working
                    Patrick A Van Der Hyde

                    Swathi also requested assistance via Support so I am adding the solution provided by Support to this thread as this uses a Tableau version 7 workbook, calculated fields and a couple of Table Calculations to arrive at the same outcome.

                     

                    We create a calculated field that set the value to 1 or -1 depending on the sum of Revenue for each product line. We then Window_Sum that value to arrive at the totals across Months. User ID is left on the Level of detail since this field is used for each table calculation function in the view.