1 2 Previous Next 18 Replies Latest reply on Sep 26, 2014 12:16 PM by Matt Lutton

    Help on summing distinct values

    Von T

      Hi,

       

      I've the following excel data for testing:

       

      data1.jpg

       

      The requirements: I want to sum the distinct budget sales 1) by region 2) by category, but I got stuck. I tried using ATTR([Budget sales]) on by region, but then it gives an *. However, ATTR([Budget sales]) gives me the correct result by category.

       

      It's a simple data and simple requirements, but I just can't get the correct result. I think I've to use some functions, but not sure what functions to use and how. Does anyone know how to do it?

       

      Eventually, the sum of the (distinct) budget sales will be used in a division calculation, for example: sum([some var])/sum of distinct budget sales. But right now, I need to first get the distinct sum of the budget sales correct.

       

      I've attached the workbook. Thanks a lot!

        • 1. Re: Help on summing distinct values
          Matt Lutton

          I used a Table Calc to generate those results--your problem is you want to include Category in your calculation, but it is not in the view, so Tableau cannot reference it.

           

          My solution is to pull Category onto the Detail shelf, and use a Table Calc of:

          IF FIRST()==0 THEN WINDOW_SUM(attr([Budget sales])) end

           

          With a Compute Using (Addressing) on Category (Partitioning is on Region, by default, in this setup)

           

          TC.png

          The last step is to turn off Mark Stacking via the "Analysis>>Stack Marks" menu option.

           

          Cheers.

          1 of 1 people found this helpful
          • 2. Re: Help on summing distinct values
            Von T

            hi Matthew,

             

            Thanks for the formula. I don't understand the formula. What does "if FIRST()==0" in your calculation mean? Every row will not be true because FIRST is not equal to 0, right? Then if it's not true then where is the else part? I'm lost about the formula. Can you explain to me the formula "IF FIRST()==0 THEN WINDOW_SUM(attr([Budget sales])) end" ?

             

            Next, how about the division? i.e. If I use this formula sum([Sales])/[WS Budget Sales], then the region West returns blank (no result), but the region South returns correct result. Why?

             

            Attached is the workbook. Sheet 3 is the division calculation.

             

            Thanks again for the help.

            • 3. Re: Help on summing distinct values
              Matt Lutton

              Well, I will try my best in a quick recap.

               

              The IF FIRST()==0 is an optimization, so that only one value per partition is returned (one value per Region in this case)

               

              The WINDOW_SUM(attr(Budget Sales)) gets at what you really wanted--the row level SUM of the distinct values per region.  The Compute Using (addressing) is most important here--again, Category must be in the view for this to work (and so, if we add more complexity, such as dividing these values, we need Category and Region to be in the view).

               

              That's my quick recap.  I'll look at the workbook when I can.

               

              There may be several other approaches or calcs that return the same.  This is just how I approached your particular scenario (and I was not considering the division as I didn't pay attention to that in your original question; that's my bad).

              1 of 1 people found this helpful
              • 4. Re: Help on summing distinct values
                Matt Lutton

                What are your expected results in Sheet 3, real quick?  If you're looking for:

                 

                South: 83.33% and West: 41.67%

                 

                Then, you just need to include the Category dimension in the view (on the Detail shelf, again--this makes it available as a field for our Table Calc to use), and set the Compute Using on your new Calculation1 pill to be the same as before--Compute Using (addressing) on Category, and portioning by Region, as a result.  You'll need to turn off Mark Stacking again as well (you can see the impact of this if you turn it off through the "Analysis" menu):

                 

                Compute Using.png

                 

                For more details on learning about these terms and table calcs in general, I suggest Jonathan Drummey's blog as a starting point:

                Want to Learn Table Calculations? Here’s How! | Drawing with Numbers

                1 of 1 people found this helpful
                • 5. Re: Help on summing distinct values
                  Von T

                  Hi Matthew,

                   

                  Thank you for the explanation. Now, I understand the formula (plus looking up the meaning of FIRST()).

                   

                  So for the division, why the region west returns 41.67%

                   

                  Shouldn't it be 125%? For region west, the sum of the sales is 75,000, the budget sales is 60,000. This 60,000 is the result of using the "WS Budget Sales" calculation which gives us the distinct budget sales. Why it doesn't plug in the value 60,000 for "WS Budget Sales" in the formula sum[(Sales)]/[WS Budget Sales]?

                   

                  Thanks again for the help.

                  • 6. Re: Help on summing distinct values
                    Matt Lutton

                    It returns 41.67 for that particular setting; I was not sure what you were looking for exactly.  I'll take another look at getting your expected results--what about for South?  83.33?

                     

                    I forgot to add the IF FIRST()==0 THEN... to the percentage calc as well.

                     

                    Let me attach the "solution" I've got for your first, then I want to ping in some experts as I've discovered a curious setting here...

                     

                    Cheers

                    Untitled.png

                    • 7. Re: Help on summing distinct values
                      Von T

                      Hi Matthew,

                       

                      Thank you so much for the help. I see that the "Compute using" is set to Region to get the correct result.

                       


                      • 8. Re: Help on summing distinct values
                        Matt Lutton

                        Actually, I was seeing different results with different Compute Usings, and feel like I discovered some sort of very odd bug... pinging a Table Calc guru to get his thoughts, if he's available.

                         

                        Jonathan Drummey  In the above example, on the Percentage pill, if I set the Compute Using directly on the pill to "Category" it gives me the 41.67% as the result...

                        Simple Compute Using on Category Thread 299321.png

                        however, went I went into the "Edit Table Calculation" >> and set it to "Category", it gives me the 125%...

                        Advanced Compute Using Thread 299321.png


                        This feels like a bug to me; I do think "Region" is the correct compute using (addressing dimension) in this example, but Tableau is giving me different results based on whether I set the Compute Using on the pill itself, or in the "Edit Table Calculation' dialog.

                         

                        8.2 workbook is attached -- So, when you guys open this workbook do you see the Compute Using as being on "Region" or "Category"?  I think Tableau is just displaying something different from what it is actually doing, but I'm not quite sure.

                        • 9. Re: Help on summing distinct values
                          Von T

                          Hi Matthew,

                           

                          I just found out that by setting Stack marks to off (your step), the values got messed up if I changed the data which returns two different percentages for region West.

                           

                          It so happened that my original data returns two same values of 125% for region West. So when set the Stack mark to off, the percentages merged nicely. But when I changed the data for region West, it now has two different percentages, 83.33% and 125.0%. So when I set the Stack marks to off, the percentages got messed up. So the Stack mark doesn't work here. and I wonder how to get the correct result, which should be 107.14% for region West?

                           

                          I've attached the workbook with revised data.

                           

                          Again, thank you for taking time working over this problem I'm facing.

                          • 10. Re: Help on summing distinct values
                            Matt Lutton

                            First, you didn't have IF  FIRST()==0 THEN... which will effectively limit your results to one value based on your Table Calc settings.

                             

                            I do not understand how you're expecting a result of 107.14% for the West region -- can you explain your math, and provide expected results for both Regions in sheet 3?

                             

                            Also, what do you mean by "Changing the data"?  Are you talking about adding a Filter to the view?  If so, it is always helpful to know these requirements from the start, as they can impact the final solution.  Table Calcs happen after regular filters, so that may explain some of your issue; I just don't fully understand the problem you are trying to point out here.

                             

                            If you can provide a mock-up of your expected FINAL view, including filters, etc. that would be really helpful.

                             

                            Cheers!

                            • 11. Re: Help on summing distinct values
                              Von T

                              Hi Matthew,

                               

                              "changing the data" - I mean if I changed the data so that now it has two different percentages for region West (for Furniture and Technology category). It happened that in my original data, it returns the same percentage of 125% for region West for Furniture and Technology. So when you set Stack marks to off, the percentages merged.

                               

                              With the revised data, the "WS Budget Sales" calculated field returns distinct Budget Sales. So, for region West, "WS Budget Sales" returns 70,000 (30,000 + 40,000).

                               

                              sum([Sales]) would be the sum of sales per region. So for region West, the sum of sales would be 75,000.

                               

                              So for region West, the distinct Budget Sales would be 70,000 (Furniture West 30,000 + Technology West 40,000) and the sum of sales would be 75,000. Therefore, the percentage for region West would be: 75,000/70,000 = 107.14%.

                               

                              The result for region South is correct (83.33%).

                              I initially did add the IF FIRST==0 in the Percentage calculated field, but it doesn't have any effect, with or without it. I've added the statement back.

                               

                              I've attached the revised workbook with revised data.

                               

                              Thank you for taking time helping me on this.

                              • 12. Re: Help on summing distinct values
                                Matt Lutton

                                Don't have much more time tonight, but here are the initial results you were looking for.  The confusion for me was that you really wanted a Total Sum at two different levels -- I've used both two WINDOW_SUMs with variations, if you look closely.  I can explain more later, if needed.

                                 

                                I really hope you don't want to add quick filters from here...   that makes it more complex in Tableau for sure, as we're dealing with nested Table Calculations here.

                                 

                                A couple of notes:

                                1) the IF FIRST()==0 in each calc could be replaced with a simple FIRST()==0 filter with the same compute using settings; I chose to do them in each calc as that is what I am used to doing.

                                2) It is certainly possible to add filters from here, but if you want to retain your percentage results it becomes more difficult.  If you Google "maintain percent of total tableau" you will find some good results on ideas.

                                 

                                There are other options to solving your initial problem, including but probably not limited to: pre-aggregating your totals in your result set via Custom SQL or in your data source itself, there is likely a solution using data blending, etc.  It's all about the data you've got to work with, and what path you choose to get the results you want.  As we add complexity to the problem, one path may become overly difficult or downright impossible. So, we have to understand the overall goal from the beginning and then we can try to connect the dots in our understanding of Tableau to try and reach a solution.

                                • 13. Re: Help on summing distinct values
                                  Von T

                                  Hi Matthew,

                                   

                                  I appreciate you taking the effort to help me solve my problem. Your new calculated fields work!

                                   

                                  I understand the IF FIRST()==0 statement applies to attr([Budget Sales]) in the "WS Budget Sales" calculated field:

                                  IF FIRST()==0 THEN WINDOW_SUM(attr([Budget sales])) END  (1)

                                  Since Budget Sales column contains several same values, so the IF FIRST()==0 statement makes sense and will return only one value (the first value).

                                   

                                  But I don't understand why the IF FIRST()==0 statement also applies to SUM([Sales]) in your new calculated field "WS Sum of Sales": IF FIRST()==0 THEN WINDOW_SUM(SUM([Sales])) END  (2)

                                  The Sales column does not have duplicated values (it has different values). So what does statement (2) do? What is the difference in operation between statement (2) and statement (1)? If you or anyone in this forum got the time to explain that to me, then it'd be greatly appreciated.

                                   

                                  Before posting my problem, I couldn't find a way of pulling the sum of the distinct values and then applying to division, so I connected to a second data source containing distinct values and then joined the tables. It worked with no problem, using a second data source. But I really don't prefer using another data source just to solve the distinct values problem. For large data sets, it's not efficient (I think); it's ok (I guess) for small data sets. My original project data is much bigger and has several filtering on the dashboard for user interaction. I'll follow your methods on my original project and see if there's any problem.

                                   

                                  Thank you again for all the help.

                                  • 14. Re: Help on summing distinct values
                                    Matt Lutton

                                    I'll do my best to explain in brief--but you can always change my calcs to see the impact it has on the final view:

                                     

                                    If we remove IF FIRST()==0 from the WINDOW_SUM(SUM(Sales)) calc, we will get marks stacked on top of each other with our current settings.  I'm sure there's a better explanation than that if I dig into the data, but the basic concept here is to ensure only one value is returned for each partition in the view (set by our Compute Using settings in the Table Calc)

                                     

                                    I think there may be a misconception here about what IF FIRST()==0... is doing.  The result of what it does is based on our compute using settings, so it has more to do with how our view is laid out, and since we are setting the Compute Using on "Category", the portioning is on "Region".  Both dimensions impact the final calculation, and I think that may be the main factor you're missing -- so this optimization ensures only one value is returned for each Region.  You can see the impact if you remove IF FIRST()==0 from any/each calc.

                                     

                                    This is an older article about using this optimization, but note that the calcs shown in this article are no longer necessary, in their entirety.  What I mean is that in the article you will see calcs like:

                                     

                                    IF (FIRST()==0) THEN    

                                    WINDOW_SUM(SUM([Population]), 0, IIF(FIRST()==0, LAST(), 0))    

                                    END   

                                     

                                    In current Tableau Versions (8+, I believe), those last arguments to WINDOW_SUM are no longer needed to optimize the calc, and we can simplify to:

                                     

                                    IF (FIRST()==0) THEN WINDOW_SUM(SUM([Population])) END

                                     

                                    As I said, there may be other/better/quicker ways to get at these same results -- I used Table Calcs, because that is what I tend to use in situations like this.  But they ARE tricky.  I would encourage you to read as much as you can on the topic, re-create workbooks using Table Calcs, etc. and eventually, these concepts start to make more sense.

                                     

                                    Also, note that INDEX()==1 or LAST()==0 would return the same results in these calcs.  And again, we could remove the IF FIRST()==0 from each of the calcs, and instead build a filter with the same compute using settings--that filter would be simply: FIRST()==0, set to True.

                                    1 2 Previous Next