1 2 Previous Next 17 Replies Latest reply on Apr 30, 2013 6:34 AM by Bruna Oliveira

    SUMIF, conditional sum, mixing aggregate and non-aggregate

    Justin Larson

      I'm embarrassed to say I've been using Tableau for nearly a year and still can't figure out how to do this:

       

      How can I get a calculation to work like sumif does in Excel? Once I have this figured out, it should work for any conditional calculation (maxif, minif, countif, averageif, etc)

       

      The classic example I see is people structuring a calculation like this:

       

      AVG(IF [Category]='New' THEN [Units] END)

      or

      AVG(IIF([Category]='New',[Units],0))

       

      Where handling the return of Nulls is the difference between the two.

       

      Ok. That works when the only "if" is looking at Category="NEW". Yes, you could just add conditions with a string of elseifs, but what if there are a thousand different values in "Category"? What if Category is actually "CustomerID" and not only are there lots of values, but they will change with each refresh? In excel, this is equivalent of using:

       

      =sumif([Category],"NEW",[Units])

      or worse

      =sumif([CustomerID]="123456",[Units])

       

      What I'm after is sumif like any Excel user does it, where the calculation would look like this:

       

      =sumif([Category],[@[Category]],[Units])

       

      Where [Category] is the whole field, and [@[Category]] is the value in that field in this record.

       

      This is, essentially what Tableau considers mixing aggregate and non-aggregate conditions.

       

      The nearest I could come up with in Tableau is:

       

      sum( if [Category]=[Category] then [Units])

      maybe

      sum( if ATTR([Category])=[Category] then [Units])          ?

       

      which of course doesn't do anything. It's the same as sum([Category]) and relies on what is in the table view because [Category] always equals [Category].

       

      To phrase this another way, I want to partition the dataset to a subset using a field (like [Category] or [CustomerID]) and do the calculation just on that subset, but pass through a value from the row as a variable, rather than hardcoding it as a string argument. To be clear, I understand that this is a super-simple calculation if the [Category] or [CustomerID] is in the view, but I need the calculation to work the same whether or not that field is in view.

       

      Use case:

       

      I have a transactional database. There are four fields: CustomerID, Category, PricePaid, PurchaseDate

       

      I need a calculation that shows the average customer lifetime (Max(purchasedate)-Min(purchaseDate) {calculated by customer using CustomerID}.

       

      I need a calculation showing the lifetime value of a customer (sumif([CustomerID],[@[CustomerID]],[PricePaid])

       

      I need to calculate the average number of days that has passed since each customer's most recent purchase. (today()-max(PurchaseDate))

       

      Lot's of other uses, you get the idea.

       

      Anyone know what I'm getting at?

       


        • 1. Re: SUMIF, conditional sum, mixing aggregate and non-aggregate
          Joshua Milligan

          Justin,

           

          I think I understand what you are getting at, but it would be very helpful to have a sample workbook with some sample data to work out one of the use cases you explained.

           

          Joshua

          • 2. Re: SUMIF, conditional sum, mixing aggregate and non-aggregate
            Shawn Wallwork

            Justin I think you just need to change the way you are thinking. First being "in the view" doesn't mean a field has to be on the column or row or label shelves. It can be on the level of detail shelf and be "in the view", and most of the time has little impact on the viz (most, not all). For instance, you say:

             

            I need a calculation that shows the average customer lifetime (Max(purchasedate)-Min(purchaseDate) {calculated by customer using CustomerID}.

            So you can use something like this:

             

            WINDOW_AVG(DATEDIFF('day', MIN([purchaseDate]), MAX([purchaseDate]) ))

             

            Then put the CustomerID field on the LOD shelf and set the compute using to CustomerID. This will give you the overall average customer lifetime days.

             

            You say:

            I need to calculate the average number of days that has passed since each customer's most recent purchase. (today()-max(PurchaseDate))

            Try:

             

            WINDOW_AVG(TODAY() - MAX([PurchaseDate]))

             

            And set compute using to CustomerID. This should give you the figure you're looking for.

             

            As far as your @[CustomerID] goes, you're right Tableau doesn't support variable substitution. But there are ways to overcome this limitation. Like Joshua says, post some data, or a packaged workbook so we can help guide you in the right direction.

             

            --Shawn

             

            Message was edited by: Shawn Wallwork

            • 3. Re: SUMIF, conditional sum, mixing aggregate and non-aggregate
              Justin Larson

              Don't take my introduction as an assault on the community or on Tableau. I have not asked any questions via this forum since, for the most part, I have been able to figure out what I need to do in Tableau or using workarounds prior to data import. This one is just so simple in Excel, and seems so complicated in Tableau - it came up the first day I used Tableau and I'm realizing I still don't have an adequate understanding of more technical calculations to handle it. Furthermore, I understand that my question may be ambiguous since I have a hundred uses in mind, and not just a single vizualization. I'll try to be specific enough to allow the community to understand what I'm hung up on.

               

              So, Shawn, using the LOD shelf bring it's own limitations. Let's say I want to do a line chart showing the average lifetime value of customers in different regions over time. If I drop the CustomerID in the LOD shelf I get as many lines as I have customers, even if I only have a few regions to compare. I need the CustomerID to perform the calculation, but I need to not have it in the viz.

               

              Similar situation when you start trying to color bars - I may want to shade the bars by categorical data, but when the CustomerID is in LOD, each bar gets chopped up into thousands of tiny slices.

               

              This problem gets even more complicated when I need to mix "sumif"-like calculations. For example, I want to look at the average number of distinct customers appear per retail location by region. This calculation tries to aggregate a calculation that relies on mixing aggregated and non-aggregated conditions (breaking every rule tableau has about calculations) and would have to consider both retailID and customerID to work out correctly, but neither of those ID's need to be in a view and tableau does like you trying to aggregate aggregations.

               

              that one would look something like:

              countd([customerID] if [retailerID]=[@[retailerID]])

               

              So, it's hard to show an example workbook, since I'm not very close to any of the things I'm talking about. However, I've drawn up a sample of a transactional dataset with a few use cases described in annotations. My solution in the past has simply been to do the calculations outside of tableau and bring them in as pre-calculated fields.

               

              If the solutions to each of the use cases is different, it will still give me a clearer understanding of how to formulate workbooks, so I appreciate any time taken to show solutions to these.

              • 4. Re: SUMIF, conditional sum, mixing aggregate and non-aggregate
                Shawn Wallwork

                Yeah, Justin I misread your first line and that's why I posted that PS which I have since obviously deleted. I thought you'd opened a support case a year ago. My bad, my apologies. When I get a chance I'll take a look at your workbook, and try to come up with a better solution. Meantime maybe Alex or Jonathan or Joe have some ideas. And I'm sure Joshua will probably jump back in.

                 

                --Shawn

                • 5. Re: SUMIF, conditional sum, mixing aggregate and non-aggregate
                  Jonathan Drummey

                  Hi Justin,

                   

                  In Tableau, when you want to aggregate over aggregations, there are two general methods. One is to go back to SQL to do it yourself, the other is to use Tableau's table calculations, which can be nested as many times as you need, each aggregating with its own separate addresssing and partitioning. Personally, I find table calcs a lot easier than mucking about in the SQL.

                   

                  I set up examples of each of your worksheets using table calculations. The calculations each have their compute using set to the Customer ID so they will partition on State and Product as necessary. You'll see they look very much like what Shawn wrote, only the optimized versions of the calculated fields have an extra bit at the beginning of IF FIRST()==0 to reduce the number of marks returned to a single mark for drawing (whether on a table, line, bar, etc.). I also added filters on the views to get rid of the Null marks that are returned, this can speed up the view.

                   

                  For more information about using table calculations, I recommend reading the series of links referenced here: http://drawingwithnumbers.artisart.org/want-to-learn-table-calculations/

                   

                  Jonathan

                  2 of 2 people found this helpful
                  • 6. Re: SUMIF, conditional sum, mixing aggregate and non-aggregate
                    Justin Larson

                    Well, the formula results as written are not quite what I had in mind, BUT

                     

                    "use Tableau's table calculations, which can be nested as many times as you need, each aggregating with its own separate addresssing and partitioning"

                     

                    This is the nugget I needed to get moving forward. Every time I tried re-aggregating, I was not using table calculation formulas, which resulted in Tableau's error of something-er-nother "already aggregated."

                     

                    The SQL approach is what I had been doing previously, and was referring to as a workaround, as it's not really the software providing a solution as much as me smashing my data with a bigger hammer.

                     

                    Your iterative crosstab, showing the results in table format helped illuminate a lot as well, though I still have a hard time with the "advanced" dialog box and have yet to find a reasonable tutorial describing how to understand it. It's possible the answer is in the list of links you sent, so thanks for that. Maybe when I'm a master-Jedi, I will create a workbook designed to be a "learn how all this stuff works" for the community.

                     

                    I think I'll be able to get what I need from the trajectory set here.

                     

                    Thanks for the time, all.

                    • 7. Re: SUMIF, conditional sum, mixing aggregate and non-aggregate
                      Shawn Wallwork

                      Justin, I started this thread partly because of your current need (frustration). I think it's worth discussing.

                       

                      --Shawn

                      • 8. Re: SUMIF, conditional sum, mixing aggregate and non-aggregate
                        Jonathan Drummey

                        Hi Justin,

                         

                        I'm glad you found the workbook useful. The crosstabs views are pretty much how I work with table calculations, I use the crosstabs to have numbers to check and validate, then once I have those correct I'll duplicate the worksheet and create the final view.

                         

                        If you'd like, I'd be happy to set up a webex and walk you through table calculations as I understand them. I wouldn't be where I'm at with table calcs without the help of others, particularly Joe Mako, Richard Leeke, Alex Kerin, Ross Bunker, and James Baker. Send me an email at jonathan (dot) drummey (at) gmail if you'd like to.

                         

                        As for creating a "how this stuff works" workboo, please do! In the absence of really useful in-depth documentation on this subject from Tableau, we (the user community) are slowly creating it ourselves, and given the complexity of table calcs and that there are so many different ways people can learn I'm interested in having more voices in the conversation.

                         

                        Jonathan

                        • 9. Re: SUMIF, conditional sum, mixing aggregate and non-aggregate
                          Justin Larson

                          Just discovered this downloadable workbook, which is pretty much exactly what I had envisioned for walking newbies through table calcs.

                           

                          http://www.theinformationlab.co.uk/2011/12/07/making-table-calculations-work-in-tableau/

                           

                          Interestingly, it's over a year old, and when I used google to find this link anywhere in tableausoftware.com, it only came up once.

                           

                          Something similar could certainly be done for more in-depth nuances, such as filter orders and padding, but this is a fantastic intro.

                          • 10. Re: SUMIF, conditional sum, mixing aggregate and non-aggregate
                            Sean Mullane

                            Hi Jonathan,

                             

                            I like this method of creating table calculations. It's very... methodical. I think I'm going to have to start doing it this way.

                             

                            One thing I've been wondering is on this calculation -

                            "IF FIRST()==0

                               THEN WINDOW_AVG([CustomerLifetimeValue], 0, IIF(FIRST()==0,LAST(),0))

                            END"

                            - why is the second IF statement necessary? Is there some nuance to this that I'm missing? On the face of it it seems like the first statement renders the second redundant, but I've been using Tableau long enough now to realize there might be some wizardry going on. Can you shed any light?

                            • 11. Re: SUMIF, conditional sum, mixing aggregate and non-aggregate
                              Shawn Wallwork

                              Sean, Richard did most of the pioneering work on this construction, which is all about performance optimization on large data sets (you generally don't need it on the smaller stuff). Here's how he explained it in guest blog he did over at Clearly and Simply.

                               

                              Richard writes:

                               

                              The basic window calculation that I need is this:


                              WINDOW_SUM(SUM([Population]))


                              But that will be evaluated 8,000 times, including 8,000 expensive sorts of the whole partition.


                              At first glance, it appears that all that is needed to fix it is this:


                              IF (FIRST()==0) THEN

                              WINDOW_SUM(SUM([Population]))

                              END


                              That just returns the result for the first row in the partition and NULL for all other rows, but unfortunately it turns out to take almost as long as the original. It took me quite a while to guess my way past this one, but eventually I worked out what must be happening. Tableau actually evaluates the expression in a couple of passes. First it gets the window ready. In this case, because I haven’t specified the optional parameters defining the range of the window, it defaults to the whole partition – which it duly sorts. Then it gets on with actually evaluating the expression, which means that for the first row it calculates the result and for all other rows it just ignores the nicely sorted rows.


                              Having guessed that that might be what it was doing, I tried another tack. For the first row, where we want the result, specify the entire partition. For all other rows, make the window just the single row being processed, like this:


                              IF (FIRST()==0) THEN

                              WINDOW_SUM(SUM([Population]), 0, IIF(FIRST()==0, LAST(), 0))

                              END


                              Sure enough, that fixed it. My three and a half hour case came down to about five seconds, and the calculations in the “Covered (blended)” sheet are now too fast to worry about.

                               

                               

                              EDIT BY SHAWN: So essentially that last IIF statement says after you've processes this query for this 1 row skip to the end, don't process all the other rows.

                              • 12. Re: SUMIF, conditional sum, mixing aggregate and non-aggregate
                                Richard Leeke

                                Version 8 will make that construct unnecessary - there has been quite a bit of optimization work in version 8 and the problem that this was addressing no longer exists.

                                 

                                You will often still want the outer IF statement to suppress multiple rows, but the inner one isn't needed any more and in fact makes it a little bit slower. Version 8 makes window calculations on large partitions much faster than they were, so the extra gain from getting rid of that construct from old workbooks probably isn't worth the effort, but definitely leave it out of anything new you are working on - it makes it so much easier to see what is happening.

                                • 13. Re: SUMIF, conditional sum, mixing aggregate and non-aggregate
                                  Sean Mullane

                                  Thanks Shawn and Richard, that makes perfect sense. Glad to see it's unnecessary for v8 though.

                                  • 14. Re: SUMIF, conditional sum, mixing aggregate and non-aggregate
                                    Noel Avison

                                    Would it be possible to plug that field into a new data source that shares the same categories? I want to do something similar where in one data set I have multiple entries that needed to be summed by category and another data source where I have the same categories and run a calculation based on the sums generated from the first data source.

                                     

                                    Basically in the first data source, there are many small units within each category and you need a sum of a specific field by category. The second data source is made up of nothing but the categories. In excel a sumif would be fine, but there that doesn't work. The other problem is that the sum is generated in part by which categories the user selects, so its not a field I can just calculate outside of tableau(without creating hundreds of columns).

                                    1 2 Previous Next