6 Replies Latest reply on Jan 28, 2015 11:32 AM by Joe Mako

    Blended Boolean Column Totals are Not What They Seem

    Keith Helfrich

      So the story goes:


        - yours truly submits invoices, and every now & then he receives an e-mail to confirm some money is on the way


        - and every now & then, he checks his bank account to make sure the money was deposited there


        - to keep track of things, he maintains a simple excel workbook with two sheets inside: one for the invoices and one for the payments


        - these two worksheets share a common linking field


        - a handful of calculated fields are used:

         # days invoiced: how many days since the invoice was sent ?

         status: if the # days invoiced is greater than some threshold, it is overdue

         past due amount: if the status is "Past Due" then what amount remains to be paid


        - everything works great, except for the column grand total


      Why is that blended boolean column total equal to zero (instead of the total of the column) ?



        • 1. Re: Blended Boolean Column Totals are Not What They Seem

          Hi Keith,


          I don't know why and therefore hope a data blending expert makes this plain to us.

          That said – after playing a little bit with your formula (trial error method) – this seems to work:


          // If the status is "Past Due", then what amount remains to be paid ?
          SUM(IF [Status] = "Past Due" THEN ZN([Invoice Amount]) END) -
          ZN(SUM([Payment (Simple_Invoice_Payment_Tracker)].[Bank Credit Amount]))


          Null in line 3 for Status <> "Past Due" ensures that the calculated value returned is null too.

          This is not an issue in attached example, but would be, if Bank Credit Amount existed for Status = "Invoiced".


          Workbook is attached with this formula added..


          Jóhan aka Kettan


          thread 153731 - Simple_Invoice_Payment_Tracker - kettan.png

          • 2. Re: Blended Boolean Column Totals are Not What They Seem
            Keith Helfrich

            Hi Kettan,


            Thank you for taking the time to trial & error method my question.  I can see what you've done is working, which is a good sign!  But I don't really understand why.  And I'd love to know why the simple blended boolean doesn't do what we would expect for it to.


            Joe Mako (or Joe Mako why are there two of you? :) when you have time, maybe you could help us to understand ?  This is not at all urgent, just a curiosity.


            Thank you!

            Keith Helfrich | Twitter

            Red Headed Step Data

            • 3. Re: Blended Boolean Column Totals are Not What They Seem

              You are welcome, Keith.


              Ps.  Jonathan Drummey  is also an expert in Grand Totals and probably already explained this issue somewhere.  He also created an request for  Totals and subtotals over displayed marks instead of separate calc (aka make subtotals work like Excel)  which might make it less important to know why Grand Total returns zero for this formula:


              IF ATTR([Status]) = "Past Due"
                THEN ZN(SUM([Invoice Amount])) - ZN(SUM([Payment (Simple_Invoice_Payment_Tracker)].[Bank Credit Amount]))
                ELSE 0


              and correct Grand Total for this formula:


              SUM(IF [Status] = "Past Due" THEN ZN([Invoice Amount]) END) -
              ZN(SUM([Payment (Simple_Invoice_Payment_Tracker)].[Bank Credit Amount]))


              • 4. Re: Blended Boolean Column Totals are Not What They Seem
                Keith Helfrich

                Hi Kettan,


                I have spoken with Joe Mako, and he helped me to understand the reason why the automatic column total doesn't work across the data blend.


                To describe all of the details, I've written a new blog post:



                And to encourage the Tableau engineers to offer us a better solution, I have also created the new idea:



                In the attached workbook, please take a look at the new sheets I've created, which offer the details about the solution described in my blog.


                Also, just a note: Joe Mako additionally helped me to see that the calculation you have found via the trial & error method is a bit risky.  That's because it only works by chance with this data, and it can break in the future when the data changes.


                In the attached workbook, we have also inserted a new record of data.  This new record has a Bank Credit Amount of $1,000 for Invoice #5.  And that payment was for an invoice which is not past due.  In this case, the calculation you found via trial & error breaks because it incorrectly subtracts this $1,000 from the Past Due Amount column total.


                Please take a look at everything & let me know if you have any questions.



                Keith Helfrich | Twitter

                Red Headed Step Data

                • 5. Re: Blended Boolean Column Totals are Not What They Seem

                  Thank you, Keith, for going into depth with this and thus making this precious Tableau knowledge better known. And of course big thanks to  Joe  for sharing this knowledge with you.


                  I don't have much to add. I think your blog is well-written and easy to understand ... even if I am still struggling to comprehend its depths.


                  I just saw a similarity in  Jonathan's  solution in  Re: Calculate difference in value based on two date parameters   which shows you don't need to duplicate the Invoice Link # field, but instead can use ATTR(Invoice Link #) in Rows. I mention this because it might be an information you like to include in your blog article. I have checked that it also works with your workbook. See sheet ATTR(Invoice Link #) in attached workbook. [Added later:  MIN(Invoice Link #) and MAX(Invoice Link #) also works in Rows.]


                  Another minor discovery to me was that [Status] can be left out of the view without breaking the formula [Past Due Amount] => [Past Due Amount (w GT)]. I guess this is because we have full access to all fields including those not used in a view. Having [Invoice Link #] in Marks ensures it always returns a [Status] rather than an asterisk (*) and thus keeps the formula working.


                  IF ATTR([Status]) = "Past Due"
                     THEN ZN(SUM([Invoice Amount])) - ZN(SUM([Payment (Simple_Invoice_Payment_Tracker)].[Bank Credit Amount]))
                     ELSE 0
                  • 6. Re: Blended Boolean Column Totals are Not What They Seem
                    Joe Mako



                    Your sheet 'ATTR(Invoice Link #)' is close, but it is missing the partitioning. The table calculations that end in '(w GT)' are designed and configured to be partitioned by 'Invoice Link #' and addressing on 'Invoice Link # (copy)'. In your worksheet, notice the values for INV#4 and INV#5. One option to resolve this and retain the ATTR() aggregation is to right-clicking on the pill 'ATTR(Invoice Link #)' and uncheck Ignore in Table Calculations, this will cause it to partition the view.


                    What is happening in the 'finished' worksheet is the canvas and Grand Total areas have the same calculations, but with different effective compute usings because not all dimensions are available to the Grand Total area.


                    • The Canvas has the Dimensions:
                      • Status
                      • Invoice Link #
                      • Invoice Date
                      • Invoice Link # Copy
                    • The Grand Total has the Dimensions:
                      • Invoice Link # Copy


                    When the compute using of the table calculation pill is set to 'Invoice Link # Copy':


                    • In the Canvas, the other dimensions will partition, this results in each partition having a single mark, so the FIRST()==0 condition will keep just that single mark, and the WINDOW_SUM() has no impact on the single value because the sum of a single value is that single value.
                    • In the Grand Total, there is no partitioning, because there are no dimensions beyond what is being used for addressing.


                    The fields Status and Invoice Date are simply additional information, they do not increase the granularity because they are information about the 'Invoice Link #'. In other words, if 'Invoice Link #' is partitioning the view, then partitioning on those dimensions as well will effectively have no impact on the computation.


                    By having 'Invoice Link #', 'Status', and 'Invoice Date' as dimensions pills on the Rows shelf, we have additional capability such as turning on Subtotals. A common use case for ATTR() pills is when using multiple Discrete Dimension pills on the view, and and we do not want them impacting the Partitioning/Addressing of table calculations, it is a quick and easy way to Make a Discrete Dimension pill a Discrete Measure pill. Overall, in my experience, making pills ATTR() adds complexity and removes capability, so it a shortcut to be used sparingly. If the goal is performance in a situation where extra dimensions cause the query to be slower, and the other capabilities of dimensions are not desired, then a MIN() or MAX() would be more performant than an ATTR().


                    Overall, table calculation formulas are fragile, and require awareness of multiple factors that impact them, frequently it is not simply a drag and drop and done. These factors are in many cases not documented, one of my chief complaints about Tableau is their lack of documentation on the complexity of the invisible logic that is performed in complex situations when multiple features are being used at the same time.


                    This thread is a good example of the complexity when dealing with:

                    • ATTR()
                    • Multiple Dimension pills
                    • Marks Card
                    • Measure Names/Values
                    • Data Blending
                    • Grand Totals
                    • Table Calculations

                    at the same time. It is easy to have more features in use at the same in addition to these, such as data densification, to make the situation even more complex.


                    My general advice is to avoid ATTR() useless it is being intentionally used as a shortcut to make a Discrete Dimension a Discrete Measure with awareness of the impact it will have, but maybe there is something I am missing, some other factor that I am not thinking of here.


                    For what capability, reason, or goal do you want to make them ATTR() pills?