12 Replies Latest reply on Sep 14, 2011 2:51 PM by Jeremy Ficken

    Sum data based on maximum date

    Jeremy Ficken

      I want to have a calculation that will evaluate whatever the most current date is and return the result given the most date. For example, say today is 9-12, and there's only data loaded up until 9-10.  In this example, the calc should return data from 9-10. On the other hand, if today is 9-12, and data is loaded up until 9-11, then the calc should return data from 9-11.

       

      Right now I'm doing something similar where there's an IF statement using some "Today minus 1" logic:

      SUM(IF TODAY()-1=[date] THEN [metric] END) / TOTAL(SUM(IF TODAY()-1=[date] THEN [metric] END))

       

      I'm finding this doesn't produce the best results because sometimes users might view the dashboard before ETL completes (which will load data up to yesterday), so certain fields would be null until the data is refreshed (see attached png). Users would prefer to see the most current data instead of blanks.

       

      If I were using a live connection, and not a refreshed extract, then I'd probably do something more like what's described here (http://www.tableausoftware.com/support/knowledge-base/five-recent-days) with the RAWSQL_DATE example where it's selecting the MAX date.

       

      Any suggestions?

        • 1. Re: Sum data based on maximum date
          Richard Leeke

          You can use data blending to a second connection to the same extract to get around the lack of RAWSQL with an extract.

           

          I've attached a Coffee Sales example.  Tricks to note:

           

          1) I've added a calculated field called "Join Key" with a constant value of "X" for all rows.  I've then changed the relationship definition so it only joins on that key.  This allows each row in the primary data source to join to the MAX() over all rows in the secondary.

           

          2) I've created a calculated field "Is Max Date?" which checks each row in the primary against the overall maximum date from the secondary.  This field has to be expressed as a continuous numeric field in order to be able to use it on a filter shelf (I'm not sure why - it's just the only way I can get it to work).

          • 2. Re: Sum data based on maximum date
            Jeremy Ficken

            Thanks for posting Richard. I'll take a look.

            • 3. Re: Sum data based on maximum date
              Joe Mako

              Another option is a custom table calc like

               

               

              MAX([Date])=TOTAL(MAX([Date]))


               

              With the compute using set to all dimensions in use, so there is no partitioning, as in the attached.

               

              I do not know what method would be faster.

              • 4. Re: Sum data based on maximum date
                Richard Leeke

                I just compared Joe's way with mine on a 23 million row extract.  Joe's way was definitely quicker for the particular thing I did - though both were very snappy (say 2 or 3 seconds for Joe's, 5 or 6 seconds for mine).

                 

                I was picking out the last day out of about 20, and aggregating broken down into about 20 marks.  The behaviour might be different with lots more detail - though thinking about what it has to do I think Joe's is likely always to be best.

                • 5. Re: Sum data based on maximum date
                  Joe Mako

                  Thank you for looking into that Richard.

                   

                  This may even be something you would want to add into your ETL process, add an additional field for Max Date so your can filter before aggregation.

                  • 6. Re: Sum data based on maximum date
                    Jeremy Ficken

                    Thank you both for posting. I'm concerned that this might not work in my case, as I'm already using a Date relative filter for some other measures that are being reported in the same grid. Ideally, this would all need to be handled in a custom table calc without involving an additional filter or limiting to level of detail.

                     

                    The suggestion to have a Max Date field in the underlying summary data might be a more appropriate option, assuming that the calc can be built in such a way that evaluates for Max Date and returns the value of the measure. Not sure of the syntax, or if this is possible. Thinking:

                     

                    sum(IF [Max_Date] = '1' THEN [metric] ELSE '-' END)

                    • 7. Re: Sum data based on maximum date
                      Joe Mako

                      How about just

                       

                       

                      IF [Max_Date] = '1' THEN [metric] END


                      • 8. Re: Sum data based on maximum date
                        Jeremy Ficken

                        Yes, that might work as well. Going to explore option to see if Max Date can be added to the summary data.

                        • 9. Re: Sum data based on maximum date
                          Jeremy Ficken

                          After looking at that calc (IF [Max_Date] = '1' THEN [metric] END) one would think that this should be functionality that the application would support without needing to make modifications to the underlying data.

                           

                          If this data is being held in an extract, then it could support derived/on-the-fly calculations that are more SQL in nature. A measure (via calc field) should be able to be defined as:

                           

                          select

                          sum(metric) as METRIC_NAME

                          FROM fact_table

                          where date in (select max(date) from fact_table)

                           

                          Tableau should have a semantic layer or abstraction logic that converts SQL to whatever its underlying engine uses. Users are already using custom SQL to extract data in - should support in-app SQL functionality. It would be nice if Calculated Fields supported SQL in extracts and not just in live connections.

                          • 10. Re: Sum data based on maximum date
                            Jeremy Ficken

                            Changed the data that's coming into the extract, so there's a Max Date date flag and built calc field like this:

                             

                            SUM(IF [Max Date] = 1 THEN [metric] END) / TOTAL(SUM(IF [Max Date] = 1 THEN [metric] END))

                             

                             

                            The previous calc I was using, which wasn't addressing the "maximum date" issue is:

                             

                            SUM(IF TODAY()-1=[date] THEN [metric] END) / TOTAL(SUM(IF TODAY()-1=[date] THEN [metric] END))

                             

                            The former (and newer) calc produces different results, which is confusing given that they both operate on the same underlying data. I did have single quotes on the Max Date = 1 value previously, and it was throwing a data type error. I removed the single quotes, and now it works, but with results that aren't accurate.

                             

                            Any suggestions?

                            • 11. Re: Sum data based on maximum date
                              Joe Mako

                              I would recommend some profiling of your data, create a view like this:

                               

                              1. [date] as a Discrete All Values (blue pill with no aggregation or trunc) on the Rows shelf

                              2. [Max Date] as a Discrete Dimension (blue pill with no aggregation) on the Rows shelf

                              3. Number of Records on the Text/Label shelf

                               

                              Are there any records where the yesterday's date is not flagged as the max date?

                               

                              Are there any records flagged as max date that are not yesterday?

                               

                              If that does not help, you will need to provide a sample packaged workbook that represents your situation.

                              • 12. Re: Sum data based on maximum date
                                Jeremy Ficken

                                Are there any records where the yesterday's date is not flagged as the max date?

                                No

                                 

                                Are there any records flagged as max date that are not yesterday?

                                No

                                 

                                Working on the sample package.

                                 

                                ****update****

                                 

                                I take it back - it works. User error on my part. Thanks for all the suggestions!

                                 

                                To provide the abridged version of the solution:

                                I modified the custom SQL query I was using to include some logic that did a self-join on the fact table and looked at the past 4 days of data loaded via ETL and selected the MAX date, then use a CASE statement to assign a "max date flag" based on a comparison of the MAX date and the dates in the fact table.

                                 

                                The Tableau calculated fields then follow this logic to retrieve the most current date:

                                SUM(IF [Max Date] = 1 THEN [metric] END) / TOTAL(SUM(IF [Max Date] = 1 THEN [metric] END))

                                 

                                This way, the reports will always represent the current date of what's been loaded into the fact table, and avoids the issue of the extract not having yesterday's data prior to being refreshed.