1 2 Previous Next 18 Replies Latest reply on Aug 25, 2016 1:30 AM by kettan

    How do you select newest occurence in a history tabel

    Kent Trustrup

      Hi,

       

      I have a table, I need to join with other tables. This table is updated everytime a new instance happens to an ID in the table, and adds a new line, with a newer timestamp.

       

      In SQL it's reasonably easy to choose the newest occurence, but I can't see how to do it in Tableau.

       

      So to clarify I have a table that looks something like this, note the timestamp is down to the 5 digit on the second, so no single ID will ever have more than one newest occurence.

       

      Unique ID
      TimestampStatus
      12016-01-01 A
      12016-01-02B
      22016-01-03A
      12016-01-04C
      32016-01-05A
      32016-01-06B
      22016-01-07B
      42016-01-08A
      52016-01-09A
      12016-01-10A

       

      The table should join other tables on the ID.

       

      How do I select so I end up with before joining with the other tables?

      Unique IDTimestampStatus
      12016-01-10A
      22016-01-07B
      32016-01-06B
      42016-01-08A
      52016-01-09A

       

      Or am I going about this the wrong way?

        • 1. Re: How do you select newest occurence in a history tabel
          Andrew Watson

          Best way is to do it in SQL before pulling into Tableau, therefore only pulling the latest data in. As you said this is simple in SQL.

           

          If that's not an option it's possible to do in Tableau, although will be slower than doing in SQL, we can discuss how to do in Tableau if you don't have any option to write SQL against your database.

          1 of 2 people found this helpful
          • 2. Re: How do you select newest occurence in a history tabel
            chris.moore.11

            As mentioned already, doing it before hand might be best.

             

            If that's not an option the workaround should be pretty simple. Make a field flagging the minimum date for an ID and then filter for that flag.

             

            See the photo. The calculated field "First Occ" is 1 for the first occurrence for that ID and null for all others.

             

            You would just have to then set a filter for "First Occ" = 1. You even should be able to set that filter as a data source filter and the other values wont even make it to your worksheet.

             

            Capture2.PNG

            2 of 2 people found this helpful
            • 3. Re: How do you select newest occurence in a history tabel
              kettan

              As already shared by Chris, you could use a Level of Detail Expression:

               

              Timestamp MAX

              { FIXED [Unique ID] : MAX([Timestamp]) }
              

               

              Filter MAX

              [Timestamp] = [Timestamp MAX]
              

               

              The two formulas above could be written as one:

              Timestamp] = { FIXED [Unique ID] : MAX([Timestamp]) }
              

               

               

              Ps. In other related situations, where Level of Details Expressions can't be used or don't perform as well, it would be nice if Tableau had built-in support of:

               

                2   SQL Window Functions

              18   Correlated Subquery

               

              Attached Workbook Version:  9.0

              .

              2 of 2 people found this helpful
              • 4. Re: How do you select newest occurence in a history tabel
                chris.moore.11

                Oh yeah, you would want to use maximum instead of minimum for most recent occurrence.

                2 of 2 people found this helpful
                • 5. Re: How do you select newest occurence in a history tabel
                  kettan

                  The table should join other tables on the ID.

                   

                  How do I select so I end up with before joining with the other tables?

                  I doubt Tableau has any efficient way to do this. I haven't used Tableau since 8.2 and only have an expired 9.0 Personal (can't connect to databases) license installed and therefore not in an environment where I can test what is and isn't possible.

                   

                  Options I can think of:

                  • Tableau
                    1. Custom SQL (probably inefficient for live connections and okay for extracts)
                    2. Data blending (with its limitations)
                    3. Level of Detail calculation (as already suggested)
                  • Database
                    1. Create materialized view (and thus also technical debt in the database)
                    2. Create view of your query (and increase the technical debt with each)
                    3. Create table that only has newest row per ID (dimensional modeling in data warehouse)

                   

                  In case you create objects (views, tables, etc) in the database, you could create a schema named tableau under which you put them. The benefits are 1) a less cluttered database, 2) an easy way to spot technical debt and its cause, and 3) a way to discourage use of these for other purposes than Tableau and thus reduce its dependencies (technical debt) to Tableau. When (or if) Tableau is discarded, these can be deleted without too much detective work finding out what it might break.

                   

                  How big is this table? And how big are the tables that join it?

                   

                  In SQL it's reasonably easy to choose the newest occurence, but I can't see how to do it in Tableau.

                  How would you do this in SQL? And what database do you query? (MS SQL, Oracle, PostgreSQL, etc)

                   

                  The reason I ask is to better understand your scenario and how Tableau could be improved in this regard.

                  • 6. Re: How do you select newest occurence in a history tabel
                    Kent Trustrup

                    Unfortunately I work at a very large company, and have somewhat limited options, so can't make a new table to use instead, would be my first choice.

                     

                    So I need to either write the SQL in Tableau, or make it as a calculation.

                    • 7. Re: How do you select newest occurence in a history tabel
                      Kent Trustrup

                      This one is interesting, I will try this, hope I can get it working

                      • 8. Re: How do you select newest occurence in a history tabel
                        Kent Trustrup

                        How big is the table, and the tables you need to join?

                        All tables have at least 300.000 rows, and in one case around 330 million rows, so it isn't small tables.

                         

                        The table I would start with is the 300.000 one, and it would be right joined (yes right) to 2 log tables I need to union that are 1,8 mil and 25 mil, both growing, and the joined to a couple of other tables, and then to a table with roughly 330 million rows. The input from this table would be limited, since it's filtered from the other joins, but it's still a rather large table.

                         

                        How would you do this in SQL? And what database do you use?

                        We use IBM DB2, and would make a line like this, where Time is the coloumn name of the timestamp:

                         

                        From AA.TABLE1 a

                         

                        Where a.time = (select max(time) from aa.table1 where a.uniqueid = aa.table1.uniqueid.time)

                         

                        This pulls out the newest occurence of every uniqueid in the table.

                        1 of 1 people found this helpful
                        • 9. Re: How do you select newest occurence in a history tabel
                          kettan

                          I am convinced that LoD expressions are not an efficient solution for your scenario, because I believe they are executed too late. This said, I am not speaking out of real knowledge about how and when LoD calculations are executed.

                           

                          If the query is saved as a server extract, I would consider using custom SQL.

                           

                          Is this going to be a live connection?

                           

                          As for a live connection, I have no idea how or if this can be done efficiently in Tableau 

                           

                          According to this blog article, I would come to the conclusion that there are no efficient ways to do this in Tableau.

                           

                          The rule that we’ll cover today is:

                          "Limit custom SQL in live connections as they can be inefficient. Where possible, create a view on the database server to implement your custom SQL and connect Tableau to your view."

                           

                          It seems that the recommended workaround is to create  technical debt  in the database. If I was going that route - I probably wouldn't - I would save these objects (views, stored procedures, etc) in a dedicated database schema named tableau to

                           

                          • isolate such workarounds
                          • avoid cluttering the database
                          • avoid that the view is used for other purposes
                          • make it easier to clean up technical debt at a later stage
                          • identify cause and causer of technical debt

                           

                          I don't know if it would be better to save the full query as one view or only to make a view of the table where you need the newest and use Tableau's join dialog.

                          1 of 1 people found this helpful
                          • 10. Re: How do you select newest occurence in a history tabel
                            Kent Trustrup

                            I would most likely make an extract, though I'm a bit apprehensive of the size, but live connections has a tendency to cause to many problems within our system.

                             

                            I'm not too familar with the custom sql in Tableau, can I simply write my select from statement there, and get exactly what I need?

                            • 11. Re: How do you select newest occurence in a history tabel
                              kettan

                              can I simply write my select from statement there, and get exactly what I need?

                              Yes! That is the beauty of custom SQL! We can just copy and paste tested and proven queries!

                              • 12. Re: How do you select newest occurence in a history tabel
                                Jonathan Drummey

                                If you're using a live connection then an LOD expression would be more efficient than Custom SQL. The reason why is in two parts:

                                 

                                a) LOD expressions are generally computed as subqueries, however Tableau constructs the query on the fly so if a subquery is not necessary for an LOD expression's results then Tableau will avoid it. A subquery written in Custom SQL is always executed.

                                b) Live connections only query for the fields/columns necessary to build that view (including LOD expressions). Custom SQL always returns every field from the Custom SQL Query (including any subqueries).

                                 

                                If you are using a Tableau data extract then the Custom SQL is only executed when the extract is built or refreshed so those performance issues go away, they just make the extract take longer to compute. And an advantage of extracts is that the fields are fully materialized, indexed, and compressed so at runtime there's no overhead of joins or subqueries.

                                 

                                Other alternatives include:

                                 

                                - Building a database view, some databases support materialized views so the aggregation needed in this case is effectively available as a much smaller table. I recommend using database views instead of Custom SQL whenever possible because database optimizers often do a better job with views (materialized or not) than they do with raw SQL queries.

                                - setting up an ETL process to create the aggregated data set

                                - building a stored procedure

                                - using Tableau data blending (likely not applicable in this case due to volume of data, I've seen it used in other situations)

                                 

                                Jonathan

                                • 13. Re: How do you select newest occurence in a history tabel
                                  kettan

                                  Thanks for sharing this, Jonathan.

                                   

                                  My advice with custom SQL was for creating extract, where I wouldn't expect any serious performance issue beyond the issues it already has (if run as sql with a query tool).

                                   

                                  As for LODs in Kent's case:

                                   

                                  1. I would expect poor performance because LODs are (I assume) applied after JOIN and WHERE and thus would filter an unnecessary blown up table result (unless IBM DB2 has an extremely smart query optimizer that knows how to re-write the query so the table is filtered before the JOIN of this particular history table)
                                  2. in addition, he might get wrong results for the same reason (LODs being applied after WHERE) (more about this in FIXED LoD alternatives which I am not finished to write and thus haven't published yet - it's current status is draft and thus not visible)

                                   

                                  As for live connections:

                                   

                                  I wonder if a custom SQL could be used for the table from which the newest row per ID should be returned. What I am thinking about is using a custom SQL as a table in Tableau's join dialog.

                                   

                                  What is your experience (if any) with custom SQL tables in Tableau's join dialog?

                                  • 14. Re: How do you select newest occurence in a history tabel
                                    kettan

                                    The table I would start with is the 300.000 one, and it would be right joined (yes right) to 2 log tables I need to union that are 1,8 mil and 25 mil, both growing, and the joined to a couple of other tables, and then to a table with roughly 330 million rows. The input from this table would be limited, since it's filtered from the other joins, but it's still a rather large table.

                                     

                                    From AA.TABLE1 a

                                    Where a.time = (select max(time) from aa.table1 where a.uniqueid = aa.table1.uniqueid.time)

                                     

                                    THE TABLE WITH NEWEST

                                    Which of these would be the one that should be limited to newest time per ID?

                                    How many rows does it have? (300K, 1,8M, 25M)

                                    How many rows does it have when reduced to newest per ID?

                                     

                                    THE QUERY OF 300K, 1,8M, 25M

                                    How many rows are returned in your query of the 300K, 1,8M and 25M tables?

                                    How much is the number reduced if filtered to newest per ID?

                                    1 2 Previous Next