11 Replies Latest reply on Feb 13, 2020 3:40 PM by Jonathan Drummey

    Tableau Prep - Join based on date & ID

    David Loesch

      I have data on patients, specifically create_date (date patient record was created) and create_user (user that created the patient). I also have a table of users that has their current department (here in lies the problem). I have the ability to write to/modify the users table (as well as create new tables if needed) but to not have the ability to modify the patients table. I want to know the department of the user that created the patient at the time of creation, not current department. To help illustrate, here are some example tables:

      So I would like to be able to create/calculate/join a column in/to the patients table that would have the department of the user at the time of the patient's create date (highlighting added to make it easier to pick user 2's info):

      The column I'm after is: Create_User_ID_Dept (outlined in red). I'd like to be able to a join using clauses something like this:


      patients.create_user_id = user_dept_changes.user_id

      patients.create_date <= user_dept_changes.date

      patients.create_date > user_dept_changes.[second most recent change date] - ????? Here's where I'm lost/stuck. If I'm way off on my approach, I'm fine entirely retooling, this is just the way my brain is trying to solve this.


      I've attached a simple packaged workbook with the same tables as pictured above. Thanks.

        • 1. Re: Tableau Prep - Join based on date & ID
          David Loesch

          Well I accidentally marked this as "Assumed Answered" and I can't figure out how to mark it "Not Answered". Seems like I can't? If not, can a mod mark it not answered, please?

          • 3. Re: Tableau Prep - Join based on date & ID
            Justin Thompson

            I can think of a way to do it if Prep let you sort columns, but alas it does not.  This is one of my big issues yet unresolved.  You can sort the summary (top portion), but not the output.

            • 4. Re: Tableau Prep - Join based on date & ID
              David Loesch

              Yeah, exactly what I'm running into. If I put my table into Excel/G Sheets first, I can do it with a formula, but that's an obvious PITA that isn't scalable. I'm thinking this may need to be a Python script step, but alas, I don't know Python.

              • 5. Re: Tableau Prep - Join based on date & ID
                Jonathan Drummey



                Here are two solutions, both built using Tableau Prep v2020.1.1. This first one joins the data together then uses an aggregate to identify the latest date from the department changes for a given patient and then joins that back in:


                Screen Shot 2020-02-12 at 3.53.48 PM.png


                This second solution uses the "in the release but not documented and not yet quite openable" rank/row numbering feature being introduced in Tableau Prep v2020.1.3 which should be out in the very near future. Prior to that release you can open this with the v2020.1 beta that you can download from https://prerelease.tableau.com. We can avoid the aggregate+join with a ROW_NUMBER() function:


                Screen Shot 2020-02-12 at 3.55.44 PM.png


                Both packaged flows are attached.



                2 of 2 people found this helpful
                • 6. Re: Tableau Prep - Join based on date & ID
                  David Loesch

                  Wow, this is great! I don't really understand your function, but it seems like it's doing exactly what I wanted! Do you have any resources anywhere to point me to in order to learn about the PARTITION... and ALONG... formulas?

                  • 7. Re: Tableau Prep - Join based on date & ID
                    Jonathan Drummey

                    Here's a link to the analytic function information on the prerelease site: https://prerelease.tableau.com/project/article/default.html?cap=5c890c1ddfc84f1fbb7c616f2dbd8222&arttypeid=6b659c52e1484….


                    However that link will likely go dead when the 2020.1.3 release comes out sometime in the next few days/weeks, and then we'll be able to look at the regular Prep documentation. Also Prep 2020.1.3 is expected to have a visual interface for setting up these calcs (there's partial support in the beta).


                    In any case the PARTITION is a "for each" and the ALONG defines the sort order and the particular function. Therefore {PARTITION [Patient_ID], [Create_Date], [Create_User_ID] ...} says "for each distinct combination of Patient Id, Create date, and Create User ID...do something....". Then the {ALONG [Date] DESC : ROW_NUMBER()} says "sort the rows in each partition based on the Date in descending order and then given each row a unique row number from 1 to N". This syntax is a variation on the SQL syntax for window functions (which is in fact what Prep turns these analytic functions into in order to run them on it's internal Hyper database).



                    1 of 1 people found this helpful
                    • 8. Re: Tableau Prep - Join based on date & ID
                      David Loesch

                      Wow, that was a very very succinct, perfectly descriptive, helpful answer. Thank you so much, Jonathan Drummey!

                      • 9. Re: Tableau Prep - Join based on date & ID
                        David Loesch

                        Just wanted to make sure you saw both Jonathan's answers below. They were the exact answer I needed, they may help you, too.

                        • 10. Re: Tableau Prep - Join based on date & ID
                          David Loesch

                          This explanation helped me tweak it slightly to make sure it works into the future. For others reading this in the future, here are the few tweaks I made:


                          If I understand your flow and formula right, it was filtering to only the most recent department change. However, I wanted to be able to join the patient created date to the dept_changes table to be able to get the users' departments at the time of the patient create date, not get their current/most recent department. Here's how I modified.


                          Before joining to patients, I used the formula you suggested but modified to partition just on user_id and put ascending by date, instead of descending:


                          This resulted in the row number (what I call "User Occurrence No") by user:

                          1 = first occurrence of that User_ID

                          2 = the second occurrence of that User_ID

                          ...and so on


                          I created a helper column that just gave me that occurrence number + 1 so I would know the "next" occurrence number for each change entry (and be able to join on it later):


                          Next, I split that clean, renamed a few things then rejoined it to itself so I could have both of those dates (each row's change date and the next change date):


                          This resulted in this data set (after some removing extra columns and renaming):

                          Because I want to do a join where the patients.create_date > dept_changes.change_date AND patients.create_date > dept_changes.next_change_date, I needed to group and replace nulls with a way later date that no entry would be greater than (there may be a more efficient way to do this or the proceeding join...?):


                          Lastly, I joined to my Patients table with a three-clause join:


                          Again after some eliminating and renaming columns, here's the final, resulting data set (just for a single user, 2, to make it easier to look at):


                          I've attached a packaged flow for any that want to see it.

                          • 11. Re: Tableau Prep - Join based on date & ID
                            Jonathan Drummey

                            I'm sorry, I got the join mixed up and didn't check my work well enough. In any case doing all the work in the smaller users & changes tables is likely to be faster & use less memory than what I'd built, nice work!


                            Since we can't really use OR clauses in a join in Prep right now to do a ([Create_Date] < [Next Change Date] or ISNULL([Next Change Date])) the way I'd avoid the 9999-12-31 imputed next change date would be to leave it null, then leave the clause on Create_Date < Next Change Date off the join, and then after the join adding a filter calculation.