4 Replies Latest reply on Apr 20, 2012 9:20 AM by Dan Cory

    Blend vs. Join?

    Andrew Peralta

      Looking to understand whether I should be trying to blend or join my data in the following scenario:


      I have one source that has transactional information for employees

      - Hire date

      - turnover date

      - job

      - location

      - etc.


      I have another file that is at a higher level of detail,  still references the same jobs and locations, but not employees.

      - Job

      - location

      - market salary info

      - date


      What I want to create is something that combines the two, and enables me to see something like all turnovers, by market salary for a specific location. When I try blending them, I find the options for analysis very limiting, whereas when I try to join them, the results I get make no sense.


      Any thoughts to get me started?

        • 1. Re: Blend vs. Join?
          Bruce Segal

          Andrew: I have similar questions and have not yet fully used data blending successfully. Until someone else comes along and can give you more helpful info, the one thing that's give me partial success for data blending is to make sure you blend the least granular data into the more granular data. Sounds like in your case blend the top level job description info into the more granular personal info.


          Also it will help if you have a unique id for the job description that will match in both data sets. And I suspect you want the data in the job description data to blend with the date hired field in the more detailed data set. For data blending you might have to make that match manually b/c you have 2 date fields in you more granular data set.


          Ooh that leads to a related data structure question I propose to others. For your more granular data set ... Rather than using two date columns, should you have one column named date and add a column called something like status (choices = Hire Date or Turnover Date) and two rows for each person. One row showing the job location, description, etc. for each person on the hire date and another row showing the same info on the turn over date. Sometimes the job will be the same, some times not.

          • 2. Re: Blend vs. Join?
            Dan Cory

            Generally if all of your data is from the same source, you should join, not blend. It will (usually) be faster and (usually) let you be more flexible in your analysis.


            If your join is giving results that make no sense, then you probably didn't get the join correct. You would need to join on both job and location. You would need to have the same job and location names or codes in each data set.


            If you could describe here how it "makes no sense" we might be better able to help you!



            • 3. Re: Blend vs. Join?
              Andrew Peralta

              What seems to be happening is a many to many to join. File 1 has information at an employee level, and file 2 has information at job and location level. Each entry in file 2, can apply to many entries in the first file., creating a situation where numbers are inflated. I've attached a sample file to give you an idea.

              • 4. Re: Blend vs. Join?
                Dan Cory

                If it's a many-to-many join, then neither blending nor joins are going to be very helpful. For the question you ask above "all turnovers, by market salary for a specific location", you only need the Monthly Summary table, so just don't join them.


                You probably want two data sources, one for each table, then build dashboards that use actions to filter one table using the other table.