2 Replies Latest reply on Mar 20, 2018 11:12 PM by Jason Chang

    Combining two different data sets into one graph

    Jason Chang

      Hi All,

       

      I am new to the community so please forgive me if I am doing something wrong here. I tried to research around as much as possible but I am still a bit loss.

       

      Problem:

       

      I have two data sets here. One is an aggregated data set on date level. Second one is not aggregated.

       

      Examples:

       

      data set #1

      date         projected_users

      9/1/2017  8,328

      9/2/2017  7,128

       

      data set #2

      date             user_id            gender            age

      9/1/2017      851751            M                    18

      9/1/2017      572385            F                     19

      9/1/2017      895235            F                     20

      9/4/2017      723572            F                     21

       

       

      ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

      End Goal: To have a bar chart that gives total users on the Y axis and month/year on the X axis based on the following logic:

       

      I have set up a parameter for months.

       

      IF user chooses month X and parameters/filters like gender/age/etc, THEN count all distinct users from data set #2 where the date of data set #2 is from 2017-08-01 through today's date. THEN add this total to the SUM of the projected_users from today's date through month X.

       

      ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

       

      Example

       

      If the user select May 2018. The chart should display:

       

      countd(user_id) where date > 9/1/2017 and < today()

      +

      sum(projected_users) where date > today() and < 5/1/2018

       

      and also be able to filter on the various filters/parameters

       

      ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

       

       

      Also note that the way that the filters are applied to the 2 data sets is DIFFERENT.

       

      Example for data set #1, we apply a forecast (a constant %) for an approximation of gender/age/etc. E.g. ~40% of users are male in September so we multiply projected_users*0.4

      Example for data set #2, we have the data there and so we can do some countd(user_id) WHERE gender = M or age = 18 or etc.

       

       

       

      I know it's easier if I share my workbook but I am reluctant to because the data is a bit private.

       

      Is anyone able to help me here? I have tried blending, joining, and keep getting errors. I'm not great at tableau so I'm sure that I am making this more difficult than it needs to be.

        • 1. Re: Combining two different data sets into one graph
          anil saini

          I assume based on what data set you provided, the date is the only field which driving your requirement

           

          I would suggest using join, dataset #2 left join data set #1 on a date field.

           

          please try this approach and it's better to post the error you receive while working with this approach. It will help us to investigate it better

          2 of 2 people found this helpful
          • 2. Re: Combining two different data sets into one graph
            Jason Chang

            Thank you for the response anil.

             

            This is SO strange because I tried this exact method multiple times but I was getting several errors. I can't remember them specifically because I've been working on this for some time, but one was definitely the error "Cannot blend the secondary data source because one or more fields use an unsupported aggregation." I searched through forums here and people had solutions but none of them worked for me.

             

            However, I am no longer getting that error.... what could the problem have been? My suspicion is that it's possible that I had a different data source as my primary data source (because I was doing so many different methods at the time)?

             

             

             

            EDIT:

             

            My problem now: is that I don't know how to have filter on my worksheet that can control both data sets. As mentioned, my 2 data sets are filtered in different ways

             

            For example:

             

            Filter - Gender

             

            Data set #1 will simply multiply the projected users by a factor: ~40% are male and ~60% users are female. And so my calculated field will multiply my projected_users by another calculated field where male is 0.4 and female is 0.6

             

            Data set #2 will be more of a "traditional" filter where it will filter based on the column gender.

             

             

            How can I combine these?