8 Replies Latest reply on Feb 28, 2013 7:11 AM by Mårten Lindblad

    Filtering data across multiple connections

    Gregory W

      Ok, to set the scene, I have two connections to data: sales and number of customers. It's not be feasible to combine this data together before it gets into Tableau.

       

      I have named all of my columns the same for the dimensions of both connections (and automatic management of the relationships looks good).

       

      The issue that I'm having difficulty with is calculating the sales per customer. Now bear with me because it's not quite that simple... this data is updated daily, but occasionally, one of the sources will fail to update. If one of the sources does fail, one set of data will be complete for yesterday, and the other will not. This means if I calculate the number of sales per customer, that value will either be inflated or decreased artificially when looking over a period such as current month to date.

       

      Initially (when I was only dealing with one connection) I added a MaxDate column (which was a constant) to the SQL query. This MaxDate effectively gave the date at which the query was run, or the date of the most recent sale, whichever was the greater.

       

      I added the same MaxDate column to my second connection. What I would like to do is something either like this:

       

      1. Get the minimum of the MaxDates for the two connections, lets call this MinMaxDate

       

      Filter out any results where either connection's Date > MinMaxDate (because one day's worth of data is useless without the other)

       

      OR

       

      2. Filter out results where Sale Date > Customer MaxDate OR Customer Date > Sale MaxDate

       

      This would prevent me from getting situations where I have 5 days worth of sales for a week but only 4 days worth of customers and getting a skewed calculation.

       

      The issue that I'm having is that for cross connections I can only do comparisons on Constants or Aggregates.

       

      What I seem to need to be able to do is take the MaxDate from one source, turn it into a constant, that I can then compare against for each row in my data for the other connection. This is where I'm stuck...

       

      Any help much appreciated

        • 1. Re: Filtering data across multiple connections
          Mårten Lindblad

          Have you tried to create a T/F variable comparing both of the date dimensions and filter the report to only use "TRUE" values?

          1 of 1 people found this helpful
          • 2. Re: Filtering data across multiple connections
            Gregory W

            Hi Mårten

            Yes this is the approach I have tried to take. The issue I have is with treating the MaxDates from the other connection as a constant that I can compare against each row to get the true/falses.

             

            EDIT: Clarification and apologies for butchering your name first time around

            • 3. Re: Filtering data across multiple connections
              Mårten Lindblad

              The "Data Blending" is truly awful in Tableau.

               

              Another way to go is to check if "Edit Relationship" is automatic (under data).

              If so, you could just drag both of the date to the filter and exclude null and * from the report.

               

              • 4. Re: Filtering data across multiple connections
                Gregory W

                Thanks Mårten,

                 

                This works very nicely with one exception (which unfortunately means I still need to figure out a work around).

                 

                If you have a date where you had poor performing day with 0 sales, you won't get any record in the SQL query for this day. This means tableau would ignore days where the conversion rate was 0%, therefore giving inaccurate results for conversion rates over time periods...

                 

                I'll just add, for anyone who's coming in from Google, and decides the above situation isn't a problem for them, then you will need to add the field to the filter from the secondary source.

                • 5. Re: Filtering data across multiple connections
                  Gregory W

                  I guess one way to this about it is that I need a hard start and end cap on the date ranges, and I need to calculate these caps from data that is in the connection. Currently, this data is in the form of a constant date, that is the same for all rows in a connection, but not necessarily the same in all connections. Visually this might look like this:

                   

                  Conn1 Dates            |--------------------------------------------------|

                  Conn2 Dates    |------------------------------------------|

                  Include Dates           |------------------------------------|

                  • 6. Re: Filtering data across multiple connections
                    Mårten Lindblad

                    I always recommend setting up date dimension with left join the sales table and sum using the dimension date values.

                     

                    If I have no access to create a table, just use this code:

                     

                    SELECT DIMDATE, SUM(SALES)

                    FROM (

                      SELECT DATEDIFF('day', TODAY(), (NROW*-1)) AS DIMDATE

                      FROM  (

                        SELECT ROW_NUMBER() AS NROW FROM BIG_TABLE LIMIT 3000) A

                      ) DIMDATE

                    LEFT JOIN SALES ON DIMDATE.DIMDATE = SALES.SALEDATE

                    GROUP BY 1

                    1 of 1 people found this helpful
                    • 7. Re: Filtering data across multiple connections
                      Gregory W

                      Thanks again Martin. This issue I face is that my example is greatly simplified. I have many dimensions, and each one of them would require joining with a date to ensure that there is redundancy in the case that any combination of them has no relevant record.

                       

                      For example if someone filtered on the location, I would need to ensure I had a row for every location when there were not sales also. This would increase the size of my record-set hundreds of times.

                       

                      However, it's a great technique on a very simple query with only one dimension like date, and I've used it in the past to work around shortcomings in Excel's pivot tables.

                       

                      If only I could some how populate a constant (like a parameter) from the data when it updates...

                      • 8. Re: Filtering data across multiple connections
                        Mårten Lindblad

                        Yes I agree. Sorry I couldn't help any further!

                        Hopefully they will allow us to populate a parameter from a data source some day...