4 Replies Latest reply on Jan 29, 2019 6:39 AM by Andy Spink

    Ignore duplicates created by connected join

    Andy Spink

      I have a main data source (google sheets) where all business activity is recorded; some data around sales would look like this:


      ReferenceActivityDateSum(Number of records)
      WWH0897Sale off05/01/20191


      To calculate the total number of sales I use this formula IF Activity = 'Sale' THEN 1 ELSE 0 END. This would give me a total of 4 as above.


      I then added a connection (also a google sheet) containing external market data to see how many of our sales were also listed by competitors. All field names are the same in this connection and have a left join using the 'Reference' dimension.


      The union now looks like the grid below:


      ReferenceActivityDateSum(Number of records)Reference (Union)Activity (union)Date (union)Listed bySum(Number of records)
      HLF1000Sale01/01/20191HLF1000Market sale10/11/2018Our company1
      HLF1000Sale01/01/20191HLF1000Market sale03/01/2019Competitor1
      ACS1278Sale03/01/20191ACS1278Market sale02/01/2019Our company1
      WWH0897Sale off05/01/20191NULLNULLNULLNULL
      HLF6490Sale07/01/20191HLF6490Market sale14/01/2019Our company1
      WWH0976Sale11/01/20191WWH0976Market sale16/01/2019Our company1



      I now have a database showing that one of the references *HLF1000* was listed twice in the connected database with both ourselves and a competitor. This has created a second row for the data, which means my calculation for total sales is now showing as 5 instead of 4.


      I have tried doing a fixed calculation to only calculate one occurrence of each Reference in the database - { FIXED [Reference] : SUM ([Total sales]) } however this isn't changing the outcome.


      Based on the information above, how can i calculate a total number of records for data in the left join while ignoring it's frequency in the connected database?