4 Replies Latest reply on Jan 29, 2019 6:39 AM by Andy Spink

# Ignore duplicates created by connected join

I have a main data source (google sheets) where all business activity is recorded; some data around sales would look like this:

 Reference Activity Date Sum(Number of records) HLF1000 Sale 01/01/2019 1 ACS1278 Sale 03/01/2019 1 WWH0897 Sale off 05/01/2019 1 HLF6490 Sale 07/01/2019 1 WWH0976 Sale 11/01/2019 1

To calculate the total number of sales I use this formula IF Activity = 'Sale' THEN 1 ELSE 0 END. This would give me a total of 4 as above.

I then added a connection (also a google sheet) containing external market data to see how many of our sales were also listed by competitors. All field names are the same in this connection and have a left join using the 'Reference' dimension.

The union now looks like the grid below:

 Reference Activity Date Sum(Number of records) Reference (Union) Activity (union) Date (union) Listed by Sum(Number of records) HLF1000 Sale 01/01/2019 1 HLF1000 Market sale 10/11/2018 Our company 1 HLF1000 Sale 01/01/2019 1 HLF1000 Market sale 03/01/2019 Competitor 1 ACS1278 Sale 03/01/2019 1 ACS1278 Market sale 02/01/2019 Our company 1 WWH0897 Sale off 05/01/2019 1 NULL NULL NULL NULL HLF6490 Sale 07/01/2019 1 HLF6490 Market sale 14/01/2019 Our company 1 WWH0976 Sale 11/01/2019 1 WWH0976 Market sale 16/01/2019 Our company 1

I now have a database showing that one of the references *HLF1000* was listed twice in the connected database with both ourselves and a competitor. This has created a second row for the data, which means my calculation for total sales is now showing as 5 instead of 4.

I have tried doing a fixed calculation to only calculate one occurrence of each Reference in the database - { FIXED [Reference] : SUM ([Total sales]) } however this isn't changing the outcome.

Based on the information above, how can i calculate a total number of records for data in the left join while ignoring it's frequency in the connected database?

Thanks

• ###### 1. Re: Ignore duplicates created by connected join

Hi Andy = would prefer to see your twbx workbook with the data

but your LOD sums the duplicate data - you only want 1 not the sum so you could use max() or Min() or avg()

Jim

If this posts assists in resolving the question, please mark it helpful or as the 'correct answer' if it resolves the question. This will help other users find the same answer/resolution.  Thank you.

1 of 1 people found this helpful
• ###### 2. Re: Ignore duplicates created by connected join

Thanks Jim. Using the max() function stopped the duplication of totals as required.

I also have another query on this matter if you are able to assist?

Using the 'reference' dimension as the unique identifier - if I had two appointment dates in the table with the same reference, how can I filter the table to only show the row with the most recent appointment date?

• ###### 3. Re: Ignore duplicates created by connected join

here is a 2 step process - first an LOD to find the max data by ref

LODmaxdate>>     {fixed [reference:max[date]}

then you can find the data by        if [date] = LODmaxdate then .......

Jim

1 of 1 people found this helpful
• ###### 4. Re: Ignore duplicates created by connected join

Great, thank you!