
    Deduplicating within a self-blend

    Santiago Perez

      Hi Everyone,

       

      I know there are many answered posts on the issue of dealing with duplication caused by joining tables together, and I'm familiar with all the usual approaches, from blending to custom SQL to LOD expressions. However, my particular use case has some thorny challenges.

       

      Scenario:

      1) I want to give my users the ability to benchmark themselves against other users, so I take my data connection (to a PostgreSQL database) and duplicate it (I'll call the connections A1 and A2).

      2) I apply a data source user filter to A1 and none to A2. That way, users see their own data in metrics from A1 and an aggregate benchmark of all other data from A2.

      3) The reason I think I need the two connections, rather than just using an LOD expression in a single connection, is that I want dimension filters to affect the benchmark but not the user's own data (so a user can compare themselves to Category 1 users without the table disappearing if they are not themselves a Category 1 user).

      4) The problem is that my data has a ton of duplication as a result of joining tables, so a true value of 10 comes out as 400 in many cases.

      5) Within the primary data source, I'm able to deduplicate using this LOD expression: SUM({FIXED [Table ID] : AVG([Metric])}). It takes the average of the metric for every unique ID, which cancels out the duplication, and then sums those per-ID averages (I've sketched the equivalent SQL below for reference). But this does not work in the secondary data source, because LOD expressions, COUNTD(), and the other functions that would help here aren't supported in secondary data sources.
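
      For concreteness, here is roughly what that LOD expression is doing, written out as PostgreSQL against made-up names (treat facts, detail, table_id, and metric as placeholders for my real, proprietary schema):

        -- The join fans out each fact row once per matching detail row,
        -- so a plain SUM(metric) over-counts (e.g. 10 comes out as 400).
        -- Averaging per table_id first collapses the duplicates back to
        -- the original value; the outer SUM then adds up the per-ID values.
        SELECT SUM(dedup.metric_per_id) AS metric_total
        FROM (
            SELECT f.table_id,
                   AVG(f.metric) AS metric_per_id   -- one value per unique ID
            FROM   facts  f
            JOIN   detail d ON d.table_id = f.table_id
            GROUP  BY f.table_id
        ) AS dedup;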

       

      Soooo...any way to deduplicate the measures in the secondary data source using table calcs or something else that works in a secondary data source? Alternatively, if my key assumption that I need to self-blend to achieve the benchmark result I'm looking for is flawed, then I'm all ears! If the answer is that I need to resort to custom SQL in the secondary data source, that would be unfortunate.
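
      For reference, the custom SQL fallback I'd be trying to avoid would presumably be a pre-aggregated query used as the A2 connection, so the blend only ever sees one clean row per ID (again, placeholder names, with category standing in for whatever dimensions the benchmark filters need):

        -- Hypothetical pre-aggregated secondary source: collapse the join
        -- duplication in the database before Tableau ever sees the rows.
        SELECT f.table_id,
               f.category,
               AVG(f.metric) AS metric   -- one clean value per unique ID
        FROM   facts  f
        JOIN   detail d ON d.table_id = f.table_id
        GROUP  BY f.table_id, f.category;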

       

      It's tough for me to post a twbx since the data is proprietary. If one is necessary to answer the question, let me know and I'll cook up some fake data. Thanks!