2 Replies Latest reply on Aug 10, 2018 12:30 PM by Jim Dehner

# Calculation error due to messy data

Hi All,

I have a lengthy two part question so bear with me please.

Background:

I'm pulling in data from SQL and due to the nature of the data my table gets blown out resulting in duplicate rows. The data are test results for individuals. Each individual gets a distinct ID number and tied to that ID number are 41 [Targets] which each have a numerical [Result]. What I haven't included in the example workbook (privacy concern) is another column that gives descriptive statistics based on that ID. That column causes the table to give me duplicate rows for each test result, not the end of the world since the numerical result is the same for each [Target] so I can use ATTR or aggregate using Avg to get the correct numbers. It is however causing a problem when I am performing a series of calculations on the data.

Calculations:

The end user wanted the data to be transformed and scaled using these steps:

1. Take the results that fall within the 20th and 80th percentiles for each gender

2. Take the average and standard deviation for the results that fall into that range for each gender

3. Perform a z-score transformation (x - mean) / standard deviation (using mean and standard deviation of results in the range)

4. Scale the data as such : (z-score/4)+2

You'll see in my workbook a folder under Measures called "A10 Calculations", there you will see how I separated out the results by gender (A10 (female) and A10 (male)) and then performed the calculations stated above. So part one of my question is if this series of calculations can be done in a more condensed way. As you'll see it takes 14 calculated fields to come to the answer [Normalized A10] and I would hopefully like to trim that down.

The second part of my question is that when I take the average of the results that fall into the percentile range I am getting an incorrect answer. In Sheet 1 you'll see the values for target A10 that fall into the range. In the summary to the right you can see the correct average of those values to be 826.9. In Sheet 2 I pulled in the [AVG (female)] calculated field and it shows 799.5. The reason the value is incorrect is because of the duplicate value issue stated above. my calculated field [AVG (female)] is taking the average of the summed duplicate rows when it should be taking the values showed in Sheet 1. Is there any way to resolve this?

If you're still reading this thank you, I know it is an extremely lengthy post. Any insight would be wonderful.

Thanks