1 2 Previous Next 25 Replies Latest reply on May 19, 2017 2:24 PM by ed.d.1

# HELP! Can't exclude highest and lowest values to finish my final filter

Hi Guys,

I'm getting crazy on how to finalize my tableau filter. I would really appreciate your help.

Almost half on it  and tried several functions however was not working for me.

I'm done for the first half of my filter which is ----->  Accuracy (Latest Date) > 2 AND Shots > [Power]

Can't get work the latter part --->  Accuracy (Latest Date) > ( Avg.Accuracy + 2 x Std.Dev of Accuracy)

Final filter:

Accuracy (Latest Date) > 2 AND Shots > [Power] AND Accuracy (Latest Date) > ( Avg.Accuracy + 2 x Std.Dev of Accuracy)

where:

Power = 25

Avg.Accuracy = 10 days average from last 14 days after excluding two highest and two lowest accuracy values

Std.Dev of Accuracy = 10 days standard deviation from last 14 days after excluding two highest and two lowest accuracy values

@Shinichiro Murakam  - I'm looking forward that somebody could reply here. thanks for replying to me with the other thread!

• ###### 1. Re: HELP! Can't exclude highest and lowest values to finish my final filter

Does this help?

IF [The Date] = TODAY() AND [Accuracy] > 2 AND [Shots] > [Power]

AND [Accuracy] >{AVG([Accuracy])}+2*{STDEV([Accuracy])}

THEN "1"

ELSE NULL

END

• ###### 2. Re: HELP! Can't exclude highest and lowest values to finish my final filter

Thanks Daniel! I actually need to remove the 2 highest and 2 lowest values from last 14 days before averaging it to use  as "Avg.Accuracy"

Same for "Std.Dev of Accuracy" - need to apply standard deviation after removing 2 highest and 2 lowest of accuracy values

I believe your code will word just I need to add those condition I mentiioned above.

• ###### 3. Re: HELP! Can't exclude highest and lowest values to finish my final filter

Also, I tried to validate the standard deviation results. filtering one max code "LNO05155B11" for 14 days from my packaged - it doesn't match the actual calculation.

With Tableau:   the results is 0.093749785

IF [The Date] = TODAY() AND [Accuracy] > {STDEV([Accuracy])} THEN "1"

ELSE NULL

END

With normal calculation.

Standard deviation (s): 0.089517442546431

 MaxCode City The Date Shots Accuracy LNO05155B11 Jupiter 4/20/2017 6 0.334075724 LNO05155B11 Jupiter 4/21/2017 4 0.19389239 LNO05155B11 Jupiter 4/22/2017 3 0.193050193 LNO05155B11 Jupiter 4/23/2017 5 0.405844156 LNO05155B11 Jupiter 4/24/2017 4 0.203045685 LNO05155B11 Jupiter 4/25/2017 4 0.199203187 LNO05155B11 Jupiter 4/26/2017 3 0.152361605 LNO05155B11 Jupiter 4/27/2017 4 0.195217179 LNO05155B11 Jupiter 4/28/2017 5 0.251889169 LNO05155B11 Jupiter 4/29/2017 2 0.137931034 LNO05155B11 Jupiter 4/30/2017 2 0.172562554 LNO05155B11 Jupiter 5/1/2017 5 0.224114747 LNO05155B11 Jupiter 5/2/2017 9 0.428163654 LNO05155B11 Jupiter 5/10/2017 5 0.246548323
• ###### 4. Re: HELP! Can't exclude highest and lowest values to finish my final filter

Ah. You can place the first filter to context and that will take care of that.;)

• ###### 5. Re: HELP! Can't exclude highest and lowest values to finish my final filter

would you mind to show me how to do that?

I only need to exclude two highest and two lowest for these last two.

AND [Accuracy] >{AVG([Accuracy])}+2*{STDEV([Accuracy])}

• ###### 6. Re: HELP! Can't exclude highest and lowest values to finish my final filter

I am on my phone and can only point you what I would have done to exclude the first two and last two

1. Accuracy need to be sorted(order not importan)

2. Use Window_Avg as you can tune the first and last valu included:

Window_Avg(Avg([Accurracy]),First()+2,Last()-2)

• ###### 7. Re: HELP! Can't exclude highest and lowest values to finish my final filter

I just thought of a better way. Use conditions

IF First()>-1 AND LAST()>1 THEN ...

This will exclude the first and last twos.

• ###### 8. Re: HELP! Can't exclude highest and lowest values to finish my final filter

Do I still need to sort it in the calculated field?

SORT then...

IF First()>-1 AND LAST()>1 THEN ...

• ###### 9. Re: HELP! Can't exclude highest and lowest values to finish my final filter

Jonathan Drummey - Would you mind to help me out here?  I found that you are one of the expert of experts here in tableau community. I know for sure you are so busy but in case you see my question and have a spare time please have a look at it. I would really appreciate it. I'm a newbie in Tableau and I have some projects to build. Thanks in advance!

• ###### 10. Re: HELP! Can't exclude highest and lowest values to finish my final filter

Hi Ed,

I’d be glad to help. I think the key issue here is that there the problem you are trying to solve is under-specified vis-a-vis the data so the folks who are trying to help out are running into challenges getting the desired results.

Secondarily what you’re running into is a lack of knowledge about the 4 levels of calculation in Tableau. They are:

1) record-level, these are record-level fields and calculations that only use fields from a single record.

2) aggregate calculations like sum, min, max, etc. This also includes the special case of Level of Detail (LOD) expressions that are aggregates that return record level results.

3) table calculations like INDEX(), the WINDOW_ calculations, TOTAL(), etc. that are computed based on the marks in the view.

4) post-aggregate computations like reference lines, grand totals, trend lines, etc. that are not usable in other calculations (though we can typically mimic their results using other calculations).

I’ll go into more detail by what mean with those statements, I’ve got a number of questions:

1) The latest date is hard-coded to TODAY(), however the data right now is static and doesn’t have data for today. Do you really want TODAY() or the latest date in the data? For example I’ve often worked with people who want to show “today’s data” but the data doesn’t

2) The dates are sparse meaning that there is not data for every day, so is the “last 14 days” based on “14 days worth of data” or “14 calendar days”?

3) The Avg. Accuracy and SD Accuracy calculations are defined as the “10 days average” and "10 days SD”. Do you mean the most recent 10 days worth of data in the 14 or the most recent 10 calendar days?

The grain of the data (what makes a unique row) is The Date and Max Code, and there’s a value of Accuracy for each. This creates a many to many relationship between The Date and Accuracy, this will inform the next few questions.

4) Do you want the Avg and SD to be the Avg and SD across all 10 days or the Avg and SD of the daily value of accuracy, in which case what is the aggregation for that daily value - Average or something else? I’d guess that it’s across all 10 days but not everyone does that, having an understanding of the context of the calculation is key to figuring out what level it needs to be at (aggregate, table calc, etc.)

5) Because of the grain of data there are many values of Accuracy for the latest date. The calculation you wrote looks to be at a record level, which means that there will be lots of true/false results. Is your intention to have multiple values of accuracy returned or do you want a single result for the latest date? If so, how is Accuracy aggregated for the latest date?

6) If the calculation is at a record level then it could be possible for a value of accuracy (latest date) to be one of the top 2 or bottom 2 values that is excluded. Should those latest date values also have have the top 2/bottom 2 values be filtered out of the latest accuracy?

7) What is your goal for this calculation, in other words how do you want to use it? I ask because there may be other routes to the results.

Jonathan

1 of 1 people found this helpful
• ###### 11. Re: HELP! Can't exclude highest and lowest values to finish my final filter

First of All, Thanks Jonathan Drummey for responding to me. This means a lot  to me!!!!

Accuracy (Latest Date) > 2 AND Shots > [Power] AND Accuracy (Latest Date) > ( Avg.Accuracy + 2 x Std.Dev of Accuracy)

1) The latest date is hard-coded to TODAY(), however the data right now is static and doesn’t have data for today. Do you really want TODAY() or the latest date in the data? For example I’ve often worked with people who want to show “today’s data” but the data doesn’t

-  Actually I really want to use the latest date available in the data set it was just because I cannot make the right code for that so I use TODAY()-1 for yesterday's date

2) The dates are sparse meaning that there is not data for every day, so is the “last 14 days” based on “14 days worth of data” or “14 calendar days”?

- There will be new running 14days of data. So the newest day will be the day yesterday and it will be available during midnight of the following day.

3) The Avg. Accuracy and SD Accuracy calculations are defined as the “10 days average” and "10 days SD”. Do you mean the most recent 10 days worth of data in the 14 or the most recent 10 calendar days?

- Yes 10 days average and 10 days SD out of that 14 recent days. So that four days to be excluded should be the top 2 highest and top 2 lowest within the last 14 days.

Sorting from descending order will do.then remove the top two high and 2 lowest values. then average it ==> Avg.Accuracy  and SD it ==>  Std.Dev of Accuracy

The reason is to remove some dates with spike of data and dates  with low data for some reasons.

The grain of the data (what makes a unique row) is The Date and Max Code, and there’s a value of Accuracy for each. This creates a many to many relationship between The Date and Accuracy, this will inform the next few questions.

4) Do you want the Avg and SD to be the Avg and SD across all 10 days or the Avg and SD of the daily value of accuracy, in which case what is the aggregation for that daily value - Average or something else? I’d guess that it’s across all 10 days but not everyone does that, having an understanding of the context of the calculation is key to figuring out what level it needs to be at (aggregate, table calc, etc.)

Do you want the Avg and SD to be the Avg and SD across all 10 days - YES

5) Because of the grain of data there are many values of Accuracy for the latest date. The calculation you wrote looks to be at a record level, which means that there will be lots of true/false results. Is your intention to have multiple values of accuracy returned or do you want a single result for the latest date? If so, how is Accuracy aggregated for the latest date?

do you want a single result for the latest date?  - YES I want to have a single results for the latest date

how is Accuracy aggregated for the latest date?- I'm not sure if I understand this question correctly. Still what needs to compare is  that Accuracy (Latest Date) using that

filter condition formula. In the end I will have a total count of  this Accuracy (Latest Date) .

6) If the calculation is at a record level then it could be possible for a value of accuracy (latest date) to be one of the top 2 or bottom 2 values that is excluded. Should those latest date values also have have the top 2/bottom 2 values be filtered out of the latest accuracy? - Yes if Accuracy (Latest Date) is one of the top 2 or bottom 2 then it will be excluded for calculation of Avg.Accuracy and Std.Dev of Accuracy but still it will be Accuracy (Latest Date) > 2 AND Shots > [Power] AND Accuracy (Latest Date) > ( Avg.Accuracy + 2 x Std.Dev of Accuracy)

7) What is your goal for this calculation, in other words how do you want to use it? I ask because there may be other routes to the results.

I will have a final count for each city for each MaxCode that fulfills my condition below"

IF [The Date] = TODAY() AND Accuracy (Latest Date) > 2 AND Shots > [Power] AND Accuracy (Latest Date) > ( Avg.Accuracy + 2 x Std.Dev of Accuracy

ELSE NULL

END

• ###### 12. Re: HELP! Can't exclude highest and lowest values to finish my final filter

Hi Ed,

Thanks for the response, I have a couple more questions, the first two are clarifying questions from above and the third is a new question.

3) The Avg. Accuracy and SD Accuracy calculations are defined as the “10 days average” and "10 days SD”. Do you mean the most recent 10 days worth of data in the 14 or the most recent 10 calendar days?

- Yes 10 days average and 10 days SD out of that 14 recent days. So that four days to be excluded should be the top 2 highest and top 2 lowest within the last 14 days.

Sorting from descending order will do.then remove the top two high and 2 lowest values. then average it ==> Avg.Accuracy  and SD it ==>  Std.Dev of Accuracy

The reason is to remove some dates with spike of data and dates  with low data for some reasons.

Based on your responses I'm thinking that what you are looking for is to remove the top 2 highest and top 2 lowest for each MaxCode, and not the top 2 highest and top 2 lowest across all MaxCodes?

4) Do you want the Avg and SD to be the Avg and SD across all 10 days or the Avg and SD of the daily value of accuracy, in which case what is the aggregation for that daily value - Average or something else? I’d guess that it’s across all 10 days but not everyone does that, having an understanding of the context of the calculation is key to figuring out what level it needs to be at (aggregate, table calc, etc.)

Do you want the Avg and SD to be the Avg and SD across all 10 days - YES

From your screenshot it looks like the Avg and SD need to be recomputed for each MaxCode, is that true?

8) Not every MaxCode has 14 days worth of data in your sample data set, there are 58 MaxCodes with data for 9 or less dates and 14 MaxCodes have 4 or fewer dates. If the rule is to always remove the top & bottom 2 for each then that would bring the total number of values to calculate over below 10 or even down to 0. If it's true in your real data that not every MaxCode will have 14 days worth of data, what should be done when there are fewer days worth of data?

Jonathan

1 of 1 people found this helpful
• ###### 13. Re: HELP! Can't exclude highest and lowest values to finish my final filter

Hi Jonathan, apology for the late response.

Kindly see below:

3) The Avg. Accuracy and SD Accuracy calculations are defined as the “10 days average” and "10 days SD”. Do you mean the most recent 10 days worth of data in the 14 or the most recent 10 calendar days?

- Yes 10 days average and 10 days SD out of that 14 recent days. So that four days to be excluded should be the top 2 highest and top 2 lowest within the last 14 days.

Sorting from descending order will do.then remove the top two high and 2 lowest values. then average it ==> Avg.Accuracy  and SD it ==>  Std.Dev of Accuracy

The reason is to remove some dates with spike of data and dates  with low data for some reasons.

Based on your responses I'm thinking that what you are looking for is to remove the top 2 highest and top 2 lowest for each MaxCode, and not the top 2 highest and top 2 lowest across all MaxCodes?

- I'm not sure if I understand when you say "across", my understanding is that you will keep the whole table and just remove  remove those columns only (with top 2 highest and top 2 lowest). Anyway I need the value of Accuracy for Avg and SD across the whole available dates after removing the top 2 and 2  lowest - meaning ~10 days for my case. Some dates may have no data. In case for example there are only 13 days available then the idea still is to remove those top 2 and 2 lowest.

4) Do you want the Avg and SD to be the Avg and SD across all 10 days or the Avg and SD of the daily value of accuracy, in which case what is the aggregation for that daily value - Average or something else? I’d guess that it’s across all 10 days but not everyone does that, having an understanding of the context of the calculation is key to figuring out what level it needs to be at (aggregate, table calc, etc.)

Do you want the Avg and SD to be the Avg and SD across all 10 days - YES

From your screenshot it looks like the Avg and SD need to be recomputed for each MaxCode, is that true?

- YES correct. After removing it needs to be recalculated again based on the removal of top2 and 2 lowest.

8) Not every MaxCode has 14 days worth of data in your sample data set, there are 58 MaxCodes with data for 9 or less dates and 14 MaxCodes have 4 or fewer dates. If the rule is to always remove the top & bottom 2 for each then that would bring the total number of values to calculate over below 10 or even down to 0. If it's true in your real data that not every MaxCode will have 14 days worth of data, what should be done when there are fewer days worth of data?

- Some dates may have no data. In case for example there are only 13 days available or less and so forth...  then the same idea still  to remove those top 2 and 2 lowest and calculate for average  ==> Avg.Accuracy  and SD  ==>  Std.Dev of Accuracy

- That would be OK if the results would be zero

• ###### 14. Re: HELP! Can't exclude highest and lowest values to finish my final filter

Hi Ed,

Thanks for your answers, one more question and then I think I'll be able to build something that meets your requirements.

9) You'd defined the removal as based on the two highest and two lowest values. What if there are multiple days that have the same value? Here's an (extreme) example from the sample data:

The two highest values are 50 and 0, the two lowest values are both 0. We could remove all data points with a 50 or zero, or just remove four date's worth of data. Which would you prefer?

Jonathan

1 2 Previous Next