11 Replies Latest reply on Apr 5, 2013 12:58 PM by Nathan Lee

# Modified Z-Score? (Requires Median Average Deviation)

Greetings. I am trying to implement a Modified Z-Score (http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm#Z-Scores) in my data analysis. The data I am looking at may be significantly affected by extreme outliers, and the Modified Z-Score is better suited than the "standard" Z-Score. I have already managed to implement the Z-Score calculation.

The issue I am running into is the Modified Z-Score's use of the Median Absolute Deviation (MAD). To calculate it, I need to calculate the median of the difference of each data point from the median of the data:

But I can't seem to calculate the median of differences, because Tableau tells me I can't calculate an aggregate function on an aggregate.

Has anyone here implemented these functions?

• ###### 1. Re: Modified Z-Score? (Requires Median Average Deviation)

Nathan,

You can use WINDOW_MEDIAN().  I blogged about creating Z-Scores in Tableau.  You can alter my code to use WINDOW_MEDIAN instead of WINDOW_STD.

http://breaking-bi.blogspot.com/2013/03/conducting-2-sample-z-test-in-tableau.html

Enjoy,

Associate Consultant

Mariner, LLC

http://breaking-bi.blogspot.com

• ###### 2. Re: Modified Z-Score? (Requires Median Average Deviation)

I don't think that's it, but maybe I'm missing something. The denominator of the Z-Score is the sample standard deviation. The denominator of the Modified Z-Score is the Median Absolute Deviation, not just a median of the samples, which is what you appear to be suggesting.

• ###### 3. Re: Modified Z-Score? (Requires Median Average Deviation)

Nathan,

I think you may be right that this is impossible.  If you were to attempt to put [X] on a chart, you would have to aggregate it somehow.  Therefore, if you wanted to compare [X] to its median, you would need use a WINDOW_MEDIAN().  However, when you get to the denominator, you would need to take another median.  Tableau limits you to only use two aggregations.  See the below "attempt" at coding:

( MAX( [X] ) - WINDOW_MEDIAN( MAX( [X] ) ) /

WINDOW_MEDIAN( ABS( MAX( [X] ) - WINDOW_MEDIAN( MAX( [X] ) ) ) )

To conclude, I do not believe this is possible in Tableau due to the number of iterative functions that are needed.  You may have to perform this operation before you pull the data into Tableau.

Thanks,

Associate Consultant

Mariner, LLC

http://breaking-bi.blogspot.com

1 of 1 people found this helpful
• ###### 4. Re: Modified Z-Score? (Requires Median Average Deviation)

That is disappointing, but I believe you are correct.

• ###### 5. Re: Modified Z-Score? (Requires Median Average Deviation)

On the bright side, there is an idea on the Tableau forums to add R functionality.

Thanks,

Associate Consultant

Mariner, LLC

http://breaking-bi.blogspot.com

• ###### 6. Re: Modified Z-Score? (Requires Median Average Deviation)

Zm =

0.6745 * (ATTR([value]) - WINDOW_MEDIAN(ATTR([value]))) /

WINDOW_MEDIAN(ABS(ATTR([value]) - WINDOW_MEDIAN(ATTR([value]))))

For the table calc, you need to compute using a value that is at the row level. I.e., one row_id / value. As a check, ATTR() will return * if there is more than one value.

1 of 1 people found this helpful
• ###### 7. Re: Modified Z-Score? (Requires Median Average Deviation)

Jim,

Your denominator has a Window median of a window median, is that valid?

Thanks,

Associate Consultant

Mariner, LLC

http://breaking-bi.blogspot.com

• ###### 8. Re: Modified Z-Score? (Requires Median Average Deviation)

I think it's valid.

I did another version where Zm was a nested table calc -- I calculated an separate [value median] and took the median of this value in [Zm]. It looked like a "normal" nested table calc, but I'll admit that I don't use these often.

In Zm v2, i just pasted the formula for [value median] into [Zm v2]. This removes the nested table calc, but gives the same answer. ...

Jim

• ###### 9. Re: Modified Z-Score? (Requires Median Average Deviation)

Jim, it looks like you did create a working formula.

It is my understanding that the ATTR() function returns the value of the argument if there is only one value. In which case, a WINDOW_MEDIAN of ATTR() would always equal ATTR(), rendering the median ineffective. I don't know that a Modified Z-Score calculated on a fully homogeneous series is useful.

(ATTR([data series]) = x only if [data series] = {x, x, x,...x}. Otherwise, if for instance [data series] = {x, y, x, x}, ATTR([data series]) = "*".) - tell me if I'm wrong on that.

Brad, it's good to know R may be in Tableau's future. I'm constantly running into limitations in the calculation functionality. In the mean time, it may be a good idea to learn R, and use Tableau as a visualization tool.

• ###### 10. Re: Modified Z-Score? (Requires Median Average Deviation)

Hi Nathan,

I think the modified Z-score is working as expected across a non-homogenous data set?

value = [1, 1, 2, 2, 4, 6, 9]

median = 2

absolute deviation (AD) = |value - median| = [1, 1, 0, 0, 2, 4, 7] = [0, 0, 1, 1, 2, 4, 7]

From the twbx in the above message:

As you said, ATTR() returns only one value, but it operates at the level of detail specified in the table calculation---in this case my "row" variable, which is a unique ID per row. Therefore ATTR() is returning one value / row and WINDOW_MEDIAN(ATTR(value)) = 2.

Since there is only one value / row, you could also use min(), max(), avg(), ..., and get the same result as ATTR(), but ATTR() has an "error checking" advantage that if the table calc compute using field is not properly specified and there is more than one value / row, ATTR() will return null / * and the table calc will be null.

Another approach to this problem is to use custom SQL and do the stats / calculations with SQL. You can probably find SQL examples of common stat functions. Custom SQL has the advantage that if others are using your workbook, you avoid the complexity of table calcs. Note that this works best on real databases. If you're connecting to an Excel or text file, the Microsoft Jet database is limited and, for example, doesn't have a MEDIAN function.

Yet another approach, especially if you're using Excel or CSV files, is to just run the files through an R script to add or modify the necessary columns. Or, with Tableau 8, you could use the Python Tableau API to load a Tableau Data Extract into Python, add whatever stats, and then output an updated TDE.

Doing basic stats in R is much easier than table calcs. Here's a four-line R script "mad_process.r":

# Input   = csvFile with "value" column

# Output = csvFile with new "value_Zm" column

# Zm = 0.6745*(value - median(value) / MAD(x)

# MAD(x) = median absolute deviation = median(x - median(x))

#

# For variable assignment, R traditionally uses "<-" but you can also use "="

dv    <- data\$value

data\$value_Zm  <- 0.6745*(dv - median(dv)) / median(abs(dv - median(dv)))

# With one extra line, we could also add standard Z scores

data\$value_Z     <- scale(dv)   #equivalent to (dv - mead(dv)) / sd(dv)

Install R from http://cran.r-project.org/bin/windows/base/ (You should also install R-Studio if you're going to use R interactively.)

Add the R install directory\bin to your Windows path, and you can run the above script from the command line with: