My first TCRL contribution...
Alternate names: Median, quartile, decile, percentile, etc (which are all specific examples of quantiles)
Quantiles divide ordered data into a series of essentially equal-sized data subsets; the quantiles are the data values marking the boundaries between consecutive subsets. (This is paraphrased from the definition given at: http://en.wikipedia.org/wiki/Quantile.)
IF ([k] == 0) THEN LOOKUP(A, 1 - INDEX()) ELSEIF ([k] == [q]) THEN LOOKUP(A, SIZE() - INDEX()) ELSEIF ((([k]/[q]) * SIZE()) == INT(([k]/[q]) * SIZE())) THEN (LOOKUP(A, (([k]/[q]) * SIZE()) - INDEX()) + LOOKUP(A, (([k]/[q]) * SIZE()) + 1 - INDEX())) / 2 ELSE LOOKUP(AVG([Sales]), (ROUND((([k]/[q]) * SIZE()) + 0.5, 0) - INDEX())) END
Compute Using: Ordered by A ascending.
Note that the ordering must be per partition, which may require inclusion of partitioning fields in the list of addressing fields and the use of "Restarting every", as described in this thread: http://www.tableausoftware.com/forum/how-specify-ordering-within-each-partition-independently.
Inputs and Setup:
A: An aggregate measure such as AVG([Measure]) or a calculation producing one. Typically the quantile will be calculated across the underlying detail data, so the combination of fields on the shelves must uniquely identify each row within the partition and the most appropriate aggregate function would be ATTR() (but the ordering can not be specified using ATTR()).
q: The number of subsets to divide the data into (e.g. 2 for median or 100 for percentile).
k: The required quantile (e.g. 95 for the 95th percentile).
[q] and [k] may be provided by parameters, field values, calculated fields or may be hard-coded.
IF ([k] == 0) THEN LOOKUP(ATTR([Sales]), 1 - INDEX()) ELSEIF ([k] == [q]) THEN LOOKUP(ATTR([Sales]), SIZE() - INDEX()) ELSEIF ((([k]/[q]) * SIZE()) == INT(([k]/[q]) * SIZE())) THEN (LOOKUP(ATTR([Sales]), (([k]/[q]) * SIZE()) - INDEX()) + LOOKUP(ATTR([Sales]), (([k]/[q]) * SIZE()) + 1 - INDEX())) / 2 ELSE LOOKUP(ATTR([Sales]), (ROUND((([k]/[q]) * SIZE()) + 0.5, 0) - INDEX())) END
There are numerous different definitions of quantiles, producing subtly different answers, with the differences mainly concerning truncation, rounding or interpolation to choose the required value when the quantile boundary does not fall exactly on a value in the data set.
For the sake of consistency, this calculation implements the algorithm used by Tableau for reference lines. (Or at least it is intended to!)
This is the 2nd out of 10 algorithms given at http://en.wikipedia.org/wiki/Quantile (referred to as R-2, SAS-5).
Note that the calculation given above does not attempt to validate the parameters given, so inconsistent values for [k] and [q] may produce spurious results. In the example workbook attached I have included some basic validation (see calculated fields [Quantile Valid] which is used to suppress calculation of invalid results and [Quantile Message] which is displayed on the view Title in the case of error). I'm not sure if this the best way to handle validation of parameters - I feel another thread coming on.
The attached example calculates quantiles (initially percentiles, but this is parameterised) for the sample Coffee Sales data. This demonstrates the display of percentiles against a continuous dimension, which is not possible with percentile reference lines. A combo chart showing percentiles displayed over the detailed data is also included.
QuantileSample.twbx 216.8 KB