Name: Distinct Count
Calculates the number of distinct values of a field within a partition. This is a table calculation equivalent of the COUNTD() database function, which may be required either because the result needs to be used in a calculation at a different level of detail to the level at which it is calculated, or because the datasouce does not support the COUNTD() function (i.e. if the datasource is the Microsoft JET engine).
WINDOW_SUM(IIF(ATTR(A)==ZN(LOOKUP(ATTR(A), -1)), 0, 1))
Compute Using: Ordered by A ascending.
Inputs and Setup:
A: The field for which to count distinct values.
WINDOW_SUM(IIF(ATTR([CustomerID])==ZN(LOOKUP(ATTR([CustomerID]), -1)), 0, 1))
Works by adding 1 for each value of A for which the value of A in the previous row (sorted by A) was different (or undefined, in the case of the first row).
There are a couple of examples of this calculation in this forum thread.
Note that currently (as at 6.0.4) the performance of the window functions such as WINDOW_SUM() degrades badly for large partitions, due to an unnecessary sort of the window for each row of the partition.
In cases such as this where the window function is evaluated over the entire partition, returning the same value for every row in the partition, there is a workaround to avoid the performance issue with large partitions.
The workaround is to evaluate the result for the first row of the partition and then copy the result for every other row in the partition. There is no way to prevent the window functions from performing the sort for the other rows, but as the result will be ignored for all other rows, the window size can be forced to a single row, to minimise the overhead. Here is the optimised version:
PREVIOUS_VALUE(WINDOW_SUM(IIF(ATTR(A)==ZN(LOOKUP(ATTR(A), -1)), 0, 1), 0, IIF(FIRST()==0, LAST(), 0)))
Note that if the database engine supports the COUNTD() function, an alternative approach to evaluating a distinct count, avoiding the use of window functions but still allowing the result to be used in a calculation at a different level of detail, is to use data blending to connect to a copy of the datasource, linked at the relevant level of aggregation and using the COUNTD() function.