8 Replies Latest reply on Jul 30, 2018 10:05 AM by Dhatrish Rajdev

    Count Distinct from Secondary workaround

    Greg Ganzel

      Hi Everyone,

       

      I’ve created this post in order to share my workaround for bringing over a Count Distinct (COUNTD) from a secondary data source.  If you’ve tried to show a COUNTD from a secondary before, you’ve undoubtedly seen the message stating “Cannot blend the secondary data source because one or more fields use an unsupported aggregation”.

       

      There are a few good workarounds for this already (such as https://community.tableau.com/thread/126695 ) but my solution is more versatile as it allows for additional calculations on top of the workaround calculation.

       

      Normally, if you want to display the distinct number of customers [Customer ID] that made a purchase, you would use this calculation:

       

      COUNTD( [Customer ID] )

       

      However, you can write essentially the same calculation in a different syntax using the Level of Detail (LOD) calculation INCLUDE:

       

      SUM( {INCLUDE [Customer ID] : MAX(1) } )

       

      This LOD calculation produces a single value of “1” for each unique combination of dimensions used in the “dimensionality declaration”.  When you Sum up an LOD calculation, it adds up the single value produced for each unique combination of the input dimensions.  This is different than a Sum produced from a non-LOD calculation, where each row of data is added together to produce the aggregation.  The result of this LOD calculation should be exactly the same as the COUNTD.

       

      Key Use Cases:

      1. Key Performance Indicator (KPI) Dashboards.  Oftentimes KPI dashboards require blending to a variety of secondary data sources, and then a calculation in the primary data source showing the “Percent of Goal”.
      2. Blending from a polygon-based map to an existing data source which uses 1-to-many joins. While it’s possible to aggregate the 1-to-many data source in the custom SQL, this can be time consuming and will require more edits when data sources change.

       

      Advantages:

      1. Very easy to write and implement.  You can include more dimensions in the declaration, such as a product code or invoice number, without having to modify the blend fields or visualization.
      2. Can create non-LOD calculations in the primary data source referencing the value from the secondary data source.  This is particularly useful for calculating averages per customer and averages per region.

       

      Limitations:

       

      1. This works with INCLUDE, but does not work with FIXED, due to order of operations in Tableau ( http://onlinehelp.tableau.com/current/pro/desktop/en-us/order_of_operations.html )
      2. The blend field(s) in the primary data source MUST be in the visualization ( https://onlinehelp.tableau.com/current/pro/desktop/en-us/help.html#calculations_calculatedfields_lod_overview.html?Highlight=LoD )
      3. You may run into performance problems if you’re dealing with an extremely large secondary data source.  However, I wouldn’t expect performance using this workaround to be significantly worse than using the other available Tableau-based workarounds.

       

      Due to time constraints, I have not provided a packaged workbook demonstrating this calculation in action.  But, I have tested it numerous times using private data, and it always produces the same value as a COUNTD.

       

      Please share your thoughts, questions, or additional use cases.

       

      Take care and Good Luck!