Basic Concepts for Tableau Prep

Version 3

    This guide is intended to provide an overview of some of the concepts related to data prep operations, corresponding to the major types of Steps that can be created in a Tableau Prep flow (.tfl).

     

     

    Joins

     

    Used when there is a need to combine two or more tables for analysis, that can be related to one another.  The columns on each side of the join need to be of the same data type (string, integer, etc); and in most join situations, there will need to be at least some matching row-level values in these columns.  See the blog post A Visual Explanation of SQL Joins for more information on the different types of joins.

     

    At a minimum, the result of a join will be a data set that is "wider" (in other words, containing more columns) than each of the original tables:

     

    joins.jpg

    (Image Credit - Essential SQL: What is the difference between a Join and a Union)

     

    Depending on the specifics of the tables and the join condition, it is also common for the result to contain more rows (or, be “taller”) than the original tables, due to row duplication.   Our Product Help provides a simple example of how this occurs.

     

     

    Unions

     

    Used when there is a need to analyze data from two or more tables containing identical columns, and the matching columns need to be treated as the same field rather than disparate ones.

     

    It is important to understand the distinction between Unions and Joins, because occasionally Joins will be attempted in scenarios where a Union would be more appropriate.

     

    Instead of creating a result set that will be wider, a Union will contain the same number of columns, but will be taller (or, contain more rows) than the original tables:

     

    unions.jpg

     

    (Image Credit - Essential SQL: What is the difference between a Join and a Union)

     

    As with a join, each corresponding column being unioned needs to be of the same data type.

     

    Additional Information: What is the difference between a Join and a Union

     

    Pivots

     

    In general, used when there is a need to transform multiple columns within a table into rows, or vice versa. Tableau Prep supports pivoting columns to rows via a Pivot Step, however this functionality also exists in Tableau Desktop on the Data Source tab.

     

    Example: Data contains sales values by year:

     

    pre-pivot.jpg

     

    Pivoting the columns for 2014, 2015, and 2016 into Rows will convert those three column headers into row-level values within the first new column; the original row-level values for the three columns will be placed into a second new column.  After performing the operation and renaming the new columns, we have the following result:

     

    post pivot.jpg

     

    Additional Information: Pivoting Data in SQL

     

    Aggregations

     

    Used when there is a need to adjust a table to a less-granular level of detail. This operation might need to be performed before doing a join or a union with another table.

     

    Example: This data set has the field "Row ID" as its highest level of granularity, corresponding to a single row ID for each individual Product within an individual Order ID:

     

    aggregation 1.jpg

     

    One way to aggregate this data might be to bring it to the less-granular level of Order ID / Order Date:

     

    aggregation 2.jpg

     

    Notice that this required removing the Row ID and Product Name fields, because they contribute to the original level of granularity.