title.png

Why to use the Aggregation option?

 

If you decide to extract the data instead of use live connection, there is an important option, called Aggregation, which can save disk space, reduce extract time and improve your visualization performance.

Edit.png

When you use Aggregation option, you are telling Tableau to perform an aggregated extract, which means that all the measures will be summarized (avg, min, max, depending of what you choose as default aggregation) by the visibles dimensions.

 

In a simple way, if you do not choose to use aggregation, this will be the query used to extract the data:

SELECT

DIMENSION_1,

DIMENSION_2,

...DIMENSION_N,

MEASURE_1,

MEASURE_2,

...MEASURE_N

FROM TABLE

 

 

But, if you choose to use the aggregation, this will be the query:

SELECT

DIMENSION_1,

DIMENSION_2,

...DIMENSION_N,

SUM|MAX|MIN|AVG(MEASURE_1),

SUM|MAX|MIN|AVG(MEASURE_2),

...SUM|MAX|MIN|AVG(MEASURE_N)

FROM TABLE

GROUP BY DIMENSION1, DIMENSION_2, ...DIMENSION_N

 

 

Example

We will extract the below table (sales table), which has 31 records and 2 fields: Employee and Sales.

Each employee may have more than one record.

Sales Table.png

Our extract will need both fields, and we will perform the extract with and without the aggregation option.

Extract 1.png


#1 - Not using the aggregation option.

Extract without.png

The process extracted 31 rows (all the table rows).

Query used:

{....

"cols":3,"query":"SELECT 1 AS \"number of records\",\n  \"sales\".\"employee\" AS \"employee\",\n  \"sales\".\"sales\" AS \"sales\"\nFROM \"financial\".\"sales\" \"sales\"","rows":31,"elapsed":0.339}

}

 

#2 - Using the aggregation option

Extract with.png

The process extracted only 16 rows.

Query used:

{...

"cols":3,"query":"SELECT \"sales\".\"employee\" AS \"employee\",\n  SUM(CAST(1 AS BIGINT)) AS \"number of records\",\n  SUM(\"sales\".\"sales\") AS \"sales\"\nFROM \"financial\".\"sales\" \"sales\"\nGROUP BY 1","rows":16,"elapsed":0.149}

}

 

In that example we used a small table with only 31 records.

Probably, in the real world, your tables may have millions or even billions of rows. If that is your case, you should consider to use that option.

 

But be aware: when you aggregate your measures, you will not be able to create calculations in the most granular level of data anymore.

In other words, your row level calculation will be different.


For example, if you have an extract without aggregation you can perform both calculations:

[Sales]/[Profit] and SUM([Sales])/SUM([Profit]) - which will give you different results.

 

When you use an aggregated extract, you will no longer have the granular data, so didactically, you will no longer be able to perfom [Sales]/[Profit], just SUM([Sales])/SUM([Profit]).

 

For more information visit KB Aggregated Extracts.


Also, visit my personal blog Data Visualization & Preparation tips.