Skip navigation
2015

Innovation Viz

March 2015 Previous month Next month

topo.png

By default Tableau gives you some options to decide when to show the mark label: All, Min/Max, Selected, Highlighted or Line Ends (for the line chart).

 

What if you want to have different options, such as "Max & Line Start" or "Min & Line Ends"?

 

To achieve that you need to create a calculated field to be used in the mark label:

 

IF

     <CONDIDITON>

THEN

     <VALUE TO SHOW>

END

 

The <CONDITION> will depends of what you wanna show:

 

//SHOW MAX VALUE

(WINDOW_MAX(SUM([Sales])) = SUM([Sales]))

 

//SHOW MIN VALUE

(WINDOW_MIN(SUM([Sales])) = SUM([Sales]))

 

//SHOW LINE STARTS

FIRST() = 0

 

//SHOW LINE ENDS

LAST() = 0

 

You can also mix those conditions.

For instance, you wanna show the Max & Line Starts as we said before:

IF

     //SHOW MAX VALUE

     (WINDOW_MAX(SUM([Sales])) = SUM([Sales]))

 

     //SHOW LINE STARTS

     OR FIRST() = 0

 

THEN

     SUM([Sales])

END

2.png

 

You can also use that logic to color your chart. Just need to change the calculation to boolean.

For instance, you wanna color with orange the Min & Line Ends:


     //SHOW MAX VALUE

     (WINDOW_MIN(SUM([Sales])) = SUM([Sales]))

 

     //SHOW LINE ENDS

     OR LAST() = 0

3.png

 

You may use parameters to add interactivity (click on the image to be directed to Tableau Public):

http://public.tableausoftware.com/static/images/En/EnhanceYourChart/EnhanceYourChart/1.png


Visit my personal blog Data Visualization & Preparation tips.

title.png

Why to use the Aggregation option?

 

If you decide to extract the data instead of use live connection, there is an important option, called Aggregation, which can save disk space, reduce extract time and improve your visualization performance.

Edit.png

When you use Aggregation option, you are telling Tableau to perform an aggregated extract, which means that all the measures will be summarized (avg, min, max, depending of what you choose as default aggregation) by the visibles dimensions.

 

In a simple way, if you do not choose to use aggregation, this will be the query used to extract the data:

SELECT

DIMENSION_1,

DIMENSION_2,

...DIMENSION_N,

MEASURE_1,

MEASURE_2,

...MEASURE_N

FROM TABLE

 

 

But, if you choose to use the aggregation, this will be the query:

SELECT

DIMENSION_1,

DIMENSION_2,

...DIMENSION_N,

SUM|MAX|MIN|AVG(MEASURE_1),

SUM|MAX|MIN|AVG(MEASURE_2),

...SUM|MAX|MIN|AVG(MEASURE_N)

FROM TABLE

GROUP BY DIMENSION1, DIMENSION_2, ...DIMENSION_N

 

 

Example

We will extract the below table (sales table), which has 31 records and 2 fields: Employee and Sales.

Each employee may have more than one record.

Sales Table.png

Our extract will need both fields, and we will perform the extract with and without the aggregation option.

Extract 1.png


#1 - Not using the aggregation option.

Extract without.png

The process extracted 31 rows (all the table rows).

Query used:

{....

"cols":3,"query":"SELECT 1 AS \"number of records\",\n  \"sales\".\"employee\" AS \"employee\",\n  \"sales\".\"sales\" AS \"sales\"\nFROM \"financial\".\"sales\" \"sales\"","rows":31,"elapsed":0.339}

}

 

#2 - Using the aggregation option

Extract with.png

The process extracted only 16 rows.

Query used:

{...

"cols":3,"query":"SELECT \"sales\".\"employee\" AS \"employee\",\n  SUM(CAST(1 AS BIGINT)) AS \"number of records\",\n  SUM(\"sales\".\"sales\") AS \"sales\"\nFROM \"financial\".\"sales\" \"sales\"\nGROUP BY 1","rows":16,"elapsed":0.149}

}

 

In that example we used a small table with only 31 records.

Probably, in the real world, your tables may have millions or even billions of rows. If that is your case, you should consider to use that option.

 

But be aware: when you aggregate your measures, you will not be able to create calculations in the most granular level of data anymore.

In other words, your row level calculation will be different.


For example, if you have an extract without aggregation you can perform both calculations:

[Sales]/[Profit] and SUM([Sales])/SUM([Profit]) - which will give you different results.

 

When you use an aggregated extract, you will no longer have the granular data, so didactically, you will no longer be able to perfom [Sales]/[Profit], just SUM([Sales])/SUM([Profit]).

 

For more information visit KB Aggregated Extracts.


Also, visit my personal blog Data Visualization & Preparation tips.

query workflow new.jpg

In the post LOD Expressions #1 - Over (partition by)  x LOD we talked a little bit about LOD Expression and how it compares to over (partition by...).

As we said in that post, LOD expressions allows you to achieve similar results than over (partition by...).


Now we are going to understand the reason why we said "similar". In other words, we will understand the difference between them.


We will use a pretty simple example to make easier to understand.

Many of you might say this simple example can be solved with table calculation and/or data blending (as we did in the post Compare a filtered dimension to all values). And you are right, you can re-build this example with those approaches, but the intention here it to explain what happens in the back-end when you use LOD, so you can can apply on more complex cases.


So, lets start.
In our database, we have a Sales table (financial.sales_region):

Tables.png
We want to compare the sum of sales per employee with the sum of sales of the region.

This is our desired output:

Desired output.png

If your approach is to write a query using olap function, you can use this simple query:

select Category, Sub-Category, Sales, sum(sales) over (partition by Category)

from (

    select Category, Sub-Category, sum(sales) as Sales

    from table

    group by Category, Sub-Category

) A


If you are using Tableau 9.0, you can use this calculated field "By Region":

{FIXED [Region]: SUM([Sales])}


By doing that, this is the query that Tableau will create to achieve the desired output:


SELECT "t0"."employee" AS "employee",

  "t3"."__measure__1" AS "sum:calculation_86303112206550",

  "t0"."sum:sales:ok" AS "sum:sales:ok"

FROM (

  SELECT "sales_region"."employee" AS "employee",

    SUM("sales_region"."sales") AS "sum:sales:ok"

  FROM "financial"."sales_region" "sales_region"

  GROUP BY 1

) "t0"

  INNER JOIN (

  SELECT SUM("t2"."__measure__0") AS "__measure__1",

    "t1"."employee" AS "employee"

  FROM (

    SELECT "sales_region"."region" AS "region",

      "sales_region"."employee" AS "employee"

    FROM "financial"."sales_region" "sales_region"

    GROUP BY 1,

      2

  ) "t1"

    INNER JOIN (

    SELECT SUM("sales_region"."sales") AS "__measure__0",

      "sales_region"."region" AS "region"

    FROM "financial"."sales_region" "sales_region"

    GROUP BY 2

  ) "t2" ON (("t1"."region" = "t2"."region") OR (("t1"."region" IS NULL) AND ("t2"."region" IS NULL)))

  GROUP BY 2

) "t3" ON (("t0"."employee" = "t3"."employee") OR (("t0"."employee" IS NULL) AND ("t3"."employee" IS NULL)))

query workflow.png

As you can see, they have different technical solution.


With Tableau's approach you can interact direct with those sub-queries, which gives you some advantages.
For example, you want to exclude the Employee A from your analysis.

To do that, you will use a quick filter.

filter.png


When you add a filter, you can whether or not use it as a context filter.

Depending on which option you chose, you will interact with specifics sub-queries.


Not using as context filter

If you did not add the filter to context, Tableau will exclude Employee A from your output, but will keep it on the calculation of the Region.

Which means that Employee A will still be part of the "Sum os Sales by Region".

 

This is the query that Tableau will perform:

SELECT "t0"."employee" AS "employee",

  "t3"."__measure__1" AS "sum:calculation_86303112206550",

  "t0"."sum:sales:ok" AS "sum:sales:ok"

FROM (

  SELECT "sales_region"."employee" AS "employee",

    SUM("sales_region"."sales") AS "sum:sales:ok"

  FROM "financial"."sales_region" "sales_region"

  WHERE (NOT ("sales_region"."employee" = 'A'))

  GROUP BY 1

) "t0"

  INNER JOIN (

  SELECT SUM("t2"."__measure__0") AS "__measure__1",

    "t1"."employee" AS "employee"

  FROM (

    SELECT "sales_region"."region" AS "region",

      "sales_region"."employee" AS "employee"

    FROM "financial"."sales_region" "sales_region"

    WHERE (NOT ("sales_region"."employee" = 'A'))

    GROUP BY 1,

      2

  ) "t1"

    INNER JOIN (

    SELECT SUM("sales_region"."sales") AS "__measure__0",

      "sales_region"."region" AS "region"

    FROM "financial"."sales_region" "sales_region"

    GROUP BY 2

  ) "t2" ON (("t1"."region" = "t2"."region") OR (("t1"."region" IS NULL) AND ("t2"."region" IS NULL)))

  GROUP BY 2


) "t3" ON (("t0"."employee" = "t3"."employee") OR (("t0"."employee" IS NULL) AND ("t3"."employee" IS NULL)))

query workflow with filter.png
As you can see, the filter was not included in sub-query "t2", because that is the sub-query which generate the sales by region.



Using context filter

If you add the filter to context, Tableau will include the filter "Employee <> 'A'" to your whole analysis, by doing that, the sales by region will no longer include the sales of Employee A.

 

This is the query generated:

SELECT "t0"."employee" AS "employee",

  "t3"."__measure__1" AS "sum:calculation_86303112206550",

  "t0"."sum:sales:ok" AS "sum:sales:ok"

FROM (

  SELECT "sales_region"."employee" AS "employee",

    SUM("sales_region"."sales") AS "sum:sales:ok"

  FROM "financial"."sales_region" "sales_region"

  WHERE (NOT ("sales_region"."employee" = 'A'))

  GROUP BY 1

) "t0"

  INNER JOIN (

  SELECT SUM("t2"."__measure__0") AS "__measure__1",

    "t1"."employee" AS "employee"

  FROM (

    SELECT "sales_region"."employee" AS "employee",

      "sales_region"."region" AS "region"

    FROM "financial"."sales_region" "sales_region"

    WHERE (NOT ("sales_region"."employee" = 'A'))

    GROUP BY 2,

      1

  ) "t1"

    INNER JOIN (

    SELECT "sales_region"."region" AS "region",

      SUM("sales_region"."sales") AS "__measure__0"

    FROM "financial"."sales_region" "sales_region"

    WHERE (NOT ("sales_region"."employee" = 'A'))

    GROUP BY 1

  ) "t2" ON (("t1"."region" = "t2"."region") OR (("t1"."region" IS NULL) AND ("t2"."region" IS NULL)))

  GROUP BY 2


) "t3" ON (("t0"."employee" = "t3"."employee") OR (("t0"."employee" IS NULL) AND ("t3"."employee" IS NULL)))

query workflow with context filter.png

Now, all the sub-queries have the filter "Employee <> 'A'".


Visit my personal blog Data Visualization & Preparation tips.

Americo Guazzelli

Data Blending #1

Posted by Americo Guazzelli Mar 9, 2015

relationship new.png

What happens when you blend a local file with your database?


When you blend a local file with a table in your database you must to consider that:

+  You will need to have both data at the same granular level of the relationship (at least).

+  Both data must to be in the same environment to perform the data blending.

 

It means that even if your viz shows a consolidated information, you may be blending detailed data.

 

For example, you have a sales table (database) and a region information (excel file).

Sales (database)
sales.jpg

Region (excel file)

region.jpg

 

You want to summarize the sales per region, so you have to blend both datasources by employee.


Relationship

Your desired output is:

So, you drag &  drop [Region] (from your local file) to Rows shelf, and [Sales] (from your database table) to Columns shelf.

At this point, you are asking Tableau to retrieve two fields from your database: employee and sales.

So, your viz shows sales per region, but you had to retrieve to your computer (or Tableau Server) the data from your database at the employee level, because (as we said before):

+ You need that field (employee) to blend with your local file, even though you do not explicitly use that field in your viz;

+ Both data must to be in the same environment to perform the blend.

 

Now, Imagine if you have a table with millions of rows and you want to blend with your local file to see the sales by region. You will have to retrieve millions of rows from your database, move them to your computer and blend them with your local file.

If that is your case, be careful.

 

If you are a SAS user, that will be the same behavior when you join a dataset (.sas7bdat) with a table in a database.

 

Visit my personal blog Data Visualization & Preparation tips.

Analysing a Employee 2 New.jpg
How to use Tableau with your hierarchical data


Let's imagine you own a company.

In your company, you have many sales person, and each sales person may (or may not) be the boss of other sales person.

A boss can have other boss, and so on.

Anyone of your company can sell: a sales person, the boss or even the boss's boss.


You want to analyze the sales per sales person.

Not only the individual sales, but include the sales of the hierarchy.

E.g: B is the boss of F, and F is the boss of P.

When you see the B sales, you want to have summarized the sales of B, F and P

When you see F sales, you want to have summarized the sales of F and P

When you see P sales, you want to have only the sales of P, because P is not a boss.


This is your desired output:
Desired output.jpg

In your database, you have 2 tables:

Sales

Which has the sales per employee


Boss

Which indicate who is boss of who.


Sample records:

Sales table

Employee IDSales
J59
F122
J55
F140
J72
J24
B150
B160
I67
K138
K88
K69
P95


Boss table

 

EmployeeBoss
FB
II
JI
KJ
PF



How do you achieve your desired output with those tables?

You will need to use recursion in your database.

What does that means? Well, that means you need to create a recursive view in your database, which will have a loop inside.

And then you can use Tableau (connected to your recursive view) to helps you to create wonderful visualizations.


You can use this approach in many databases (Teradata, IBM DB2, etc...).

In this example, we will use Teradata sintax.


So, lets start.

 

Step 1 - create the recursive view

 

CREATE RECURSIVE VIEW boss_view (employee, seller) AS

(

     SELECT
          main.employee, main.employee as Seller

     FROM

         boss AS main

UNION ALL

     SELECT

          indirect.employee, direct.seller

     FROM
          boss_view  AS direct,  boss AS indirect
     WHERE

          direct.employee = indirect.boss

     AND

          indirect.employee <> indirect.boss

);

But, attention: to use the recursive view, you must to have a breakpoint, otherwise your view will perform an infinity loop. In this case, the loop stops when boss.boss = boss.employee.

 

Step 2 - Mapping the tables into Tableau

Connect to your database and join your view (boss_view) and your sales table, using employee.

join.png

 

Now you can use Tableau to your analysis, such as:


Who is my best seller?

Desired output.jpg

Comparing F to others Employee by Seller Boss (A and B):

Analysing a Employee.jpg

Employee "F" appears in Seller A and B, because F belongs to B, and B belongs to A.

Also, the Seller includes it self. You can note that for the Seller B, it also shows B as employee.




Visit the follow links for more details about recursive views:
http://www.ibm.com/developerworks/ibmi/library/i-db2connectby/

http://www.info.teradata.com/HTMLPubs/DB_TTU_14_00/index.html#page/SQL_Reference/B035_1184_111A/Create_Procedure-Details.011.096.html


Also, visit my personal blog Data Visualization & Preparation tips.

Sem título new.png

How to use the Grand Total to calculate the difference between 2 periods

 

As most of you know, Row Grand Totals allows us to see the Average, Sum, Minimum and Maximum by row.

 

But, what happens when we want to see the difference between two periods?

For example, you want to compare the Sales of 2013 x 2012 (or any other year).

Your desire output is a table (or a bar chart) with 3 columns: 2012, 2013 and the difference:

output.png

It seemed so easy to do... so I started to try to build the red column with Tableau.

After a couple hours, I did not find any solution to achieve that output.

Even the famous table calculations (like "Difference") did not fit my requeriments.

 

So, I went to the Community Forums to looking for a solution.

What a found (I hope I did not miss any simpler solution) were many ways to achieve those values, but not the layout that I wanted.

By the way, I suggest you to see this thread (which received many posts with different solutions) How to hide blank columns created by table calculations.

 

Should I give up and use the workarounds or maybe change the layout to fit the solutions I found?

No! There is always a way (or almost always).

 

I decided to put away all the crazy solutions that I was trying to develop and spend sometime thinking what I was missing...

After a while....That's it! .. much simpler than I imagined.

 

The solution that I figured out is relative simple:

you will need two parameters (one for the first year and other for the second year)

Two calculated fields: one to filter the years and other to calculate the Sales

+  Mark "Show Row Grand Totals" (Analysis -> Totals)


Calculated filed # 1 - Filter (Use this field to filter the "true" values)

(year([Order Date]) = [Year 1] or year([Order Date]) = [Year 2]) and [Year 1] <> [Year 2]

 

Calculated Filed # 2 - How to calculate the Sales

if last()=first() then

//GRAND TOTAL

sum( if year([Order Date]) =  [Year 2] then ([Sales]) else ([Sales])*-1 end)
else sum([Sales])

end

 

http://public.tableausoftware.com/static/images/Gr/GrandTotalDifference/Sheet2/1.png

 

Visit my personal blog Data Visualization & Preparation tips.

Sheet 16 new - Cópia.jpg

OLAP function

 

If you are familiar with SQL and had to solve complex analysis with a single query, you've probably used the OLAP function "OVER (PARTITION BY)".

Basically, this function allows you to perform an aggregation in a different level of the view.

 

For example, you have this table:

Table.jpg

 

And you desire this output (where "New LOD" is the sum of Sales by Category):

Output.jpg

 

You might use role playing table, sub-queries or any other way that allows you to create the desired output.

 

But, if you want something easier and simpler to understand?

In that case you may be looking for OLAP functions.

By adding "over (partition by <field 1>, <field 2>,...<field N>)" you can choose the fields used to aggregate.

 

See the query below:

select Category, Sub-Category, Sales, sum(sales) over (partition by Category)
from (
    select Category, Sub-Category, sum(sales) as Sales
    from table
    group by Category, Sub-Category
) A

 

In this query you are telling the database that for the second aggregation "sum(Sales) over (partition by Category)", you want to sum the Sales by Category.

In other words, you are saying that, for that particullary aggregation, you want the database to peform this:

select sum(Sales), Category
from table
group by Category

 

Nice, eh?

 

 

What about Tableau?

Before Tableau 9, to achieve that result you would have to use joins, data blending, sets or maybe table calculations (Compare a filtered dimension to all values), but all these require much more work and they have their limitation.

 

In Tableau 9, the new capability Level of Detail Expression (LOD Expressions), allows you to perform calculations which can gives you similar results than "over (partition by...)" - and much more!

 

So, to achieve our desired output in Tableau 9 you can create a simple calculated field:

Option 1: { EXCLUDE [Sub-Category]: SUM([Sales]) }

Option 2: { FIXED [Category]: SUM([Sales]) }

 

You can also use quick filter in or not in context (see attached Tableau 9 workbook).

 

The intention of this post is not to explain how the calculations works.

 

For more details about LOD Expressions visit

Tableau - Introduction to Level of Detail Expressions

VizPainter - My Favorite Tableau 9.0 Feature

The Last Data Blender - Understanding Level of Detail Expressions – Part 1


Also, visit my personal blog Data Visualization & Preparation tips.

teste2 new.jpg

How to compare a Region sales to the overall sales?

 

Situation:  You want to have a line chart which you can choose one Region and compare to the sum of all regions, across year and customer segment.

How do you achieve the desired result?

 

I guess there are a lot of ways to do that, but today we are going to use the duplicate datasource approach.

In the follow example, we are going to use Superstore data set.

So, lets start.

 

Step 1 - Duplicate your data source

1 Duplicate.png

 

Step 2 - Rename the duplicated datasource (optional)

2 Rename.png

 

Step 3 - Edit relationships

In our case, we want to see a line per customer segment across the year, filtering a region (on the primary datasource) and comparing to overall regions.

So, we will blend by Order Date and Customer Segment.

4 Edit Relatioships.png

 

Step 4 - Create your viz

We are almost done. Now we will create our viz.

From SuperStore datasource:

4.a Drag&Drop "Customer Segment" to Colors

4.b Drag&Drop Order Date to Columns

4.c Drag&Drop Sales to Rows

 

You will have something like this:

5 Viz.png

 

Step 6 - Include the overall Sales

Now we will include the overall Sales per Customer Segment per Year.

From the Overall datasource:

6.a Make sure the links are in use

6.b Drag&Drop Sales to Rows

6 Finish.png

 

Step 7 - Finish

7.a Change the secondary Sales to Dual Axis and then Synchronize the Axis.

7 Duas Axis.png

7.b Change the size of the secondary Sales

7 Size.png

7.c Add transparency to the secondary Sales

7 Color.png

7.d remove "Measuare Names" from Color.

7 Remove Measure Names.png

 

Now you can filter any field from the primary datasource and compare to the overall.

http://public.tableausoftware.com/static/images/Di/DimensionXOverall/Compare/1.png

 

If you are using Tableau 9, there is a much simpler way to achieve that result, by using LOD. Visit OVER (PARTITION BY)

 

Also, visit my personal blog Data Visualization & Preparation tips.

Americo Guazzelli

DATEDIFF Semester

Posted by Americo Guazzelli Mar 3, 2015

Sheet 17 new (1).jpg

How to calculate the semester difference between 2 dates

In other words, how to simulate the date_part 'semester' in the DATEDIFF() function.

 

There are many ways to reach that.

In this blog, I'll explain 2 of them.

 

# 1 - Using DATEDIFF

 

All you need to to is :

     +  use the DATEDIFF function to get the quarter difference between the two dates

     + Divide the result by 2

     + get the int part of the division.

 

Exemple:

INT(DATEDIFF('quarter',start date,end date)/2)

 

# 2 - Using date calculation (for the SQL maniacs...)

You will need the year and month of both dates:

 

 

(YEAR(end date) - year(start date) ) *2

+

(

     (IF MONTH(end date) >= 7 THEN 2 ELSE 1 END)

     -

     (IF MONTH(start date) >= 7 THEN 2 ELSE 1 END)

)


Visit my personal blog Data Visualization & Preparation tips.

Sheet 17 new.jpg

A good way to color label values per line

By default, you can choose one color for all labels:

http://public.tableausoftware.com/static/images/Co/Colorlabelvaluesperline/Default/1.png


But if you wish to color the labels values with the same color than the line, you have to do some extra steps:

1. Create a line chart, and set the color line.

2. Duplicate the row measure.

3. For the second axis, change the mark to 'text'.

4. Choose 'Dual Axis'

 

http://public.tableausoftware.com/static/images/Co/Colorlabelvaluesperline/Colorlabelperline/1.png

 

Visit my personal blog Data Visualization & Preparation tips.