In this post, we will show how to build a decision tree with Tableau. The goal, as usual, is to do it with a minimum of data preparation. For this example, we will use the Superstore dataset provided with the Tableau installation. We would like to build the following tree:
- Level 0: Starting point
- Level 1: Order Priority
- Level 2: Ship Mode
- Level 3: Product Container
Important: to follow this post, you may want to read the following post first : Sankey diagram made of dynamically generated polygons
It will look like this :
The attached workbook offers a lot of interactivity :
- The dimensions used in Level 1, 2 and 3 can be chosen
- The data can be filtered by Year
- The flow size can be defined by choosing an indicator
- The color of the flow can be assigned to another indicator
- The tooltip shows the detailed information
In this post, we are not going to detail the whole step-by-step procedure but rather explain the calculation logic used to build the viz.
Data Preparation :
Let’s first build the dataset. For this purpose, we will use the Superstore dataset, which we will blend with the polygonic dataset (we will make it polygonic !). The resulting file is attached. We removed all the data that is not required and added the « Link » column to the Superstore data table.
Understanding the logic :
Let's first build it in PowerPoint to better understand the logic :
Basically, what we did here is build a visualization of a hierarchy. Let's decompose this flow chart into different parts and add axes :
If we want to build it in Tableau, we have to find out how to position the dimensions at the different levels in the viz, i.e. their (x, y) coordinates. Let's decompose it into 3 steps : A, B and C.
Levels will define the x positions of the hierarchy details, while Y is defined by Position 0 to Position 3, where the Position is defined by a ranking of the Level details in order to show the appropriate flows.
In our method, X will be defined by our polygonic model and will build the curves for each part A, B and C.
Basically, Level 3 should be a ranking of each individual product container for a given ship mode and a given order priority. We could represent it this way :
The issue is that Tableau can easily represent it in a table or a Tree Map but we have to cheat a bit in order to show it in a Graph that looks like a decision tree.
Problem : We have to define how to sort the dimensions at the different levels in order to draw our curves.
Solution : INDEX and Advanced Table Calculations !
Index will give us the ranking, and advanced table calculations, the level at which curves should be grouped.
To better understand, let's build 4 indicators :
Index 0 = INDEX() computed along Level 0, Level 1, Level 2, Level 3 at the level « Level 0 ».
Level 0 will be the starting point of the whole data set. We will just create a calculated field "Level 0" that equals 'Start'. This enables us to build a viz starting from a single point, whatever the other levels are.
To do so, create a new calculated field :
- Name it Index 0
- Function : INDEX()
- Click on « Default Table Calculation » on the upper right-hand side of the calculated field window. In « Compute using », choose « Advanced… » and build the "addressing" this way: Level 0 > Level 1 > Level 2 > Level 3 (be sure to keep this order !)
Click « OK » and choose Level 0 in the « At the level » drop down menu. Click OK again.
Repeat this operation for Index 1, 2 and 3. The only difference is that we will define the calculation respectively at levels 1, 2 and 3.
When building a table with this information, you should find the following :
We have defined the right ranking of our elements at each level of the hierarchy.
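To make this ranking more concrete, here is a minimal Python sketch (outside Tableau) of what the four indices appear to compute. It assumes that an index « at the level » N ranks the distinct combinations of the first N levels, in the sort order of the view. The rows and the function name `index_at_level` are hypothetical and only serve the illustration; this is not how Tableau evaluates table calculations internally.

```python
# Hypothetical rows: (Level 1, Level 2, Level 3), already sorted as in the view.
# Level 0 is the constant 'Start', so it is left implicit here.
rows = [
    ("Critical", "Delivery Truck", "Jumbo Box"),
    ("Critical", "Delivery Truck", "Jumbo Drum"),
    ("Critical", "Express Air", "Small Box"),
    ("High", "Delivery Truck", "Jumbo Box"),
    ("High", "Regular Air", "Small Box"),
]


def index_at_level(rows, level):
    """Rank each row by the distinct combination of its first `level` fields.

    Assumption: this mimics INDEX() addressed on Level 0..3 "at the level" N.
    For level 0 the key is empty, so every row gets index 1.
    """
    seen = {}
    result = []
    for row in rows:
        key = row[:level]
        if key not in seen:
            seen[key] = len(seen) + 1  # next rank for a new combination
        result.append(seen[key])
    return result


index_0 = index_at_level(rows, 0)
index_1 = index_at_level(rows, 1)
index_2 = index_at_level(rows, 2)
index_3 = index_at_level(rows, 3)

for row, i0, i1, i2, i3 in zip(rows, index_0, index_1, index_2, index_3):
    print(row, i0, i1, i2, i3)
```

Rows that share the same path down to Level N get the same Index N, which is what lets the curves be grouped at that level.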
Now we would like to build a distribution of these points in order to define the Y values in our viz. It should be calibrated so that every level fits the same range of data, so that the curves split instead of creating a waterfall. The easiest way is to spread the positions over a range from 0 to 1.
Let's then define our positions :
Position N : INDEX() / (SIZE() + 1)
Computed along Level 0, Level 1, Level 2, Level 3 at the Level N
- INDEX will give us the ranking;
- SIZE will give the total number of unique Index values in the partition;
- +1 in order to have a good distribution of our points in a 0 to 1 range.
In the example above :
- Hierarchy : Critical / Delivery Truck / Jumbo Drum
- Index at Level 3 is 2.
- Position 3 will be 2/(12+1) = 2/13
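The arithmetic can be checked with a short Python sketch. It reads the position as INDEX() / (SIZE() + 1), which is what the bullets and the 2/13 example suggest; the function name `position` is hypothetical, and the three-point case is only an illustration of why the +1 matters.

```python
from fractions import Fraction


def position(index, size):
    """Place rank `index` (1-based) among `size` items, strictly inside 0..1."""
    return Fraction(index, size + 1)


# The example from the text: Index 3 is 2 and the partition holds 12 items.
example = position(2, 12)

# With 3 items, the +1 keeps the points away from the edges of the axis.
spread = [position(i, 3) for i in range(1, 4)]

print(example)                   # 2/13
print([str(p) for p in spread])  # ['1/4', '1/2', '3/4']
```

Without the +1, the last item would always sit at exactly 1, and a single item would land on the edge of the axis instead of in the middle.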
We can then build our Curves A, B and C and the final Viz (either in a dashboard or in a single workbook).
Note : In Tableau, using table calculations sometimes requires that you update the pills in the viz. If you don’t get the right viz, simply drag and replace the indicators in the viz to update the view.
Have it polygonic !
To build a decision tree using polygons, we require a bit more work and a different method than the polygonic Sankey to define the levels :
- Size : SUM([Choose Indicator]) / TOTAL(SUM([Choose Indicator]))
Computed using Level 0, Level 1, Level 2 and Level 3
- Position N Max : [Position N] + RUNNING_SUM([Size])
- Position N Min : [Position N Max] - ([Size])
This will lay out the data on a 0 to 2 axis, since [Position N] and RUNNING_SUM([Size]) each range from 0 to 1.
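As a rough illustration of these three formulas, here is a Python sketch that computes the band of each flow. The indicator values and positions are hypothetical, and the sketch assumes that TOTAL and RUNNING_SUM are evaluated over the same set of flows, in the same order as the positions.

```python
from itertools import accumulate

# Hypothetical SUM([Choose Indicator]) for three flows, in view order.
values = [50.0, 30.0, 20.0]
# Hypothetical Position N for the same flows (index / (size + 1) with size 3).
positions = [0.25, 0.5, 0.75]

total = sum(values)                    # TOTAL(SUM([Choose Indicator]))
sizes = [v / total for v in values]    # Size: share of the total flow
running = list(accumulate(sizes))      # RUNNING_SUM([Size])

# Position N Max and Position N Min: upper and lower edge of each polygon.
pos_max = [p + r for p, r in zip(positions, running)]
pos_min = [mx - s for mx, s in zip(pos_max, sizes)]

for lo, hi in zip(pos_min, pos_max):
    print(f"band from {lo:.2f} to {hi:.2f}")
```

Each band has a thickness equal to its share of the total, and because both the position and the running sum grow from one flow to the next, the bands do not overlap and stay within the 0 to 2 range.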
The remainder is exactly the same as for the Sankey Diagram.
Have fun !
Now that you know how to use addressing to customize your viz, just play with it to build new flow diagrams !
For any questions, please contact us : Olivier CATHERIN
This message was edited by : Olivier CATHERIN. I just made a small correction to the formulas used : replacing WINDOW_MAX with SIZE simplifies the calculation and avoids issues when filtering the view.
This message was edited by : Olivier CATHERIN. Hello, I just added a new version using LOD in the flow size calculation to show results as a percentage of the total flow, whatever the selection is. Cheers