Migration to Douglas County, Nebraska

Tableau Public workbook of migration data available here!

 

In the last few weeks, I’ve been making a lot of origin-destination maps in Tableau. These maps may also be referred to as flow maps, path maps, spider maps, or just drawing lines between points on a map. No matter what you call this type of map, it is a great way to see spatial patterns due to change in location or to explore connectivity between places: for instance, mapping storm tracks, migration between counties, or the connection between your office location and the locations of all of your business contacts.

 

In my attempts to make these maps, however, I have experienced many instances where they didn’t behave like I expected.  Since I was struggling a bit, I started experimenting to learn more about how they work and to figure out the secrets to making (and breaking) them.  If you’re reading this, I’m guessing you’ve also had some problems with these maps and want to know all the secret tricks. Hopefully I’ve covered what you need to know for your map, but if not, drop a comment at the end of this post and I’ll see if I can be helpful in finding a solution.  A workbook with all of the examples is up on Tableau Public.

 

I’ll start this post with the take-home message about good practice for making any origin-destination map in Tableau, and will then systematically break my map in many different ways to explain why these maps work the way that they do.

 

Making an origin-destination map the right way

The short story is that for any origin-destination map (whether using latitude and longitude from your data source or generated latitude and longitude from Tableau geocoding polygons), you need four things:

  • The data points – if you’re using Tableau geocoding, make sure that all points are fully disambiguated before trying to make your map (e.g., if you are using cities, make sure you also have a state field that disambiguates the locations)
  • Path IDs – any unique ID for each path so that Tableau knows which points to connect on the same line
  • Line mark type – this tells Tableau that the points on the map should be connected with lines
  • Point order – so that Tableau knows how to connect the points (put this on ‘Path’)

 

If you have these four things (which may be any number of different pills on the Marks card), you should end up with a working origin-destination map.  I’ll show a few examples and document what is special about the placement, ordering, and type of data for each of these.
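Before building the map, it can help to sanity-check the data itself. Here is a minimal Python sketch of that idea. It runs outside Tableau, and the rows, column layout, and the check_paths helper are all made up for illustration (they are not the dataset from this post). It only checks the two data-side requirements: every path has at least two points, and the point order is unique within each path.

```python
from collections import defaultdict

# Hypothetical rows: (path_id, point_order, state, county).
# These are illustrative values, not the dataset used in this post.
ROWS = [
    (1, 1, "CA", "Los Angeles"),
    (1, 2, "CA", "San Bernardino"),
    (2, 1, "CA", "Los Angeles"),
    (2, 2, "NV", "Clark"),
    (3, 1, "AL", "Jefferson"),  # a path with only one point
]


def check_paths(rows):
    """Return a list of human-readable problems found in the path table."""
    orders = defaultdict(list)
    for path_id, point_order, _state, _county in rows:
        orders[path_id].append(point_order)

    problems = []
    for path_id, seen in sorted(orders.items()):
        if len(seen) < 2:
            problems.append(f"path {path_id}: fewer than two points")
        if len(set(seen)) != len(seen):
            problems.append(f"path {path_id}: duplicate point order")
    return problems


print(check_paths(ROWS))
```

With these made-up rows, the only problem reported is path 3, which has a single point and so could never become a line.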

 

Origin-destination map with generated latitude and longitude (from Tableau geocoding)

I find origin-destination mapping with generated latitude and longitude to be more challenging (and I answer more questions about it on the Tableau forums), so I’m going to use all geocoded data for my examples.  With this type of data, there is an additional challenge to creating the maps – you have to disambiguate the geographic locations while not splitting up the pathways that you want to draw.

 

I’m going to use a simple dataset to make and break a bunch of maps.  The pattern has no real meaning; the data values were just selected to cover common problem cases with this type of map (based on questions that I’ve seen and/or answered in the Tableau Forums).  Three of the problems that I’ve found in origin-destination mapping using Tableau geocoding are:

  • Connecting geographies in the same region (e.g., counties in same state)
  • Connecting geographies in different regions (e.g., counties in different states)
  • Connecting geographies that may be ambiguous (e.g., same name in different states)

 

Here is a map that works as expected – it may look like a strange pattern, but it’s a special dataset that I created specifically to cause problems (we’ll explore those problems in just a bit when we start breaking this map in many ways):

 

 

Let’s take a look at the parts of this map to learn about how and why it works.  The magic is all on the Marks card:

 

 

  • The data points – these are defined by the State and County pills.  Together these two fields disambiguate every location in our map and tell Tableau where to draw the points on our paths. I'll explain why they are both Min() in the next little bit of the post - it's important enough that I put it in bold below.
  • Path IDs – this defines the individual paths that we want on the map.  It helps tell Tableau which points to connect together
  • Line mark type – Tableau, go forth and draw the lines!
  • Point order – the order in which Tableau connects the points (note that this field is on Path). Whether your line has two or twenty points, you should always use a dimension to explicitly specify the order

 

The four parts of the map listed above are the basics, but there is a bit of a trick with the generated latitude and longitude from Tableau geocoding.  We need enough of the geographic hierarchy to disambiguate all of the points on our map (in this case County and State), but these geographic fields are Dimensions and that means that they are going to be used to partition the data for the viz.  If we partition the data into small enough groups that we don’t have at least two points for each path then no line will be drawn.  Pesky problem – if you don’t have at least two points, you don’t have a line.

 

So, we use a trick of using Min() on each of them so that we can get State and County on the viz to disambiguate, but make it so that they aren’t used to partition the data.  Min() values are treated as Measures instead of Dimensions. 
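To see why the Min() trick keeps the right locations, here is a toy Python model of the idea. It is a sketch of the behavior described in this post, not Tableau's actual engine; the marks function and the two rows are hypothetical. Dimensions define the groups, and the Min() fields are computed inside each group.

```python
from collections import defaultdict

# Hypothetical rows for a single path; not the dataset used in this post.
ROWS = [
    {"path_id": 2, "point_order": 1, "state": "CA", "county": "Los Angeles"},
    {"path_id": 2, "point_order": 2, "state": "NV", "county": "Clark"},
]


def marks(rows, dimensions, min_fields):
    """One mark per distinct combination of dimensions.

    Fields listed in min_fields do not split the data; they are
    aggregated with min() inside each group instead.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[d] for d in dimensions)].append(row)
    return {
        key: tuple(min(r[f] for r in members) for f in min_fields)
        for key, members in groups.items()
    }


# Path ID and Point Order as dimensions, State and County as Min().
with_order = marks(ROWS, ("path_id", "point_order"), ("state", "county"))

# Only Path ID as a dimension: both rows fall into one group.
without_order = marks(ROWS, ("path_id",), ("state", "county"))
```

In this toy model, when Path ID and Point Order are the dimensions, every group holds exactly one row, so Min(State) and Min(County) simply return that row's own state and county. With Path ID alone, the two rows share a group and the minimums are taken separately, which pairs "CA" with "Clark", a location that is not in the data.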

 

If that all makes perfect sense to you, read no further.  But, if you want some graphics to demonstrate why it works – and what happens if you don’t have your pills quite right – keep on reading for fun examples and descriptions of broken path maps!

 

For each of the next maps I’m going to show the Marks card so that it’s clear what is on the viz, a table that I’ve made to show how the data is being partitioned to draw on the viz, and the resulting map.   The table for each map shows the same fields as are used on the Marks card, in the same order, and has a table calculation indicating the running count of the number of points in each partition.  For a line to be drawn, the count has to get to at least TWO for a path (that whole crazy idea that a line is the connection between two points…)
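The running count can be mimicked outside Tableau with a few lines of Python. This is only a sketch under assumptions: the rows are hypothetical, and which fields form the partition is passed in explicitly rather than read from a Marks card.

```python
from collections import defaultdict

# Hypothetical rows; not the dataset used in this post.
ROWS = [
    {"path_id": 1, "state": "CA", "county": "Los Angeles"},
    {"path_id": 1, "state": "CA", "county": "San Bernardino"},
    {"path_id": 2, "state": "CA", "county": "Los Angeles"},
    {"path_id": 2, "state": "NV", "county": "Clark"},
]


def running_count(rows, partition_fields):
    """Pair each row's partition key with the running count inside that partition."""
    counts = defaultdict(int)
    table = []
    for row in rows:
        key = tuple(row[f] for f in partition_fields)
        counts[key] += 1
        table.append((key, counts[key]))
    return table


def drawable(rows, partition_fields):
    """Partitions whose running count reaches at least two points."""
    return sorted({key for key, n in running_count(rows, partition_fields) if n >= 2})


for key, n in running_count(ROWS, ("path_id", "state")):
    print(key, n)
```

Partitioning these rows by Path ID and State leaves only the (1, "CA") partition with two points. Partitioning by Path ID alone lets both paths reach a count of two.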

 

Now, let’s make and break some maps!

 

Broken Map #1

We’ll start with the method that I frequently see when I ask people to define paths using this or similar datasets.  We need State and County to draw the points using Tableau geocoding, and Path ID to define the paths.  Then we switch to the Line mark type and it should all work, right? Nope (though, sometimes if your data is juuuuusssssttttttt right and there aren't any ambiguities it will actually work like this).  

 

 

We end up with one path, and it only connects the two points that happen to be in the same state (Los Angeles County, CA to San Bernardino County, CA).  This happens because Tableau will group the data based on every dimension in the Marks card.  So, we group by Path ID, then by State, then by County – and the result is that only one Path ID has more than one point in the final group, and most paths have multiple final groups.  For example, Path #2 is broken into two distinct groups with only one point each – Los Angeles County, CA and Clark County, NV.  Since we can’t draw a line with only one point, we only see these locations as points on the map.

 

 

Broken Map #2

Now let’s see what happens if we change one of the dimensions to a measure.  We do this by using Min(dimension).  In this first example we’ll use Min(State).  This makes it so that the Paths aren’t broken up by the State dimension; measures aren’t used for grouping.  When we do this, we get a whole lot of additional lines showing up on the map.

Why does this happen? Even though there are multiple states in most of the groups, the County attribute disambiguates them sufficiently when combined with Min(State). The only group where this fails is the one that tries to connect all of the Jefferson Counties throughout the US. This doesn’t work because the county name is ambiguous – so when the Min(State) is calculated for the group, every Jefferson County is assigned to Alabama…because Alabama is alphabetically the first state in the list of states with a Jefferson County.  Once that Min(State) is assigned to each row, they all aggregate up together and we only have one point remaining – so we have a point on the map, but no line.  If you changed the Min(State) to Max(State), the point would be located in West Virginia (the last state with a Jefferson County).
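The Jefferson County collapse is easy to reproduce in a small Python sketch. The list of states below is a hypothetical subset chosen for illustration, and collapse is a made-up helper, not a Tableau function. It groups by county name and then applies one aggregate to the states in each group.

```python
# Hypothetical (state, county) rows for one path; the states are an
# illustrative subset, not the full list used in this post.
JEFFERSON_PATH = [
    ("Alabama", "Jefferson"),
    ("Colorado", "Jefferson"),
    ("Kentucky", "Jefferson"),
    ("West Virginia", "Jefferson"),
]


def collapse(rows, aggregate):
    """Group rows by county name, then reduce each group's states with `aggregate`."""
    states_by_county = {}
    for state, county in rows:
        states_by_county.setdefault(county, []).append(state)
    return {(aggregate(states), county) for county, states in states_by_county.items()}


first = collapse(JEFFERSON_PATH, min)  # alphabetically first state wins
last = collapse(JEFFERSON_PATH, max)   # alphabetically last state wins
print(first, last)
```

Four rows go in and a single (state, county) pair comes out, so there is one point and nothing to connect it to. Swapping min for max moves that single point to the alphabetically last state in the group.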

 

Broken Map #3

 

What if we make both State and County measures by using Min(dimension)?  We end up paring down the dataset so that the only grouping dimension is Path ID, and each unique Path ID gets the alphabetically first State and County value.  That gives us a super-strange origin-destination map connecting Los Angeles County, CA and Jefferson County, AL - which is a hybrid of Path #5 and Path #7. Weird.

What’s going on?  This was one of the most fun for me to figure out… there are actually FOUR paths stacked on top of one another all leading between Los Angeles, CA and Jefferson, AL.  Why? Tableau is calculating the Min(State) and Min(County) separately for each path, and then drawing a line between the counties that are left.

 

So we see paths between the Min(State) and Min(County) pairs in PathID 1 and 5, 3 and 5, 5 and 6, and 5 and 8!

Broken Map #4

But, those examples all violated my first rule where we define four things for every origin-destination map – the data points (our State and County pills in these examples), the path IDs (Path ID), the line mark type (the drop down for viz type), and point order.  Let’s add in Point Order and see what happens to our vizes, since it’s good practice to explicitly tell Tableau what order to connect the points (and it’s the best way to make sure that your map actually draws properly).

 

If we’re using the Point Order on Path, as we should - because it’s the right way to viz! – what happens if State and County are both dimensions?  We end up splitting our data into tiny little groups and each group has only one point in it.  Therefore we have no lines.

 

 

Broken Map #5

How about if we start changing our geographic dimensions to measures by using Min(dimension)?  If we just change State to a measure, we group by Path ID and County and end up with only the Jefferson County line being drawn.

 

 

Why did this happen? First we group by Path ID, so we have all of the separate paths.  Then we group by County name – and the only group that has more than one of any county is Group 5, the Jefferson County path.  Since the other groups have all unique county names, they are split into multiple groups of one county…and one county does not make for a path.

 

Broken Map #6

Just for fun, what happens if Point Order is just on Detail and not on Path? When you have Point Order on Path, Tableau connects the points in a specified order instead of just from smallest longitude to largest longitude (yes, that is the default if it isn’t made explicitly clear how to connect the points together).  The fun part of that knowledge is that it makes it possible to create ‘scribble maps’; otherwise, just file it in the ‘nice to know’ pile.
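Here is a small Python sketch of the difference between the two orderings. The points and their longitudes are rough, made-up values for illustration, and the longitude sort simply mirrors the default described above rather than anything taken from Tableau itself.

```python
# Hypothetical points: (point_order, label, longitude).
# Longitudes are rough illustrative values, not geocoded results.
POINTS = [
    (1, "Los Angeles, CA", -118.2),
    (2, "Jefferson, AL", -86.8),
    (3, "Clark, NV", -115.1),
]


def by_point_order(points):
    """Connect points in the explicit order given by point_order."""
    return [label for _order, label, _lon in sorted(points, key=lambda p: p[0])]


def by_longitude(points):
    """Connect points from smallest to largest longitude (west to east)."""
    return [label for _order, label, _lon in sorted(points, key=lambda p: p[2])]


print(by_point_order(POINTS))
print(by_longitude(POINTS))
```

The explicit order goes Los Angeles, Jefferson, Clark, doubling back across the country, while the longitude order runs west to east as Los Angeles, Clark, Jefferson. Same points, different line.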

 

 

Working Map!

And, what if we add Min(County) as well?  Everything works just like we had planned – because we just group the points by Path ID and then draw them according to point order. Magic.

 

And that’s what I’ve learned making and breaking a lot of maps in Tableau.  Hopefully it’s helpful for your projects with origin-destination mapping. If not, let me know what you are breaking in your map and I’ll see if I can come up with thoughts on how to fix it.

 

Additional resources:

There are a few great Tableau content articles on how to make origin destination maps when you have latitude and longitude values in your data source: