Version 7 filled maps are great, but the one thing you can't do is define your own shapes.
There was a lot of discussion on the version 7 beta forum about how cool it would be to be able to import definitions of shapes while defining custom geocoding. I was particularly interested: having spent a few years as the technical lead on the project that digitised New Zealand's survey records and put them online, I was really keen to see what I could visualise with all that data I helped make available.
So when the answer came back from the Tableau folk during the beta that there would definitely be no support for extensibility in version 7, I couldn't resist the challenge. I'm not known for taking no for an answer.
And the answer is that if you're prepared to rummage about under the covers a bit in unsupported territory, it is possible to add shapes to custom geographic roles. There's quite a bit of work to get it going, but with the help of some open source GIS libraries I've managed to automate the whole process and bundle it all up as a tool which makes loading data from a shape file quick and relatively easy (once you've got the hang of it). I've shared it with a couple of people to get a bit of feedback on how usable it is, and I'll happily share it with the wider community once I've tidied up a few of the loose ends the "beta testers" highlighted.
At this point I should stress that this is an unsupported hack, and by unsupported I mean several things:
- If it doesn't work as you expect, there's no guarantee it will ever get fixed. Best endeavours, if I'm interested and not too busy, that sort of thing.
- If you have any problems with a workbook that uses this approach, don’t even think about asking Tableau for help until you remove the custom geocoding. (I have no idea what Tableau’s attitude would be, but I know what mine would be if I were them.)
- It is virtually certain that a future release of Tableau will change how geocoding works in some way that will stop this from working altogether – simply because this approach relies on very specific (and unpublished!) details of the internal structure of the geocoding database. That is bound to change at some point. Hopefully any release that changes things in this way will also include adding support for similar extensibility capabilities, but there’s absolutely no telling.
- It uses an open source GIS library - and at least one of the features I'm using (simplification of complex shapes) doesn't work as well as I'd like - but there's nothing I can do about it.
The implications of all of this are clear: don’t use it for anything which you care about. In particular don’t use it for anything which needs to keep working beyond the next release of Tableau.
Personally, I intend to use it for point-in-time, throwaway analysis: blog posts and the like, and also to explore how this sort of capability would be useful if it were a supported part of the product. I strongly suggest you limit your use similarly.
You have been warned.
Here's a sample viz on Public which illustrates what it can do - and also highlights some of the issues to watch out for which I'll discuss below.
Edit: November 2014 - A recent change on Tableau Public has reduced the maximum precision at which maps can be shown - hence the very jagged coastline shown here. The Tableau Public change also briefly broke many filled maps generated with the hack altogether, due to a subtle difference in the internal encoding the hack produces. All of which goes to emphasise the risks of relying on hacks.
This viz shows the Tsunami warning zones for the area around where I live in New Zealand, with red, orange and yellow reflecting the areas affected by progressively larger or more local events. The green area shows the "meshblock" (a collection of land parcels) which contains my house.
What it Does
The utility takes one or more spatial data files containing polygon data, transforms them to an appropriate geographic coordinate reference system for Tableau to use (i.e. to lat/long coordinates) and generates CSV files in the format needed for creating custom geocoding roles. After the custom geocoding has been imported to Tableau the utility is run again to insert the polygon boundary data into the custom geocoding database.
The source spatial data can (in principle) be in any spatial data format supported by the Geographic Data Abstraction Library (GDAL, an open source GIS library). I say in principle because I’ve only tested with ESRI shape files and a handful of other formats, but I see no reason why it shouldn’t work with anything the GDAL utilities can understand (which is an extensive list).
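To make the output side of this concrete, here's a minimal sketch in Python of generating such a CSV. The column layout (a column named after the geographic role, followed by Latitude and Longitude) is my reading of what Tableau's custom geocoding import expects, and the zone names and coordinates are invented; the reprojection to lat/long would already have been done, for example with GDAL's ogr2ogr utility:

```python
import csv

# Hypothetical input: one (name, lat, lon) entry per shape, where lat/lon
# is a representative point (e.g. the centroid) already in WGS84.
zones = [
    ("Red Zone",    -41.29, 174.78),
    ("Orange Zone", -41.30, 174.80),
    ("Yellow Zone", -41.31, 174.82),
]

def write_geocoding_csv(path, role_name, rows):
    """Write a CSV in the layout Tableau's custom geocoding import expects:
    one column named after the geographic role, plus Latitude/Longitude."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([role_name, "Latitude", "Longitude"])
        for name, lat, lon in rows:
            writer.writerow([name, lat, lon])

write_geocoding_csv("TsunamiZones.csv", "Tsunami Zone", zones)
```

The boundary points themselves don't go through this CSV - they're inserted into the geocoding database in a separate pass after the import.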
The utility also supports purging of unneeded geographic roles from the resulting custom geocoding database. Reducing the size of the database in this way can improve performance and also reduces the size of any packaged workbooks (which minimises the use of Tableau Public quota when publishing the workbook).
One of the key factors which determines the viability and usability of the resulting geocoding database is the number and complexity of the loaded shapes. Too many shapes, or shapes that are too complex, can lead to very poor performance or even an out-of-memory error - it can simply take Tableau outside the envelope it is designed for.
To help ensure you don't overload it, the utility provides the option to simplify the boundaries of the shapes using the GDAL library, and also displays statistics about shape complexity which help in deciding on appropriate simplification settings.
However, don’t expect too much. Simplification of spatial data is a notoriously difficult task and can often lead to anomalies and artefacts in the simplified data (such as missing or overlapping “slivers” at the boundaries of adjoining shapes).
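Tolerance-based simplification of this kind is typically done with the Douglas-Peucker algorithm. The sketch below is my own minimal illustration of the idea, not the GDAL implementation: it keeps a point only if it lies further than the tolerance from the line joining its retained neighbours. Note that it simplifies each boundary line independently - which is exactly how the "slivers" between adjoining shapes mentioned above can arise.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)

def douglas_peucker(points, tolerance):
    """Reduce a polyline, keeping every point that lies further than
    `tolerance` (in coordinate units) from the simplified line."""
    if len(points) < 3:
        return list(points)
    # Find the point furthest from the chord between the two endpoints.
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= tolerance:
        # Everything is within tolerance: keep only the endpoints.
        return [points[0], points[-1]]
    # Otherwise keep that point and recurse on the two halves.
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right
```

The trade-off discussed below falls straight out of the single `tolerance` parameter: a large value collapses long runs of points onto straight chords, a small one preserves detail at the cost of point count.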
For example, the two screenshots below show a sample of New Zealand electoral boundaries simplified with a tolerance of 1,000 metres (left) and 100 metres (right). The original shape file with no simplification results in almost 600,000 boundary points being loaded.
Simplifying at 1,000 metre tolerance reduces that to 3,000 boundary points (a factor of 200), which makes the view much more responsive, but clearly introduces a lot of error. At 100 metres tolerance the number of points is around 16,000 (down by a factor of 40 on the original), which still allows the view to respond quickly whilst also retaining acceptable accuracy.
Finding the best compromise between simplicity (and hence performance) and accuracy can involve a lot of trial and error. Getting satisfactory results may require manual intervention using a GIS tool. It can be particularly difficult if there is a wide range of sizes of shapes in the one file, since the same level of simplification has to apply to the whole file.
The second tab in the viz embedded above shows a few more examples of different levels of simplification, which you can explore interactively. The reason I've been using the Tsunami Zones shape file for a lot of my testing is that the shapes are extremely complex and clearly very difficult to simplify well - it's very hard to simplify long, thin, highly detailed shapes like that without breaking them. In fact, in this case, it's probably most appropriate to leave them detailed: the exact position of the boundary relative to my house is something I want to know as accurately as possible, and as I've only loaded the data for my local area, there isn't too much data and Tableau can cope with the detail.
The third tab in the viz above demonstrates some of the differences between what you can achieve with filled map support and what you can achieve with "the old approach" of joining all the boundary points as polygons, pioneered by Joe Mako a couple of years ago. For one thing, with the filled map approach there are only three rows in the data source and three (very complex) marks displayed, whereas with the polygon approach the data source has over 20,000 rows - one for each boundary point. That makes the filled map approach much simpler when it comes to using the shapes for actually displaying analysis results (here I'm just using it as a drawing tool, really).
Another difference is that the polygon approach doesn't allow for holes in shapes. The best you can do is draw another shape over the top, representing the hole. I've illustrated this with the Tsunami zone around a headland across the bay from my house. Whilst support for holes in shapes matters a lot for certain types of GIS applications (such as for a resident who wants to know whether their house is at risk of Tsunami damage), in practice it's probably not often important for the sort of visualisations being done with Tableau.
There are pros and cons to both the custom geocoding hack and the polygon approach - not least that the polygon approach is completely supported by Tableau. So I've also built into the tool the ability to output the data in much the same format as Joe's utility does, allowing the individual points to be plotted as polygons. The advantage over Joe's process is that it automates all of the manual steps needed to transform the shape file to an ESRI format file in the right coordinate reference system, and it also supports simplification of complex shapes.
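To make the row-per-point layout concrete, here's a sketch of the kind of flattening involved. The shape names and coordinates are invented and `flatten_to_rows` is a hypothetical helper, not the tool's actual code; in Tableau the result is drawn with the Polygon mark type, with the point order field placed on Path:

```python
# Hypothetical polygons: name -> list of (lat, lon) vertices, already in WGS84.
shapes = {
    "Red Zone":    [(-41.28, 174.76), (-41.28, 174.79), (-41.31, 174.79)],
    "Orange Zone": [(-41.32, 174.80), (-41.32, 174.84), (-41.35, 174.82)],
}

def flatten_to_rows(shapes):
    """One row per boundary point: (shape name, point order, lat, lon).
    Point order determines the drawing sequence of the polygon outline."""
    rows = []
    for name, vertices in shapes.items():
        for order, (lat, lon) in enumerate(vertices, start=1):
            rows.append((name, order, lat, lon))
    return rows

for row in flatten_to_rows(shapes):
    print(row)
```

Six vertices across two shapes become six data rows - which is why a real coastline with tens of thousands of boundary points produces such a large data source under this approach.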
I'm hoping to finish off tidying up loose ends and make this available sometime within the next couple of weeks.