9 Replies Latest reply on Nov 6, 2012 5:23 PM by Richard Leeke

    Converting Australian Bureau of Statistics shape files

    Nicola Powell

      Firstly thanks to Joe Mako for his brilliant tutorial on how to convert a shape file for use in Tableau. 




      I currently have a trial version of the software to assess if Tableau will meet our mapping and reporting needs.  Everything was looking great until I hit the issue I have outlined below. 


      I have just tried to convert the 'Collection District Digital Boundaries, Victoria (ASGC 2006) in ESRI Shapefile Format' file available from the Australian Bureau of Statistics (ABS, http://www.abs.gov.au/ausstats/abs@.nsf/DetailsPage/1259.0.30.0022006?OpenDocument), for use in Tableau.  As per Joe's instructions.  

      Everything was working perfectly, I loaded the polygon.txt and attributes.txt files into Tableau.  Set my filter to view a few of the Collection District areas initially whilst setting everything up, as per the tutorial.  However once I tried to remove the filter to display all of the Collection Districts within the file (as required for the analysis I wish to undertake), about 9,000 (from memory) Tableau crashes telling me I'm out of memory. 


      Does anyone know of a solution to my problem?

      Is this a common issue when mapping in Tableau?

      Has anyone else managed to convert the ABS collection district / mesh block file for use in Tableau?


      Any help for a Tableau newbie much appreciated. 


        • 1. Re: Converting Australian Bureau of Statistics shape files
          Richard Leeke

          The problem is that the level of detail in those boundaries means that the number of boundary points is getting too large for Tableau to handle.


          I did a lot of work on this a few months ago and provided a couple of utilities which provide a couple of methods which can help - though there are compromises with both approaches.


          One of the utilities ("shapetotab") basically automates the process you followed in Joe's tutorial, taking out all of the manual steps. But it does add a couple of extra options - one of which is to "simplify" the boundary definitions (i.e. reduce the level of detail and hence the number of boundary points). Simplification has the advantage that it lets the maps draw more quickly (or even draw at all), but has the disadvantage that you lose some accuracy, which can result in gaps or overlaps between shapes. So there is a trade-off when deciding how much to simplify by.


          The other utility ("tabgeohack") is much more of a hack - it actually adds the shapes into Tableau's internal geocoding database. There are a number of advantages to this approach - but it takes you into unsupported territory, so I generally recommend people to go with Joe's approach or shapetotab, rather than resorting to the hack.


          Both utilities are described in a thread on the VizTalk area of this site. Here is the posting about shapetotab.


          I had a quick go with shapetotab on your file. Just running the utility with no simplification generated 1.8 million boundary points - which should be very similar to the result you got with Joe's approach, I imagine.


          >shapetotab CD06aVIC.shp

          Generated 9310 feature rows for CD06aVIC

          Generated a total of 1834016 points (min: 5, avg: 196, max: 28378)

          Overall totals: 9310 rows, 1834016 points

          Done in 35 seconds


          I tried drawing the shapes at that detail and whilst Tableau didn't actually crash, it was certainly creaking and groaning a bit, so I cancelled it before it had finished. I then regenerated the boundaries, simplifying to 0.0001 degrees (which is still retaining a reasonably high degree of accuracy).


          >shapetotab --simplify .0001 CD06aVIC.shp

          Generated 9310 feature rows for CD06aVIC

          Generated a total of 442569 points (min: 5, avg: 47, max: 2178)

          Overall totals: 9310 rows, 442569 points

          Done in 27 seconds


          That reduced the number of points by a factor of 4, which allowed the map to draw - albeit a bit slowly.




          Zooming in on Melbourne, the detail is still pretty good even for the smaller shapes.




          But then zooming right in on some of the finest-grained detail in central Melbourne you can see the effects of the simplification - with a few slivers and overlaps showing between some of the shapes.





          But there may also be other ways to tackle this. Drawing all 9,000 polygons at the same time results in too much detail for the eye (or the screen) to cope with, so depending on the nature of the analysis you are doing you may find it better to start at a higher level in the geographic hierarchy and allow drill-down from there into individual LGAs or SLAs or whatever.


          Hope that helps.

          • 2. Re: Converting Australian Bureau of Statistics shape files
            Nicola Powell

            Richard, thank you so much for your help and advice.  There is no way I would have figured any of that out.  I am now using your shapetotab method, its amazing!  Plus I'm able to convert the MapInfo files I have already customised, such as selecting only Metro CD's etc., making the task much easier.   

            • 4. Re: Converting Australian Bureau of Statistics shape files
              Marina Zhang

              Hi Richard and Tableau community!


              I've been able to map Australian Bureau of Statistics Australian postcodes boundaries beautifully using Richard's shapetofile utility (thanks Richard!).


              Now I want to create a choropleth map with student enrolment numbers for an Australian university using that map created with the ABS digital boundaries, and this is where I'm getting stuck! How do I link the data (ABS digital boundaries data and student enrolment data) such that I can create this map?


              Any help would be greatly appreciated!

              • 5. Re: Converting Australian Bureau of Statistics shape files
                Richard Leeke

                Is your student enrolment data classified by any of the levels included in the ABS files? For example, the file that Nicola referenced at the start of this post (Collection District Boundaries for Victoria) has a CD (Collection District) code, an LGA (Local Government Area) code and name and SLA (Statistical Local Area) code and name. A random row picked from that file looks like this:



                  STATE_CODE (String) = 2

                  CD_CODE06 (String) = 2292001

                  SLA_CODE06 (String) = 205057352

                  SLA_NAME06 (String) = Yarra (C) - Richmond

                  LGA_CODE06 (String) = 27350

                  LGA_NAME06 (String) = Yarra (C)


                So if you have your data classified by any of those things, you can join it to the ABS data to get the map you want. If you have the necessary classification but need a hand joining it all together, maybe post some sample data (which can be fictional - just as long as it has the same structure as your data and the includes the geographic classification).

                • 6. Re: Converting Australian Bureau of Statistics shape files
                  Marina Zhang

                  Hi Richard!


                  Thanks for your quick reply.


                  Yes, the student enrolment data I have is classified by postcode and CD. I've looked up the Tableau tutorial on combining multiple data sources into one view, but haven't made much progress with that.


                  The joining tables function seems to require that the data all be in the same file?


                  I've attached the shapetofile outputs of the ABS data and a student enrolment dummy data (which is in Excel, but the original is in Access).


                  Thank you so much for your help!

                  • 7. Re: Converting Australian Bureau of Statistics shape files
                    Richard Leeke

                    There are essentially two different approaches to joining date like this. You can join them in the data source, but as you say that means that all data has to be in the same data source - which in your case would mean either that all of your "tables" (features, points and enrolments) would need to be the same type: either all CSV files, or 3 separate worksheets in a single spreadsheet or 3 tables in an Access database. You could do it like that, by converting in one direction or another, but there's an easier option.


                    The other way of doing it is called "data blending". With that approach, you define two (or more) separate data connection in Tableau and tell Tableau how they are related, so that Tableau can join them together. With this approach, they can be in different databases.


                    In the attached workbook I have defined two data sources. The first joins the features and points from shapetotab, which allows you to generate the basic map. The second just references the sheet in your dummy spreadsheet. You can just replace this second connection with one which references the table in your Access database.


                    Then what I have done is told Tableau that it needs to join them based on the postcode: [POA_CODE_2] from the features file and [home residence postcode] from the spreadsheet. There was one slight twist here: the [POA_CODE_2] field is treated as a string, whereas the postcode in the spreadsheet is treated as a number - so I have created a calculated field which explicitly converts the postcode to a string. Then I just went into the Data->Relationships menu option and told Tableau that those fields should be used for joining.


                    Having done that, I took a copy of the basic map, added [POA_CODE_2] to the level of detail shelf (Tableau needs that to do the join) and then just dragged [Student enrolment count] onto the colour shelf.


                    I needed to zoom right in on Melbourne because I left the poscodes with no enrolments showing. If you filter to only those with non-zero counts the map will automatically size to enclose all postcodes with enrolments.

                    • 8. Re: Converting Australian Bureau of Statistics shape files
                      Marina Zhang

                      Hi Richard,


                      Thank you so very very much for your help!


                      The things that I had neglected to do before that was apparently the cause of my headache were:

                      - identifying that POA_CODE_2 was a string whilst the postcodes in my student enrolment file were numbers

                      - joining the features and points from shapetotab

                      - Adding POA_CODE_2 into the 'level of detail' shelf (Does Tableau need this to perform the join simply because a relationship was created between it and the postcodes in my student enrolment file?)


                      The choropleth maps are working beautifully now! Thank you again!

                      • 9. Re: Converting Australian Bureau of Statistics shape files
                        Richard Leeke

                        > Does Tableau need this to perform the join simply because a relationship was created between it and the postcodes in my student enrolment file?


                        Tableau 7 needs that for data blending. I understand that Tableau 8 (which was announced earlier today at the conference) will do away with the need to do that - though I don't know the details.


                        Glad it's working for you.