I started down this path before I understood that my Iron Viz entry should have been one visualization with one About page. This took an entirely different route, but it was still a fun exercise. Enjoy!
Method:
Data Management
The simplest way to find large lists of data on Wikipedia is to use Google's Fusion Tables Search. I started there and browsed through all of the Wikipedia results to find common themes; lists by country and by U.S. state appeared to be the most common. I reviewed the top ~400 search results for interesting state-related data tables: https://research.google.com/tables?hl=en&q=site+en.wikipedia.org+%22List+of+U.S.+states%22. I selected about 28 of them as a base data set to use for my own viz and to share with my colleagues at Boulder Insight.
Each of these tables was imported into its own sheet in a single Google Sheets workbook. The following transformations were applied while the data was still in the spreadsheet:
Normalized column headers
- Edited out Wikipedia markup artefacts
- Condensed two-row headers into one-row headers
- Prefixed with table name where needed
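The header clean-up above was done by hand in Google Sheets. For anyone repeating it on a fresh export, here is a minimal Python sketch of the same three steps. It is illustrative only: treating footnote markers like "[1]" as the markup artefacts, letting a blank top-row cell inherit the spanning header to its left, and prefixing a header with its table name only when another table already used that name are all assumptions, and the function names are hypothetical.

```python
import re

# Assumption: the markup artefacts are mostly footnote markers such as "[1]" or "[note 2]".
FOOTNOTE = re.compile(r"\[(?:note\s*)?\d+\]")


def clean_header(text):
    """Remove footnote markers and collapse repeated whitespace."""
    return " ".join(FOOTNOTE.sub("", text).split())


def condense_headers(top, bottom):
    """Merge a two-row header into one row.

    A blank top-row cell inherits the last non-blank top value to its left,
    which mimics a header cell that spans several columns.
    """
    merged, current = [], ""
    for upper, lower in zip(top, bottom):
        upper, lower = clean_header(upper), clean_header(lower)
        if upper:
            current = upper
        if current and lower and lower != current:
            merged.append(f"{current} {lower}")
        else:
            merged.append(lower or current)
    return merged


def prefix_headers(table_name, headers, seen, key="State"):
    """Prefix a header with its table name if another table already used it.

    `seen` is a set shared across all tables; the join key is never prefixed.
    """
    result = []
    for header in headers:
        if header != key and header in seen:
            header = f"{table_name} {header}"
        seen.add(header)
        result.append(header)
    return result
```

For example, a top row of "State", "Population[1]", "" over a bottom row of "", "2010", "2000" condenses to "State", "Population 2010", "Population 2000".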
Data-Specific Transforms
1) Replaced all HTML-encoded double quotes with a literal " (double quote)
2) Replaced all HTML-encoded single quotes with a literal ' (single quote)
3) On [Cannabis].[unnamed column b], which corresponded to the legend color coding, changed the column name to "Legality" and replaced each letter code with the corresponding text from the map legend:
a = Jurisdiction with legalized Cannabis
b = Jurisdiction with both medical and decriminalization laws
c = Jurisdiction with legal medical Cannabis
d = Jurisdiction with decriminalized cannabis possession laws
e = Jurisdiction with total cannabis prohibition
4) Renamed the primary key column from its various other labels (e.g., "State/Territory", "State or District") to simply "State"
5) Normalized state names throughout (for example, Hawaii and District of Columbia)
6) On [Etymology], collapsed duplicate state records into single records
7) On [Minimum Wage], moved text to a Notes field and converted currency values to numbers
8) On [Temperature], edited column names, removed asterisks from dates, and retained only the Fahrenheit temperatures and the first recording location
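These fixes were also made in the spreadsheet. A minimal Python sketch of comparable steps follows, working on rows represented as dictionaries of strings. The legend text is copied from step 3; everything else is a hypothetical stand-in rather than the actual edits: the state-name variants, any key labels beyond the two given in step 4, the dollar-amount pattern, and the rule of joining differing text with "; " when collapsing duplicates.

```python
import html
import re

# Letter codes from the [Cannabis] map legend (step 3).
LEGALITY = {
    "a": "Jurisdiction with legalized Cannabis",
    "b": "Jurisdiction with both medical and decriminalization laws",
    "c": "Jurisdiction with legal medical Cannabis",
    "d": "Jurisdiction with decriminalized cannabis possession laws",
    "e": "Jurisdiction with total cannabis prohibition",
}

# Hypothetical alias tables: the post names only two key labels, and the
# state-name variants are illustrative guesses.
KEY_ALIASES = {"State/Territory", "State or District"}
STATE_ALIASES = {"Hawai'i": "Hawaii", "Washington, D.C.": "District of Columbia"}

# A dollar amount such as "$7.25" or "$1,000.50" (commas as thousands separators).
AMOUNT = re.compile(r"\$?\s*(\d[\d,]*(?:\.\d+)?)")


def clean_text(value):
    """Decode HTML entities (e.g. &quot;, &#39;) into literal characters (steps 1-2)."""
    return html.unescape(value).strip()


def decode_legality(code):
    """Replace a legend letter with its legend text, leaving unknown codes as-is (step 3)."""
    return LEGALITY.get(code.strip().lower(), code)


def normalize_row(row):
    """Rename the key column to "State" and normalize the state name (steps 4-5)."""
    out = {("State" if col in KEY_ALIASES else col): clean_text(val) for col, val in row.items()}
    out["State"] = STATE_ALIASES.get(out["State"], out["State"])
    return out


def split_currency(text):
    """Return (amount, notes): the first dollar amount as a float plus leftover text (step 7)."""
    match = AMOUNT.search(text)
    if match is None:
        return None, text.strip()
    notes = (text[: match.start()] + text[match.end():]).strip()
    return float(match.group(1).replace(",", "")), notes


def collapse_duplicates(rows, key="State", sep="; "):
    """Merge rows that share a key, joining differing text values with `sep` (step 6)."""
    merged = {}
    for row in rows:
        if row[key] not in merged:
            merged[row[key]] = dict(row)
            continue
        target = merged[row[key]]
        for col, val in row.items():
            existing = target.get(col, "")
            if col != key and val and val not in existing.split(sep):
                target[col] = sep.join(filter(None, [existing, val]))
    return list(merged.values())
```

Unescaping runs before the state-name lookup, so an encoded apostrophe in a name is decoded first and then matched against the alias table.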
The workbook was then saved as XLSX for easy joining in Tableau. The Population table served as the primary table, with every other table LEFT JOINed to it on State.
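In Tableau the join is configured in the data source pane. To make the LEFT JOIN semantics concrete, here is a small pure-Python sketch (the function name is hypothetical): every Population row is kept, matching columns from each other table are attached by State, and states missing from a table get None for that table's columns. It assumes each joined table has at most one row per state and that clashing column names were already prefixed.

```python
def left_join(primary, *others, key="State"):
    """Left-join each table in `others` onto `primary` by `key`.

    Every row of `primary` is kept; rows from other tables with no match in
    `primary` are dropped, and unmatched primary rows get None for that
    table's columns. Assumes one row per key in each other table.
    """
    result = [dict(row) for row in primary]  # copy so the inputs are not mutated
    for table in others:
        index = {row[key]: row for row in table}
        columns = {col for row in table for col in row if col != key}
        for row in result:
            match = index.get(row[key], {})
            for col in columns:
                row[col] = match.get(col)
    return result
```

If a joined table still had duplicate states, the dictionary index would silently keep only the last one, which is one reason to collapse duplicates before joining.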
Analysis
With a data set of over 50 dimensions and 125 measures across 28 topics, what could be gleaned from the relationships? The first task was to determine whether there were any obvious linear relationships. I wanted a quick way to choose one measure and then run through every other measure in turn, looking for connections. The term "Spurious Correlation" came to mind and, naturally, Wikipedia taught me that I really meant "Spurious Relationship".
I created two views that allowed me to do just this. The first plots any selected dimension against any selected measure. The second does the same thing for two measures.
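The two views were built in Tableau. Outside Tableau, the "pick one measure, then run through every other measure" scan could be approximated in Python by computing a Pearson correlation coefficient between the chosen measure and each other numeric column, then ranking the results by strength. This is a swapped-in technique, sketched under the assumption that rows are dictionaries with numbers for measures; the function names and the minimum-pairs cutoff are hypothetical.

```python
import math


def is_number(value):
    """True for ints and floats, excluding bools and NaN."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return not math.isnan(value)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def scan_measures(rows, target, min_pairs=10):
    """Correlate `target` with every other numeric column, strongest |r| first.

    Returns (column, r, paired_row_count) tuples; a row missing either value
    is skipped for that pair only.
    """
    columns = {col for row in rows for col, val in row.items() if col != target and is_number(val)}
    results = []
    for col in columns:
        pairs = [(row[target], row[col]) for row in rows
                 if is_number(row.get(target)) and is_number(row.get(col))]
        if len(pairs) < min_pairs:
            continue
        xs, ys = zip(*pairs)
        r = pearson(xs, ys)
        if not math.isnan(r):
            results.append((col, r, len(pairs)))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)
```

Pearson's r only captures linear association, and with a few dozen state-level rows per pair, scanning 125 measures will turn up some strong-looking coefficients by chance alone.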
Running through the possibilities produced a few interesting relationships. They were first found with the "Spurious Relationship Creator" and then rebuilt as fixed worksheets, which are included as separate sheets in the viz linked above. At an aggregate state level it's really a stretch to say that any of these inferences hold, but they're still interesting. Here's what I learned ...
Billionaires
Here, I was genuinely curious whether the presence of billionaires in a state had an impact on that state's prosperity. It doesn't appear to.
Impact of Income on Education/Health
This one seemed more intuitive. The more people in a state with bachelor's and advanced degrees, the higher you'd expect income to be. The impact that poverty has on obesity is also intuitive ... cheap food <> healthy food. Life expectancy followed a similar pattern ... be poor and die sooner.
Hobbies (Just for Fun)
This is sort of the miscellaneous drawer of the workbook: several things that didn't stand up as their own viz were thrown together here. Fun facts:
- There really are "donor states" ... states that pay more in taxes than they receive in federal spending. But does that include pork-barrel spending?
- Island territories look like fun places.
- Live Long and Prosper ... dude.
- Maybe we could learn from each other.
Thanks for reading,
William