7 Replies Latest reply on May 3, 2018 10:07 AM by Jana STUPAVSKY

    Data source automated cleanup and prep - making the spreadsheet tall and slim

    Jana STUPAVSKY

      Hi - I am in the process of downloading dozens and dozens of spreadsheets with various workforce-related data. Three examples are attached, but I am looking at a total of about 130 files just like this. They need some clean up which, if done manually, will require lots of repetitive "copy & paste" steps in to one master spreadsheet to add rows at the bottom. I am hoping to figure out the way to avoid this manual clean up and prep.

       

      When you open them, they look relatively the same as far as headers etc., but the data applies to various geographic regions - County, City and ZIP, for each possible type of profession as you can perhaps tell from the name of each file. I need to figure out the way how to make these spreadsheets more "Tableau friendly." It is my understanding that it's better to have data tall and slim, rather than wide, like I have now. (If what I have now is just fine for Tableau, by all means please let me know, but I find it quite a mess to work with. I am looking to build a dashboard similar to this: Charleston Regional Competitiveness Center | Employment by Occupation ).

       

      If you open one of the examples I attach, you can see the headers like this:

      SOC,  Description, 2001 Jobs, 2002 Jobs, ..... 2017 Jobs, 2001 Resident Workers, 2002 Resident Workers, etc.

       

      For slimmer data source, I would like the headers to look like this:

      SOC, Description, Year, Workforce Type (Jobs or Resident Workers), ZIP, City, County

       

      Basically, the "2001 Jobs - 2017 Jobs" columns need to be stacked on top of each other into one long column and the same applies to 2001-2017 Resident Workers and data has to correspond with the year from the column they came from, for the SOC and Description and County, City and ZIP they apply to. Makes sense? I suspect County, City and ZIP can be somehow derived from the file name by Split function, but I am unsure how to collapse the "Jobs" and "Resident Workers" into one long stacked column while corresponding with all the other dimensions (SOC, Description, Year, ZIP, City, County), while avoiding the repetitive copy & paste.

       

      I have tried to use Trifacta Wrangler but unsuccessfully thus far. Any hlep will be much appreciated.