8 Replies Latest reply on Nov 9, 2013 6:54 PM by Kurt Heisler

    Pre-aggregating and reshaping data

    Kurt Heisler

      I have data at the individual level, but for confidentiality reasons I need to create an aggregated dataset that includes only counts of people from broad groupings (e.g., the number of Males who entered prison in 2010, etc.).


      The data is complex and I'm not sure how best to reshape it so it's optimized for Tableau.


      An example of the individual level data is attached. Each row is a person (a prisoner) along with:


      - Year - the year the prison reported the data

      - *** - Male or Female

      - The prisoner's support system (three y/n variables: Family, Friends, Other)

      - The prisoner's cohort (three y/n variables that indicate if the prisoner was 1) in prison at the start of the Year, 2) entered prison during the Year, and/or 3) left during the Year

      - some outcomes for the prisoner (whether they were eligible for program X, completed program X, eligible for parole, achieved parole, etc.)


      In Tableau, I'd like to be able to show the number and % of prisoners with various outcomes, and filter or group by the other variables

      (demographics, support system, and cohort). I'd also like to be able to show the number of prisoners in each cohort, filtered or grouped by Year, ***, etc.


      What's the best way to reshape this data to support this? If I was concerned only with Year, ***, and Outcomes I'd probably do something like this:


      Year***OutcomeEligible (Denominator)
      Achieved/Completed (Numerator)
      2010MVotech Training3010
      2010FVotech Training4015


      (To get %s, I'd do a table calculation of the numerator/denominator.)


      But I'm stumped when it comes to incorporating what is essentially the "check all that apply" items where a prisoner may have one, two, or three support systems, and be a member of one or more cohorts (e.g., those who entered and left prison that year). Aggregated counts of these aren't mutually exclusive - so if I had one row that showed there were 40 prisoners who are Male with Yes for Family Support, and another row that showed me there were 20 Male prisoners with Yes for Friend Support, some of the same prisoners may be represented in both counts. If I tried to show a count for only Male prisoners, Tableau will tell me I have 60 males (40 + 20) which wouldn't be correct.


      Any suggestions?

        • 1. Re: Pre-aggregating and reshaping data
          Alex Kerin

          I think you need someone who can write some custom SQL for you to do this. Be aware though that if you are only using desktop to distribute this, then you would need to use extracts to stop someone editing the SQL to get to the raw data.

          • 2. Re: Pre-aggregating and reshaping data
            Dallin Crump

            Does each prisoner have a unique ID in the original dataset? If so, it should be easy for someone to use SQL to aggregate the data properly and get the correct counts you'd need.

            • 3. Re: Pre-aggregating and reshaping data
              Kurt Heisler

              Yes, each prisoner has a unique ID in the original dataset. (The attached Excel sheet is an example of the original data.)


              I'm confused about the SQL suggestion. The original data is in a flat, Excel file (like the attached). Is the suggestion to you use SQL against that? Or use SQL within Tableau? Further, once I do get unique counts, how / where would I put them in the reshaped data: I'm still not clear how the data layout should should be structured so I have workable dimensions, filters, and such.


              Thank you for any clarification.

              • 4. Re: Re: Pre-aggregating and reshaping data
                Dallin Crump

                Have you tried using the Tableau Excel Add-in to reshape your data?


                Installing the Tableau Add-In for Reshaping Data in Excel | Tableau Software


                See the attached Excel file for reference. I populated the rest of your example Excel sheet with dummy data, then used the Tableau add-in to reshape it.


                Instead of having different columns for each of your support systems, cohorts, and outcomes, you now have one column (which I called Type, but you can call it whatever) that contains each of your categories.


                You can do distinct counts on the ID and filter on your other columns to get the correct headcounts.


                Try playing with the Excel Add-in pulling it into tableau Tableau to see if it'll do what you need.



                1 of 1 people found this helpful
                • 5. Re: Re: Re: Pre-aggregating and reshaping data
                  Kurt Heisler

                  Thank you!


                  I pulled that data into the attached Tableau workbook. I put Type in rows and in the Marks I put Sum(YesNo) and CNTD(ID). (If I tried to add either of those to a shelf things got weird.)


                  So far so good!


                  But two things that are stumping me:


                  1. Calculating percent, like the percent paroled (Paroledin6_Eligible/Paroledin6_Achieved) is tricky because I need to divide the two values within one variable (Type) across rows rather than columns. My goal is to present some simple tables, like:


                  Paroled (Eligible)Paroled (Achieved)Percent Paroled


                  If I add a dimension, it should recalculate, like:


                  ***Paroled (Eligible)Paroled (Achieved)Percent Paroled


                  I created a Calculated Field for the Percent Paroled with this:


                  lookup(sum([YesNo]), first()+1)/lookup(sum([YesNo]), first())


                  This calculates it correctly only if I filter Type to show only the two Paroled items. (If I want to add other Type values to the table, I suspect I'd need a bunch of IF THEN statements, like IF([Type] = "ParoledIn6_Eligible" ... THEN ...). I also can't get the percentage to appear in a third column like the above table. It forces me to have a row for Paroled Eligible and a row for Paroled Achieved, and the % is duplicated in each. 


                  2. I can't figure out how show (e.g.) the number and percent paroled by or Support System or Cohort. Because those are also Types, they get added to the row list along with the Parole options. How can I use those as unique dimensions when I want to treat them that way?


                  Thank you so much for any more tips.



                  Thank you for any continued tips. I think I'm almost there.

                  • 6. Re: Re: Re: Re: Pre-aggregating and reshaping data
                    Kurt Heisler

                    Progress! (Revised workbook attached).


                    I was able to calculate the percents (Paroled_Achieved / Paroled_Eligible) and create a table as desired.I created three calculated variables: ParoledAchieved (if [Type]='Paroledin6_Achieved' then [YesNo] end), ParoledEligible (same idea), and ParoledPercent (sum([ParoledAchieved])/sum([ParoledEligible]).


                    The remaining challenge is adding the Support System or Cohort values to the view (as dimensions, like ***, and Year). For example, what percent were paroled for those with who had only Friends as a support system, only Family, Friends and Family, etc.


                    Any ideas?

                    • 7. Re: Pre-aggregating and reshaping data
                      Shawn Wallwork

                      Kurt, this will get you the type of support:


                      IF LEFT([Type],FIND([Type],'_')-1)='Support' THEN RIGHT([Type],LEN([Type] )-FIND([Type],'_')) END


                      Then use that in a calculation to get count of support:


                      IF [What Support?]='Family' THEN [ID]

                      ELSEIF [What Support?]='Friends' THEN [ID]

                      ELSEIF [What Support?]='Other' THEN [ID]



                      Wasn't sure how you wanted to use it in your viz. But this should get you started.



                      • 8. Re: Re: Pre-aggregating and reshaping data
                        Kurt Heisler

                        Thank you, Shawn. I created those variables but still can't get the counts for each support to show correctly.


                        I know from my data that, out of 11 prisoners eligible for parole, 7 (64%) achieved it.


                        But I'd like to see how the percent paroled varies by Support System. The desired table should look like this:


                        Header 1Paroled EligibleParoled AchievedPercent Paroled


                        The sum of paroled eligible column exceeds 12 (the total number of unique prisoners) because each prisoner can have more than one support system. That's okay.


                        Sheet 3 of the attached workbook has the layout for this table, but the count of each type of support is 78 in every cell.


                        Thanks again for your expertise.