2 Replies Latest reply on Jul 9, 2013 4:02 PM by claire.wooton

    Inflated Count Distinct

    claire.wooton

      I have some data that shows what program medical residents were enrolled in for each year of their residency training (see example data below).

       

      I'd like to be able to create a pie chart that shows the distribution of first year residents across residency programs.To accomplish this, I use a Count Distinct on Student_ID and apply the filter year_of_residency = 1 to ensure that only records from the first year of residency training are captured.


      However, what I'm really interested in is details of the earliest residency program that residents were enrolled in. If residents have switched residency program then they will have more than one record when year_of_residency = 1, e.g. StudentID 4 and 5. As you can see in the attached workbook, this issue makes the results look inflated. There are five students in the dataset, but seven appear in the pie chart.

       

      example_pie_chart.png

       

      To solve this problem, I'd like to be able to select the record that has a year_of_residency = 1 and the earliest residency_start_date for each studentID. I've read through other postings on count distinct and table calculations, but I can't figure out how to implement this in my workbook. If anyone could offer up some advice, I'd very much appreciate it.

       

      Thanks!

       

      studentIDyear_of_residencyresidency_programresidency_start_date
      11

      family medicine

      1-Jul-04

      12family medicine1-Jul-05
      21internal medicine1-Jul-04
      22internal medicine1-Jul-05
      23internal medicine1-Jul-06
      31pediatrics1-Jul-04
      32pediatric nephrology1-Jul-05
      41family medicine1-Jul-04
      42family medicine1-Jul-05
      41pediatrics1-Jul-07
      42pediatrics1-Jul-08
      51internal medicine1-Jul-04
      51psychiatry1-Jul-05
      52psychiatry1-Jul-06