3 Replies Latest reply on Dec 6, 2013 6:26 PM by Joshua Milligan

    Removing Duplicated Data after Join steps don't work for me - help!

    Karen Clark

      Using Tableau 8.1.0 Desktop on Windows 7 with a MySQL 5.0.7.

       

      I have a joined data source from MySQL (three tables joined, not all represented here). case_id, start_date, and end_date are from Table 1, while event_ts is from Table 2:

       

       

      case_idstart_dateend_dateevent_tstotal_time_taken
      123412/03/201212/13/2012

      12/04/2012 05:00:12

      10
      123412/03/201212/13/201212/04/2012 17:54:0110
      123412/03/201212/13/201212/12/2012 03:39:2010
      125612/10/201212/31/201212/29/2012 12:00:4921

       

      total_time_taken is a Tableau calculated field using this formula:

       

      DATEDIFF('day',[Date Opened],[Date Closed])

       

      I need to find the average of total_time_taken for each case_id.

       

      I have followed the instructions in Removing Duplicated Data after Joining Tables | Tableau Software by creating a new calculated field "Get Avg" that has this formula:

       

      sum([total_time_taken])/count([case_id])

       

      However, that does not produce the right value. When I put that field into the Column shelf, it shows AGG(Get Avg) but the values aren't right:  operating upon the data in the attached CSV, for instance, the formula produces a value of 9.9 when it should be 10.5.

       

      I've been at this so long I don't know up from down any longer. Hopefully I'm overlooking something simple....

        • 1. Re: Removing Duplicated Data after Join steps don't work for me - help!
          Joshua Milligan

          Karen,

           

          The knowledge base article does help get the right answer, but only at certain levels of aggregation.  In your example, the technique will get the right answer per case, but when you try to get an average of all cases, you get the wrong answer, as you noted.  That's because it is an average of averages at that point.

           

          So, instead, try something like this:

           

          1. Start with a view of the data at an aggregation that gives you the right answer per case (I chose MIN, but MAX or AVG would work too.)  This is what the KB article gives -- but you'll need a couple more steps to get the overall average.

          0.png

           

          2. Create a table calculation.  I named mine [Average].

           

          [Average]

          WINDOW_SUM(MIN([total time taken])) / SIZE()

           

          The table calculation is calculated Table Down

           

          1.png

           

          The calculation adds up the minimum time (for each case_number, as that field on Rows is determining the level of detail).  The SIZE() calculation gives the size of the window.  As the calculation is performed Table Down, the size of the window is the same as the number of cases.

          The calculation gives the same result for each row, so it really only needs to be calculated for the first row.  For performance, it could be written: IF FIRST() == 0 THEN WINDOW_SUM(MIN([total time taken])) / SIZE() END

           


          3.  Clean up the view to show the answer without showing it for each case:


          2.png


          The [First Filter] calculation is simply: First() == 0.  I kept the true value, so only the first row of the table in the final view is shown.  I removed fields I didn't need and I also hid the row headers for case_number (right click the field on Rows and uncheck "Show Header").


          I've attached the workbook so you can see each step.  Hopefully that helps!


          Regards,

          Joshua

          • 2. Re: Removing Duplicated Data after Join steps don't work for me - help!
            Karen Clark

            This is outstanding, Joshua! Thank you so much.