3 Replies Latest reply on May 9, 2013 9:03 AM by Shawn Wallwork

    Identifying and calculating based on first occurrence of unique ID

    Tom Jones

      Looked at other forums and could not find a solution.  I'm using an extract from another system and would like to use this file 'as is' without any manipulation prior to loading into Tableau.

       

      Rather than exporting a Master and Detail file, this extract combines this information such that the Master Data is duplicated on each record (along with the unique detailed data - shown below).  I'd like to calculate the lead time (using DATEDIFF) for each project.  I only want to include one record per project as the number of records (parts used) will vary based on the complexity of the project.

       

      Also, the numbering sequence within the system starts at the same number for each distributor so you will see the same project numbers across multiple distributors, even though these are unique projects.

       

      My thoughts:  I can solve the duplicate project numbers (across the distributors) by concatenating the Distributor Number and ProjectID.  Also, if I could create a calculated field, setting a value of 'Include' for the first occurrence of a Distributor/ProjectID and 'Exclude' for any subsequent occurrences.  With that flag, the worksheet should show the appropriate calculation.  Any ideas?

       

       

      DistributorProjectIDScheduledExecutedPart Used
      110005/1/20135/3/2013A
      110005/1/20135/3/2013B
      110005/1/20135/3/2013C
      110015/1/20135/6/2013A
      110015/1/20135/6/2013B
      110015/1/20135/6/2013C
      110025/2/20135/7/2013A
      110025/2/20135/7/2013B
      110025/2/20135/7/2013C
      110025/2/20135/7/2013D
      210005/3/20135/4/2013A
      210015/4/20135/8/2013A
      210015/4/20135/8/2013B

       

      Message was edited by: Tom Jones the key metric I'm trying to calculate (in a worksheet) is the average project lead time for each distributor.  Since I have more than one record per project (and a varying number of records per project), I haven't been able to generate a single value (lead time) for each project and then use that single value to calculate the average lead time for that distributor. With the data above, Distributor 2 has two projects: 1000 (which had a one day lead time) and 1001 (with a 4 day lead time).  I'd like to calculate the average lead time per project (1+4)/2 = 2.5 days, however the calculations include both of the records from project 1001, so the average lead time is calculated as (1+4+4)/3 = 3.0 days.