9 Replies Latest reply on Nov 28, 2017 2:30 PM by Derek Konofalski

    Combining/summing rows based on different columns

    Derek Konofalski

      I have 2 datasets that I'm trying to chart out on a dual-axis chart.  Here's some sample data and then an explanation of what I'm trying to do:

       

      - The first dataset is a list of costs per category for each year.  Every year does not have the same categories but that doesn't matter as I'm only interested in the total cost per year.

      Year     Category     ID     Cost

      2000     Software     01     50.45

      2000     Hardware     02     100.14

      2000     Support      03     47.77

      2001     Software     04     55.00

      2001     Ancillary    05     32.54

       

      - The second dataset is a list of counts by auditors of how many pieces of hardware we maintain. Every year has an audit done by AAA but only some years have an audit done by ARC (if there was a discrepancy and the numbers had to be re-counted or re-checked). There are 1 or 2 years where there's a third source.

       

      Year     Source     ID     Count

      2000     AAA        01     2054

      2000     ARC        02     2059

      2001     AAA        03     2109

      2001     ARC        04     2221

       

      I would like to plot out this dual axis chart that shows how our counts compare to our costs for each year.  On the Source dataset, I want to average out the counts and plot those and, on the Category dataset, I want to combine all the costs for each year. Ideally, I would have a chart with the x-axis showing all the years and then 2 different lines (in different colors) showing each point (the average for counts and sum of the cost) for each year.  When I connect my datasets, it's defaulting to an inner join which seems fine but it's causing me to get duplicated data and I think the reason is because the joined data is duplicating all the categories for each year. Instead of getting just a sum of all the costs per year, I'm getting a sum of all the costs multiplied by the number of sources in the Source dataset.

       

      Admittedly, I don't have much experience with Tableau and would normally do this in R. Does anyone have ideas on how to accomplish this? I can't publish the full workbook (and I'm not even sure that it's set up correctly) but I am more than willing to answer any questions needed.