6 Replies Latest reply on Aug 2, 2013 7:38 AM by Jonathan Drummey

    Aggregate only distinct values

    Ian Conlon

      I have a dataset with sales data at the salesperson level. I'd like to get a total sum of their sales. The issue is that sometimes more than one salesperson exists on a single sale, so when they're both included in the table, the total sales value is essentially double counting. That said, there is a variable in my dataset called sales ID that is unique for each sale, no matter how many salespeople were on the sale. I want to be able to get a sum of total sales that counts each sale ID only once.


      Example data:




      So, in this example, the sum would show $135.00 for all salespeople (40+50+25+20). If I were to include only salespeople 001 and 003 in the table, the total should show $85.00 (40+25+20). If I were to include only salespeople 002 and 003, the total should show $135.00 (40+50+25+20). 


      I suspect this is a rather simple problem, but I'm having a hard time figuring it out. Any help would be appreciated.

        • 1. Re: Aggregate only distinct values
          Joshua Milligan



          You can use a table calculation to filter the results.  I've attached a workbook using your data that demonstrates how.  It has internal documentation.  If you need any further explanation or if you run into any complications with a more complex data set, I'd be happy to answer any questions you might have.




          • 2. Re: Aggregate only distinct values
            Ian Conlon

            Thanks for the quick response. That's close, but unfortunately, it's not quite what I'm looking for (sorry, I should have been more specific). Your graph, which does deduplicate, shows distinct values by Sales ID. The final graph I would like will show year of sales along the bottom with the columns representing the total (unique) sales for that year. Here's what the graph will look like:




            The idea is that for a given fiscal year, I may filter out different salespeople. If I include salespeople who were on the same sale (duplicate sales ID), I want the value from that sales ID counted only once in the display. It looks like your "First()" calculation is probably part of the solution, but it should be pulling the first SalesID within a particular series of Sales IDs, not the first salesperson. As you can see, the shelves in my graph are year of sale and total sales. When I added your "First()" filter to my data, it said that it was acting on year of sale. How can I get it to act on Sales ID? For the record, all sales IDs in the data are unique--they do not reset with each fiscal year, so all I need is to say "If this is the first instance of the sales ID in the data, then count the sales amount, else do not count it."

            • 3. Re: Aggregate only distinct values
              Joshua Milligan



              That definitely adds some complexity.  If there is any way you can de-duplicate at the data source level, that would be ideal.  Still, I think there are similar solutions to the one proposed above that will work with more robust data sets and additional visualization requirements.  If you are able to post a sample data set or (even better) a packaged workbook (.twbx), I'd be happy to take a look and work through some options.



              • 4. Re: Aggregate only distinct values
                Ian Conlon

                So, I got the formula to work--the graph is now showing only one value per sales ID. The problem is that the bars are now broken down by sales ID. I'm not able to select a full FY bar by itself, and whenever I try to add data labels, it shows the individual values for a particular sale, not the TOTAL value for the fiscal year. Here's a screenshot of what's happening:




                How do I get rid of the individual sale labels and get a single FY total label at the top of each bar? It appears that adding the levels of "Salesperson ID" and "Sales ID" made the bars break into specific units rather than sticking together as cohesive wholes.

                • 5. Re: Aggregate only distinct values
                  Joshua Milligan



                  I think you've hit on a limitation of the proposed solution.  As Sales ID and Salesperson ID both have to be in the view for the table calculation filter to work, they will slice the data to that level of granularity.  If you are only concerned with a visual display of the bar chart (for example, you are going to distribute a pdf), you could turn labels off and click "Color" and set boarders to none.  That will hide the stacked nature of each bar and give the appearance of a single bar.  You can even use reference lines to label the value for each "single" bar. 


                  But any interactivity, such as clicking to select, tooltips, actions on a dashboard, will still be by slice of the bar.  So, if you need that kind of functionality you will probably need to start thinking of ways to de-duplicate the data outside of Tableau.




                  • 6. Re: Re: Aggregate only distinct values
                    Jonathan Drummey

                    Hi Ian,


                    There are ways to get the desired result with a nested table calculation. The first part is to generate a single total sales per SaleID, then to sum that up per year. Once we have that, then generating a bar chart is simple.


                    The "trick" here is to think about what Tableau needs to draw the bar chart, which is 1 mark per chart. However, to get the calculations to work, we need both SaleID and SalesPerson in the view, which will be N Marks per chart. So, as we're building the table calculations, we use the IF FIRST()==0 technique to reduce the result set. In the attached, the first calc, Sales per Sale ID, has a Compute Using of the SalesPerson and cuts down the results to one per SaleID (all other values are Null). Then the second calc, Total Sales per Year, adds up the Sales per Sale ID for each year has an Advance Compute Using on SaleID and SalesPerson and further reduces the results to one per Year. Then we can build a bar chart with that, use the regular labels, and put a copy of the Total Sales per Year set to filter for non-Null values so Tableau is only drawing the 2 marks. I also turned off the tooltips for the SaleID and SalesPerson in the final view, they just confuse things.


                    Let me know if this works for you!




                    PS: in the future, please post the sample data in a text or Excel file, it makes it easier for us to not have to try do dummy up your data. See #8 in http://community.tableau.com/docs/DOC-5065.