2 Replies Latest reply on May 16, 2011 9:20 PM by Joe Mako

    Weighting Data

    John Mogielnicki

      Hello.  I'm wondering if it's possible to weight a dataset in order to have it accurately reflect a population. 

       

      In my example spreadsheet I have a sample of a population of retailers (first tab).  This makes up the dataset in the Tableau workbook.  I also have the actual percentage of each of the retailer types and geographies in the population (second tab of excel workbook).  I want to weight the sample dataset in Tableau to reflect the breakdown in the population.  So for example if we were to calculate the average purchase price of lightbulbs (second tab in workbook) we would want to give urban hardware stores more weight than urban big box stores, in accordance with their weighting percentage.  I know there must be a way to do this with Tableau's calculated fields.  I'll attach the packaged workbook and screenshots of the excel spreadsheet.

       

      Thanks in advance for your help!

        • 1. Re: Weighting Data
          Alex Kerin

          I'm almost there with a workbook example. but I can't work out your weighting formula - can you post the xls or provide the Editformula?

           

          Edit: so it's the population percentage/number of stores in sample?

           

          I don't know if this makes sense, but here's the workbook with that

          • 2. Re: Weighting Data
            Joe Mako

            Alex I like your solution, makes for a dynamic situation, change the underlying data, and the result changes.

             

            Building off of the logic Alex produced, here is another approach, that can be done a number ways, either by altering the original data, or using custom SQL, or a calculated field as done in the attached.

             

            If you can alter the data source from

             

             

            Geography Type    Item    Store Type    Store    Purchase Price
            
             Rural    Lightbulb    Big Box    C    0.1
            Rural    Lightbulb    Big Box    C    0.8
            Rural    Lightbulb    Big Box    C    0.9
            Rural    Lightbulb    Big Box    D    0.5
            Rural    Lightbulb    Big Box    D    0.7
            Rural    Lightbulb    Big Box    D    0.8
            Urban    Lightbulb    Big Box    A    0.25
            Urban    Lightbulb    Big Box    A    0.5
            Urban    Lightbulb    Big Box    A    0.75
            Urban    Lightbulb    Hardware    B    0.75
            Urban    Lightbulb    Hardware    B    1
            Urban    Lightbulb    Hardware    B    1.5
            


             

            to

             

             

            Geography Type    Item    Store Type    Store    Number in Population    Purchase Price
            
             Rural    Lightbulb    Big Box    C    5    0.1
            Rural    Lightbulb    Big Box    C    5    0.8
            Rural    Lightbulb    Big Box    C    5    0.9
            Rural    Lightbulb    Big Box    D    5    0.5
            Rural    Lightbulb    Big Box    D    5    0.7
            Rural    Lightbulb    Big Box    D    5    0.8
            Urban    Lightbulb    Big Box    A    1    0.25
            Urban    Lightbulb    Big Box    A    1    0.5
            Urban    Lightbulb    Big Box    A    1    0.75
            Urban    Lightbulb    Hardware    B    10    0.75
            Urban    Lightbulb    Hardware    B    10    1
            Urban    Lightbulb    Hardware    B    10    1.5
            


             

            Then the calculated field:

             

             

            (AVG([Number in Population])/WINDOW_SUM(AVG([Number in Population])))/COUNTD([Store])


             

            Will give you the weighted percentage with just "Geography Type" and "Store Type" dimensions on the worksheet.

             

            If you wanted other dimensions on the worksheet, then the calculation may need to change, but still very doable. I suspect that you may want to use this result calculated at the "Geography Type-Store Type" level, but then use it at a different level of aggregation (eg "Item" level, when there are more than one "Item" values). If this is the case, can you provide another sample data set that represents this situation?