6 Replies Latest reply on Dec 1, 2018 5:59 PM by Katherine Dinsmore

    Linked tables causing wrong record counts

    Katherine Dinsmore

      Hi Tableau community,

       

      I'm getting really discouraged after trying to solve this myself for several days now so I've decided to reach out for some help. I'm very new to Tableau so please be patient. I am working with a large amount of survey data (stripped down version used for trouble-shooting here) and am having major issues getting accurate record counts. The attached workbook consists of two tables linked by QuestionID (pivot table names) with only 5 rows of data. Here's my problem, you will notice on Sheet 1 I am trying to get a simple count of *** (male or female) but the numbers generated are clearly not reflective of 5 rows of data. Interestingly, when I use *** to slice the data, the numbers are accurate for each question listed (Sheet 2). What do I need to understand so that I can use this field [***] in either scenario with consistent results? I want to see a total of 5 (3 males 2 females) on Sheet 1 and things to remain the same on Sheet 2 because that appears to be working as expected.

       

      Help! Kate

        • 1. Re: Linked tables causing wrong record counts
          Chris McClellan

          Because of the way your data is structured, your getting the *** value on every response.

           

          To drop it down to the proper values (there's 5 respondents) you have to do this:

           

          COUNT([Number of Records])/COUNTD([Question ID])

           

          2018-12-01 20_39_41-Tableau - Sample Data.png

          • 2. Re: Linked tables causing wrong record counts
            Chris Chalmers

            Hey Katherine,

             

            Short answer: use COUNTD([UID]) (count of distinct user IDs) instead of SUM([Number of Records]).

             

            Long answer: your original data may have had 5 rows, but after pivoting your data has 755 rows. Summing [Number of Records] counts number of rows, which is why you're getting such large numbers. It sounds like what you really want is to count the number of respondents, not the number of rows. Tableau's COUNTD function returns the number of unique values in a column in the current "slice" of the data (I'm sure there's a more technical term for that). For example, when you slice up the data by ***, there are 3 unique user IDs in the Male slice and 2 unique user IDs in the Female slice.

             

            Counting IDs instead of summing number of records is a good habit to get into in general, even if you know what each row of the data represents. As you found by pivoting, the meaning of a row can change as you work with your data.

             

            Hope that helps!

             

            -Chris Chalmers

            • 3. Re: Linked tables causing wrong record counts
              Katherine Dinsmore

              Thanks, Chris! This has been making me crazy for a really long time so I appreciate you taking the time to explain it! Cheers, Kate

              • 4. Re: Linked tables causing wrong record counts
                Katherine Dinsmore

                P.S. Chris, after thinking about this, why didn't changing number of records to Count (Distinct) do the same thing? I did try to do that at one point but ended up with each value ( Male = 1, Female = 1) being counted instead. I also wondered if there is a better way to structure the data if this will be an ongoing issue?

                 

                Kate

                • 5. Re: Linked tables causing wrong record counts
                  Chris Chalmers

                  Hey Katherine,

                   

                  [Number of Records] is really just a column where every row has a "1" in it. When you sum the column, all those 1s get added up and the result is the number of rows, but when you count distinct you will always end up with exactly 1, because 1 is the only distinct value in the column.

                   

                  I don't think you have a data problem, your data is structured perfectly. I think [Number of Records] has tricked you into believing it can do more than it can. [Number of Records] exists for one purpose: to sum the number of rows in data without a unique ID column. For example, if I polled people's favorite fruits and got data like this:

                   

                  Favorite FruitNumber of Records (generated)
                  Apple1
                  Apple1
                  Banana1
                  Kiwi1

                   

                  The only way to count up responses is to sum [Number of Records], because I didn't include a unique ID field for each respondent in my data. (edit: technically a non-distinct count of [Favorite Fruit] would also work, but that would have different behavior if the data contained nulls). More robust data, like yours, has such a field:

                   

                  Respondent IDFavorite FruitNumber of Records (generated)
                  1Apple1
                  2Apple1
                  3Banana1
                  4Kiwi1

                   

                   

                  Now I can count distinct [Respondent ID], and I'll get the same result as summing [Number of Records]. I don't even need [Number of Records]. The reason this is more robust is that the first solution makes an assumption that each row is a unique respondent. That assumption becomes false after pivoting, joining, and a number of other data transformations. For example, let's join the results of another poll of the same respondents, but this time they were allowed to select multiple options:

                   

                  Respondent IDFavorite Animal(s)Number of Records (generated)
                  1Dog1
                  1Cat1
                  2Iguana1
                  3Snake1
                  3Gecko1
                  3Dog1
                  4Cat1

                   

                  Joined to the first poll:

                   

                  Respondent IDFavorite Animal(s)Favorite FruitNumber of Records (generated)
                  1DogApple1
                  1CatApple1
                  2IguanaApple1
                  3SnakeBanana1
                  3GeckoBanana1
                  3DogBanana1
                  4CatKiwi1

                   

                  Now sum the number of records for which [Favorite Fruit] is Banana. You'll get 3, because three rows have [Favorite Fruit] = "Banana". But all three rows are associated with Respondent #3, and what we really want to know is how many respondents' favorite fruits are Banana. The correct answer to that question is 1, which we get if we count distinct [Respondent ID] for which [Favorite Fruit] = "Banana".

                   

                  [Number of Records] is the easiest way to count things when first starting to use Tableau, and users can be lulled into a sense that it is a bit magic. Specifically, it can seem like [Number of Records] counts exactly the thing you want to count, whatever that may be. It really counts one and only one thing: number of rows. Don't restructure your data, just shift to using "count distinct" of ID fields and you'll see more robust results when transforming your data.

                   

                  Happy Vizzing!

                   

                  -Chris Chalmers

                  • 6. Re: Linked tables causing wrong record counts
                    Katherine Dinsmore

                    Chris,

                     

                    What a fantastic explanation Thank you for taking the time to teach me and any other newbies that stumble onto the thread! Enjoy the remainder of your weekend and happy vizzing to you as well!

                     

                    Cheers, Kate