Short answer: use COUNTD([UID]) (count of distinct user IDs) instead of SUM([Number of Records]).
Long answer: your original data may have had 5 rows, but after pivoting your data has 755 rows. Summing [Number of Records] counts number of rows, which is why you're getting such large numbers. It sounds like what you really want is to count the number of respondents, not the number of rows. Tableau's COUNTD function returns the number of unique values in a column in the current "slice" of the data (I'm sure there's a more technical term for that). For example, when you slice up the data by ***, there are 3 unique user IDs in the Male slice and 2 unique user IDs in the Female slice.
Counting IDs instead of summing number of records is a good habit to get into in general, even if you know what each row of the data represents. As you found by pivoting, the meaning of a row can change as you work with your data.
Hope that helps!
Thanks, Chris! This has been making me crazy for a really long time so I appreciate you taking the time to explain it! Cheers, Kate
P.S. Chris, after thinking about this, why didn't changing number of records to Count (Distinct) do the same thing? I did try to do that at one point but ended up with each value ( Male = 1, Female = 1) being counted instead. I also wondered if there is a better way to structure the data if this will be an ongoing issue?
[Number of Records] is really just a column where every row has a "1" in it. When you sum the column, all those 1s get added up and the result is the number of rows, but when you count distinct you will always end up with exactly 1, because 1 is the only distinct value in the column.
I don't think you have a data problem, your data is structured perfectly. I think [Number of Records] has tricked you into believing it can do more than it can. [Number of Records] exists for one purpose: to sum the number of rows in data without a unique ID column. For example, if I polled people's favorite fruits and got data like this:
Favorite Fruit Number of Records (generated) Apple 1 Apple 1 Banana 1 Kiwi 1
The only way to count up responses is to sum [Number of Records], because I didn't include a unique ID field for each respondent in my data. (edit: technically a non-distinct count of [Favorite Fruit] would also work, but that would have different behavior if the data contained nulls). More robust data, like yours, has such a field:
Respondent ID Favorite Fruit Number of Records (generated) 1 Apple 1 2 Apple 1 3 Banana 1 4 Kiwi 1
Now I can count distinct [Respondent ID], and I'll get the same result as summing [Number of Records]. I don't even need [Number of Records]. The reason this is more robust is that the first solution makes an assumption that each row is a unique respondent. That assumption becomes false after pivoting, joining, and a number of other data transformations. For example, let's join the results of another poll of the same respondents, but this time they were allowed to select multiple options:
Respondent ID Favorite Animal(s) Number of Records (generated) 1 Dog 1 1 Cat 1 2 Iguana 1 3 Snake 1 3 Gecko 1 3 Dog 1 4 Cat 1
Joined to the first poll:
Respondent ID Favorite Animal(s) Favorite Fruit Number of Records (generated) 1 Dog Apple 1 1 Cat Apple 1 2 Iguana Apple 1 3 Snake Banana 1 3 Gecko Banana 1 3 Dog Banana 1 4 Cat Kiwi 1
Now sum the number of records for which [Favorite Fruit] is Banana. You'll get 3, because three rows have [Favorite Fruit] = "Banana". But all three rows are associated with Respondent #3, and what we really want to know is how many respondents' favorite fruits are Banana. The correct answer to that question is 1, which we get if we count distinct [Respondent ID] for which [Favorite Fruit] = "Banana".
[Number of Records] is the easiest way to count things when first starting to use Tableau, and users can be lulled into a sense that it is a bit magic. Specifically, it can seem like [Number of Records] counts exactly the thing you want to count, whatever that may be. It really counts one and only one thing: number of rows. Don't restructure your data, just shift to using "count distinct" of ID fields and you'll see more robust results when transforming your data.
What a fantastic explanation Thank you for taking the time to teach me and any other newbies that stumble onto the thread! Enjoy the remainder of your weekend and happy vizzing to you as well!