If you put PersonID on your sheet and then drag a measure to display, Tableau does SUM([that measure]) by default. So if you have multiple rows for that PersonID, you'll get a sum of the values for that measure for that person.
Sometimes you want this. Sometimes not. If the values in the rows for that multi-row PersonID are all the same and you only want one "count" of it, you can change the aggregation to AVG([that measure]) or MIN or MAX.
And if you are counting PersonIDs. you can use the COUNTD function rather tan COUNT. The "D" stands for DISTINCT, and will give a count of 1 for multiple occurrences of the item you are counting.
Thank you for your answer.
indeed, when the data is numerical, that's the way to go. Forgot to add we are using dates a lot to keep track of in- and outflow of patients - and this is where problems start.
For example, if a date of completion of a health review does show in one entry but does not in the other, when compiling waiting lists, we end up over-counting. I am not sure if this gives you an idea of the range of possible problems, but combined with commonplace redefining events, with deep data cleansing and with tight deadlines, it boils down to quite a variety of errors.
An obvious solution to the above example of over-counting is to list cases rather than to aggregate them, and to cross-check the outputs. At the end of the day, you also need to keep special lists to be joined, and to remember the names just in case.
Far from the chance to make full use of tableau and how it automates data handling, let alone producing consistent packs for the national data warehouse.
Naturally, we are working hard at many levels to get this rectified, however, until we have succeeded, which is probably still a matter of months, I would greatly appreciate some tips on how to tame such a data set.
I think it's time to provide an anonymized workbook with anonymized data so that I can stop guessing and answer specifically.
Follow the steps in the video at this link:
I did an example in the video, and it took about 10 minutes.
I don't need ALL your data. Just enough to demonstrate the problem you are facing. Maybe just two patients with records representative of the problem. And if you anonymize names and other personal info, you should be able to share a sample workbook here. In fact, you can toss out most of the columns that don't really pertain to the issue you're having. Just fake IDs, rows with dates (and nulls), and any other columns that are needed to demo the issue.
Fair enough, please see attached file. I hope I have not overanonymised it
If you look at the data source
there are three patients but one of them has two records.
As you can see in the tableau, it properly counts
the total number: 3
the number of cases started: 2
the number of completed: 1
but it overcounts the started but not completed.
Clearly, it disregards the fact that patient 1 has another record that confirms the completion.
This is the root cause of the problems we are getting.
Thank you for your help.
anonymised packaged 2.twbx 18.1 KB
(V 10.5 here)
When you have multiple rows per ID, you can yank out a single value from among all the rows using LOD calcs.
In the attached on Sheet 5 we see the duplicate PID for "1".
On Sheet 5(2), I have made two FIXED LODs. The syntax says to get a single value for each PID for the specified date. In this case, for PID = 1, even though one of the rows does not have an actual eligibility date in the data source, this calc sets the eligibility date for all rows for that PID. Using this calc field instead of the actual data source field will let you do better counting.
LODs were introduced in version 9.0, and it's something that intimidates a lot of users. But it's a powerful feature that is worth the effort to master.
anonymised packaged 2_v10.5.twbx 26.3 KB
That looks really amazing, thank you.
Off topic, I think I have today tested a brute-force equivalent method on a different data set.
if ATTR(right ([Target], 1))='%' then attr([Target]) else (str(sum(
if (right ([Target], 1))='%' then 0 else (int([Target])) END
)))END), -2, 0)
In essence after having made target figures coming as text get aggregated at LOD quarter-of-the-year up, I found out that quarters consisting of one month's value are put in a separate row, which produced two values rather than one. The wrapping of the above code applies max on aggregation, an action which tableau normally rejects. This would not be replaced by a fixed LOD command, would it?
Thanks a lot for your priceless help. I will test the method before approving.
...The wrapping of the above code applies max on aggregation, an action which tableau normally rejects. This would not be replaced by a fixed LOD command, would it?
The code you gave uses WINDOW_max. WINDOW_xxx functions (which are table calcs) are what you use to further aggregate aggregates.
In some ways, (in many ways), LODs replace table calcs, but they can't always do so. And once you have table calcs in the mix, you preclude the use of LODs.
But I think you can do what you've done above with LODs. I'd have to dig into it, and I'm short on time today.
OK, I will test fixed LOD on that as well.
On the multiple rows per ID it worked perfectly. Many thanks for your help.