1 2 Previous Next 21 Replies Latest reply on Oct 17, 2013 9:45 AM by Jonathan Drummey

    How to find a frequency of an aggregate

    Ayush Baheti

      My data was something like this:


      ID          Infection-type

      1            URI

      1            URI

      1            UTI

      2            Flu

      2            UTI


      I made a calculated field "Infection count" which has the formula - COUNT([Infection-type])

      The graph below is of AGG(Infection count) vs ID



      I want to find out the COUNT of the number of IDs which were infected "n' times.

      In the example, 3 people were infected 13 times, 3 people were infected 11 times, 4 were infected 9 times.


      Any ideas?

        • 1. Re: How to find a frequency of an aggregate
          Shawn Wallwork

          Ayush, I think the attached file should give you the count you're looking for. It set the partition using a discrete copy of the Infection Count, then uses a table calculation SIZE() ordered along ID. Let me know if this works for you.



          1 of 1 people found this helpful
          • 2. Re: How to find a frequency of an aggregate
            Jonathan Drummey

            Here's an extension of what Shawn did, you can use an IF FIRST()==0 THEN SIZE() END to reduce the number of marks and create a bar chart or a text table. Note that the text table does not include 10, as no patient had 10 infections.



            1 of 1 people found this helpful
            • 3. Re: How to find a frequency of an aggregate
              Ayush Baheti

              I understood what you're trying to do but was unable to replicate the same results for my data. I'm attaching a sample data. Can you please send me a graph where:


              X-axis = Times infected (1 time or 2 times or 3 times)

              Y-Axis = Count of unique people (Unique ID) infected


              My problem:


              Your calculated column description reads "Results are computed along ID for each Infection Count."

              On the other hand, mine reads "Results are computed along ID."


              Thanks for the reply. Couldn't reply soon as I was out of town.

              • 4. Re: How to find a frequency of an aggregate
                Jonathan Drummey



                See the attached. The key bit is the partition setting that Shawn mentioned, you need to have a discrete version of the # of infections calc in the view (a blue pill) and uncheck "Ignore in Table Calculations" for that pill. That allows Tableau to partition the results as desired.


                Also, note that the view is counting every infection for each patient, if you wanted to just count the distinct infections then you'd need to use an extract or a data source that supports the COUNTD() operation (which MS JET does not).



                • 5. Re: How to find a frequency of an aggregate
                  Jeff Chen



                  This is my first post, and I'm relatively new to Tableau and deep Excel, so forgive the ignorance on how this works. I had a very similar question for a visualization I was working on, and was able to adapt it successfully. However, I'm a little fuzzy on the logic used for this.


                  As I understand it, it's using "IF FIRST()==0 THEN SIZE() END" to determine the maximum of something along patient ID. How does this work, and why do you need to uncheck "ignore in Table Calculations" for this to work?


                  As a secondary question, once you have this visualization, can you run additional calculations on this data, as the underlying rows do not reflect the visualization?


                  – Jeff

                  • 6. Re: How to find a frequency of an aggregate
                    Jonathan Drummey

                    Hi Jeff,


                    You're asking somewhat complicated questions, I'm going to start with the building blocks and go from there so nobody gets lost along the way.


                    A key thing to keep in mind is that Excel is cell-based, things that categorize data like data types, column names, range names, etc. are all bolted on to that model. So we can give any function an arbitrary range and Excel will try to do something with it. Tableau starts from the data, and distinctions like data type, dimension and measure have very real impact on how Tableau generates the queries to the data source and lays out the view.


                    Let's start with the underlying data, which has a row for each combination of patient and type of infection.


                    Then there's the "table" of data created by the view, which is dependent on the pills on Columns, Rows, Pages, and Marks Card, where the data values are determined by the dimensions (they create GROUP BY's in the query that Tableau issues to the data source) and aggregations used for the measures. You can actually have multiple "tables" at this level, there's one for each independent Marks Card which can each have their own different level of detail based on the pills on each Card.


                    Then there's table calculations, which are "tables" that are based upon the addressing and partitioning settings of the table calculations and only "exist" inside of Tableau's computational engine, and can have totally different grouping and ordering than what's in the view. And we can nest table calculations to get even more results.


                    Ordinarily, only dimensions in the view are available for addressing (aka Compute Using) and partitioning of table calculations. A key feature in Tableau is that any dimensions that are not used for addressing are automatically used for partitioning, this lets us drag and drop pills for table calculations and see a variety of results without ever having to change the original calculation. The results of measures are not available for use in partitioning table calculations, unless we explicitly have a discrete measure and turn off Ignore in Table Calculations. I'm guessing that this is off by default because it would make for some really strange results for the most common use cases for most users. Pretty much the only time I ever need it is for these kinds of frequency counts. Also, the results of measures are not available for addressing table calculations, though in some cases I can imagine a workaround for that.


                    In this view, the goal is to show a count of the # of Patients who had 1 infection, # of Patients who had two infections etc.


                    The # of Infections per patient is calculated here as SUM(Number of Records). That's an aggregate measure. For this measure to return accurate results per patient, the patient Number dimension also needs to be in the view.


                    Next we want to count how many patients had that # of infections. So we need to partition by the # of infections (so we can restart the count of patients with each new # of infections). To count up the number of patients, there are a variety of different methods, in this particular case SIZE() works because the patient Number is a) in the view and b) we don't want to partition on it. So by addressing on patient Number, we partition on the # of Infections, get the number of patients in each partition, and we have a usable number.


                    However, SIZE() returns a value for every row in the partition, i.e. patient Numbers in this case, so if we just use SIZE()in the bar chart then the values would be incorrect. For this to work, we only want to return one value per partition, and that's what the IF FIRST()==0 does. FIRST() returns the offset from the current row to the first row in the partition, where 0 is the first row in the partition. So IF FIRST()==0 only evaluates to True for the first row in the partition, and SIZE() is only computed once for each # of Infections.


                    In the workbook I'd posted above, you'll see the "45 nulls" warning in the lower left. This is from all the values of patient Number that have # of Patients returning Null. Usually what I do in that case is Ctrl+drag a copy of the table calc pill with the IF FIRST()==0 (# of Patients in this case) to the Filters Shelf and set it to Filter for Special->Non-Null Values so it a) retains the Compute Using settings, b) gets rid of the warning message, and c) speeds up Tableau when there are a lot of marks. (This is less of an issue in Tableau v8, where they made some real performance gains in both table calculation computation and view rendering).


                    You can use these results for further computations, but you have to keep the level of detail of the view in mind for the other calculations, depending on what you want that can be somewhat straightforward to quite difficult to deal with.


                    Does this answer your questions?



                    • 7. Re: How to find a frequency of an aggregate
                      Shawn Wallwork

                      There's going to be a book soon, right Jonathan? Incredible stuff -- well explained. I'm thinking the book should be:


                      "The Collected Works of Richard, Jonathan & Joe Regarding Tableau"


                      [you guys can negotiate the name order ]


                      Actually, on second thought, maybe this is a Dustin or Tracy project, collecting together the best posts from these three most forum-prolific Zen Masters. Just a thought.



                      • 8. Re: How to find a frequency of an aggregate
                        Jonathan Drummey

                        Thanks, Shawn! I thought this one turned pretty well, too, though the real question is whether it works for Jeff!

                        • 9. Re: How to find a frequency of an aggregate
                          Jeff Chen

                          Thanks guys! I think I understand it now. Let me work through this process and let me see if I have this straight:


                          EDIT: I'm somewhat right for the first couple steps, but then I go off track. I think I have a better idea of what's going on about two steps down...


                          # 1: Setting up Partitions and Addressing

                          Step 1.png

                          This setup allows # of Records to be used as a Dimension, and is now available for table calculation.


                          # 2: Table Organization


                          This is now how Tableau is parsing the table. Orange is addressing, blue is partitioning.


                          Step 2.png


                          #3: SIZE()


                          If you just use SIZE() as your table calculation, then essentially, what you're doing is this:


                          Step 3.png

                          #4: FIRST()==0


                          But if you use the FIRST()==0 function, then this is what happens:

                          Step 4.png

                          Am I missing anything, or is this right? Using the # of Records as a partition is a little difficult to wrap my head around. Are there any other interesting partition tricks?


                          Message was edited by: Jeff Chen

                          • 10. Re: How to find a frequency of an aggregate
                            Jeff Chen

                            Actually, I don't think this is quite right. Let me think on this more...

                            • 11. Re: How to find a frequency of an aggregate
                              Jeff Chen

                              OK, I think I'm understanding it now. So let's skip right to the data:


                              #1: Partitions


                              Step 2a.png

                              This is how the data is partitioned.


                              #2: Parsing

                              Step 3a.png

                              This is how Tableau is parsing the data. Blue is the partition. Orange are addressing fields.


                              #3: SIZE()


                              If you just use the SIZE function alone, then what Tableau sees is:

                              Step 4a.png

                              So a graph of just SIZE() alone would have 14 records of 14 each for patients with 1 infection.


                              #4: IF FIRST()==0 THEN SIZE()


                              But if you add the FIRST()==0 function, you get:

                              Step 5.png

                              This seems to conform to the behavior of the visualization in the packaged workbook. Would this be an accurate explanation? If I'm wrong, let me know what I'm missing. Love the program, but still trying to learn the concepts...

                              • 12. Re: How to find a frequency of an aggregate
                                Jonathan Drummey

                                Hi Jeff,


                                I'm really glad you want to understand one of my favorite features of Tableau, this is fun for me to try to explain what is going on and figure out what's most helpful to you (and hopefully others). Thanks for making the visuals, there are a few different missing bits and misunderstandings in what you posted, here's what I noticed:


                                In the first post:

                                - #1 is correct except for the dashed line pointing to the continuous (green) # of Infections. That pill has no effect on partitioning, it's the discrete version of the pill that does.

                                - In #2, you had the Number (the addressing field) on the left and the # of Infections (partitioning field) on the right. Though you were correctly identifying the partitioning, it's a lot easier to understand if we swap the pills. I set that up in a graphic below.

                                - I don't understand the logic in #3 and #4.


                                In the second post:

                                - You left the addressing out of the picture in #1. Keeping track of your addressing and partitioning is critical to understanding table calculation results.

                                - #2, 3, and 4 have the # of Records listed as addressing, which is incorrect (though gets you towards same result). I think this happened for two reasons: First, the dimension that is used for addressing is Number, which sounds enough like Number of Records to be confused. Second that there is only one row in the database per addressing row when we have the Type of Infection in the view (which I believe you had in the second post), so the sum of Number of Records has the same result as SIZE() in this case.


                                Here's my alternative graphic:


                                partitioning and addressing graphic.png

                                I had an "Aha!" moment while writing this about how to describe addressing: Tableau creates an address (also called a "row in the partition") for each distinct combination of value(s) of the addressing dimension(s) within each partition, and computes a result for the table calculation based on those values. The WINDOW_***, RUNNING_***, and LOOKUP() table calculations can also take offsets to make use of other rows in the partition.


                                Some other points to remember:

                                • Tableau addresses table calculations on dimensions. Tableau partitions calculations on dimensions and discrete measures that have Ignore in Table Calculations turned off.
                                • Dimensions that are not used for addressing are automatically part of partitioning.
                                • A row in a partition for a table calculation is made up of the unique combination of value(s) of the addressing dimension(s), in other words the address created by the intersection of values of the addressing dimensions.
                                • The Number of Records measure aka SUM(Number of Records) reflects the number of rows being returned from the data source, which can be very different from the number of rows in the partition. Table calculations like SIZE(), INDEX(), FIRST(), and LAST() are available to give you information about the partition.


                                There's loads more to the topic of table calculations, I'm still learning new things almost every day.


                                Hope this helps!



                                • 13. Re: How to find a frequency of an aggregate
                                  Jeff Chen



                                  Thanks! That was super helpful. I think what was making my head hurt was that the partitioning dimension is generated, and doesn't already exist in the table. Your visual diagram helps out, although it will still take me time to get used to the idea.


                                  Just one more question, if I may - does a dimension need to be in the view to partition?


                                  Thanks again!


                                  • 14. Re: How to find a frequency of an aggregate
                                    Jonathan Drummey

                                    Hi Jeff,


                                    Sorry I'd missed your question. Yes, a dimension does need to be "in the view" to partition, but that can be anywhere on Rows, Columns, Pages, or the Marks Card.



                                    1 2 Previous Next