1 2 Previous Next 17 Replies Latest reply on May 23, 2015 5:35 AM by Jonathan Drummey

    Histogram via Size()

    Alexander Mou

      Vizible Difference: Histogram via Size()

      Size() is the easiest table calculation for creating histogram! There are multiple options.

        • 1. Re: Histogram via Size()
          Shawn Wallwork

          Interesting article Alexander. Are you still on 8.3? When you upgrade to 9.0 you'll be in Histogram heaven! (And have to write your 6th Histogram article. )


          Now in 9.0 we can bin aggregated dimensions without having to deal with table calcs at all. Using a FIXED LOD expression will do the trick:


          {FIXED [Customer Name] : COUNTD([OrderID])


          Then create a bin from this with a size of 1 and voila:






          1 of 1 people found this helpful
          • 2. Re: Histogram via Size()
            Jonathan Drummey

            I just responded on the blog, I agree that SIZE() is the easiest table calc *and* I think with version 9 now out in the wild that LOD expressions are a big improvement.



            • 3. Re: Histogram via Size()
              Alexander Mou

              Thanks Shawn and Jonathan for the replies.

              Now I am in transition to 9. I realize how LOD is powerful with each day passing. And I know it can be applied to calculating histogram.

              For the sake of comparison, the solution via Size() requires a little less coding than using LOD. And an earlier solution requires no coding at all. Mouse does all the work. LOD is not better in every aspect. And LOD may not be the default solution even when it's available.


              The purpose of the article is mainly for a better understanding of Size() function and the other elements such as marks, ignore in table calculation etc.


              I believe that the best approach is decided by knowing all the options you have.


              BTW, be pitiful to the people still living in the 8th world. There are still many.

              • 4. Re: Histogram via Size()
                Shawn Wallwork

                Fair enough. Will you be approving Jonathan's comments on your blog?


                If not, Jonathan would you post them here. Thanks.



                • 5. Re: Histogram via Size()
                  Alexander Mou

                  I do not see any comments waiting for approval at all.

                  What happened?


                  On Fri, May 22, 2015 at 10:17 AM, Shawn Wallwork <

                  • 6. Re: Histogram via Size()
                    Jonathan Drummey

                    I'm not sure what happened with my comment, I hit submit right before running off to a meeting...I'll try to regenerate my comments here from memory:


                    While I agree that SIZE() is the easiest table calc, I think LOD expressions are an improvement in many ways:


                    1) This is the big one: The histograms that we create using SIZE() or any of the other aggregate techniques aren't accurate when the data is sparse and we're using discrete bins, which is a very common situation. This can be seen in this screenshot from your post:

                    2015-05-20 16_18_35-.png


                    In the case of the screenshot I copied, the bins at the right are 35 and 41 when there should be three 0 bin spaces between 30 and 35 and four more between 35 and 41. Also, in Shawn's post above, bins for values 14,15, and 16 aren't displayed. To be totally technical, these kinds of charts don't fit the true definition of histograms because they aren't showing every bin.


                    There's a trick to making this accurate with aggregate or table-calc based bins by using a continuous pill instead of a discrete to show the bins, but that doesn't actually let you show mark labels for bins that have 0 elements (i.e. no data). See http://public.tableau.com/views/TheNextNTableCalculations/11_HowManyofXDidHowMuchofofY?:embed=y&:showTabs=y&:display_cou… for an example, which also shows yet another technique for generating histograms using the Ignore in Table Calculations option. An LOD-based bin can have Show Missing Values turn on, so Tableau can pad out the bin for discrete headers so elements are accurately spaced *and* we can show 0s.


                    2) In very many cases Level of Detail expressions let us "flatten" calculations from being aggregates or table calcs into record-level calculations. This drastically improves ease of use, the bins for LOD-based histograms can be used in other calculations without the problems that aggregates & table calcs introduce with needing to keep track of the dimensions in the view, addressing & partitioning, densification, etc. (That said, LOD expressions aren't perfect either).


                    3) Filtering scenarios for LOD-based bins are much simpler than with table calcs.


                    4) Both aggregate and table calc-based bins are harder to use in overview & detail dashboards, Joshua Milligan has covered the challenges and how LOD makes life easier in Slicing by Aggregate | VizPainter and My Favorite Tableau 9.0 Feature | VizPainter.



                    2 of 2 people found this helpful
                    • 7. Re: Histogram via Size()
                      Alexander Mou

                      I did notice the issue which deviates a bit from the definition of histogram. For histogram on bin size >1, which is often the case, this is a lesser problem.


                      Tool wise, I agree that it would be great if Tableau let user turn on missing values in dimensions based on aggregates.

                      • 8. Re: Histogram via Size()
                        Shawn Wallwork

                        Yep, moving too fast forgot to turn them on.



                        • 9. Re: Histogram via Size()
                          Jim Wahl

                          Another advantage of LOD is that they force you to explicitly think about and set the LOD.


                          Alexander's blog post says that the histogram shows, "number of customers per number of orders." But it's really showing the orders * items per customer, since the data source is at the invoice-line-item level of detail.

                          If a customer ordered 4 items in a single order, there will be four rows in the data source, and your histogram would put him in the 4 bin.

                          Jonathan Drummey I don't see the Show Missing Values option in the fixed LOD calc in Shawn's example?

                          1 of 1 people found this helpful
                          • 10. Re: Histogram via Size()
                            Jim Wahl

                            Somehow your message got posted before mine, but if you're referring to Show Missing Values --- I don't see the option when I click on the pill?

                            • 11. Re: Histogram via Size()
                              Alexander Mou


                              You are correct!


                              Sum(Number of records) is for items. CountD(Order ID) is different.




                              On Fri, May 22, 2015 at 1:02 PM, Jim Wahl <

                              • 12. Re: Histogram via Size()
                                Jim Wahl

                                Never mind -- I wasn't using the binning the Count of Orders / Custom calc. By creating a bin calc, it works. Makes sense.

                                • 13. Re: Histogram via Size()
                                  Alexander Mou

                                  Tried ZN(Size()) with continuous columns.

                                  It doesn't seem to make zero show up.

                                  • 14. Re: Histogram via Size()
                                    Jim Wahl

                                    ZN() needs to be around your SUM(Number of Records) field on the Columns shelf.

                                    1 2 Previous Next