I'm sure it's possible to do what you want - I'll take a look when I get a chance.
I remember posting an example for someone else a while back demonstrating a few different ways to do the sort of ranking you are looking for - but I just spent a while getting frustrated with the forum search and didn't manage to drag it up again. I'll see if I can hunt down the workbook on my laptop...
I managed to hunt out the example workbook I posted back in that old thread I referred to (but still can't find the old thread). I've attached the workbook for you to have a look at, that includes various examples of ways to calculate and visualise rankings.
There isn't actually a sheet which displays the rank for every row, but if you look at sheet [Table Calc Quantile Rank] (for example) you can see that I am calculating the rankings there - though in this case I'm filtering the view so that it only shows one row for each rank, which can give a very effective "heatmap" type view - especially if you change the parameter to show finer-grained rankings (I've got it set up to do medians, quantiles, quintiles, deciles and percentiles).
I'll take a look at your example data and see if I can put a quick example together with that if I get a chance.
Quintiles_Help_RL3.twbx.zip 6.0 MB
I had a quick look at what you've got.
A few things I spotted.
[p] in the calculation needs to be a real number between 0 and 1 (referred to as the "p-quantile" in the definition on Wikipedia which I used when I was putting this together). So for example, the value 0.9 would represent the 9th decile or the 90th percentile.
You can't just take the first and last lines of the calculation away and expect it still to work. The whole calculation assumes that it is only operating on the first row in each partition (but using window calculations across the whole partition). As the result is an aggregate over the partition, the result is the same for all rows. I explained how to duplicate the result into each row if you want that in the immediate follow-up posts.
I actually suspect that this isn't quite what you want. I think you just want the quantile rank for each row ( as opposed to the nth quantile for the partition, which is what this is giving you). There is an example formula for that in the workbook I posted earlier.
First off, thanks much Richard for your time spent working on this!
In terms of my modification of the calculated field, I apologize - I thought that in reading through the thread, there was mention that removal of the first/last line was all that was needed to make the calculation apply to each row. Apparently I did not follow the thread nearly as well as I thought.
You are correct in that I want the quantile rank for each row, so that I will always be able to tell dynamically into which quantile a row falls based on what the filters and visible columns/rows are chosen. This will then allow me to group by quantile and do my calculations.
I will take a look at the example files you posted, though I suspect I will still need some help. Thanks again!
Message was edited by: Jay Chang UPDATE: OK, so having spent about 45 minutes this morning trying to figure out what is happening, I have to raise my hand and say "no understand, please help!". The quantile rank formula itself is very simple, as is the quantile value calculation. but I'm not sure how they're working and my attempt to integrate them into my file is not working as I expected. What I was hoping would happen was that each EE would have been put into a quantile based on their percent target value and then I would be able to group based on quantile so that I could get an average pct target for each quantile for each region for each performance period. But I cannot seem to get this result using the formula. Simplistically, at a minimum, if I were to look at a list of all employees within one performance period and one region, showing their percent target and their quantile value (using a quantile value of 4), I would expect to see a 0-3 for each employee. but that's not what I seem to be getting. What am I missing?
Right - I see what you were trying to do by removing that outer IF block - and I looked back at the thread and see that I did say that in the original posting. You have to read on a bit to see my corrections (it's a shame the forums don't let you edit the original post - it would be much better in cases like this to be able to have a single article that got it right, rather than everyone having to follow a trail of corrections.
I'll try to find some time to have another look at your example over the weekend and see if I can put something together for you.
I had a quick go at it and I think the attached is something like what you are after.
I noticed that my ranking formula didn't handle large numbers of duplicate values very intuitively. So as you have over 20% of your employees with PctTarget of zero, some of those were shown in decile 0, some in decile 1 and some in decile 2.
So I added a different variant which gives the same rank to all rows with the same value. I didn't check this carefully - but it looked more plausible.
I also noticed that the formula I used is sort-of equivalent to the Excel inclusive variant (PERCENTILERANK.INC) - which includes the end point of the range in the rankings. (Excel has .INC and .EXC variants which give rankings of 0.000 to 1.000 or 0.000 to 0.999. My formula was giving deciles of 0 to 10 and percentiles of 0 to 100.
So I added an exclusive variant, too.
The trickiest bit with these (with table calculations in general) is getting the partitioning right. So drop down the menu from the pills on the Rows shelf and take a good look at how I set that up. Here's an example.
Let me know how you get on.
SampleForPost_RL.twbx.zip 87.1 KB
The PERCENTRANK.INC function is indeed what I am trying to recreate. However, I was hoping that Tableau would allow me to calculate that more dynamically. Right now, I can calculate a rank in Excel and use it in Tableau but it's limited to whatever criteria I set up in Excel. So the quartiling works fine when I am using PERCENTRANK within each performance period but when I want to rank within Region within Perf Period, I have to either set up a new set of array formulaas in Excel or I have to figure out a way to do it in Tableau. I'm hoping that this approach will give me the additional flexibility I'm looking for.
I will look over your solution as soon as possible. Thank you so much for your help.
You should just be able to drag other dimensions onto the shelves and it should all still work.
Pick how many tiles you want with the Quantile parameter (I've set it up with 2, 4, 5, 10 or 100 as options, but you can add what you like to the list of values for the parameter.
I compared the results of my formulae with what the Excel functions give and whilst the ordering is right and the answers are close I don't get exactly the same answers - especially at higher precision. For example if you set mine to use 1000 tiles and do the ranking by performance period within region (for which there are less than 1,000 samples per partition), my rankings start at 2 and go up to 999 or 1000 while Excel goes from 0.000 to 0.999 or 1.000.
It's probably debatable which is more "correct". I'm pretty sure I took my calculation from a Wikipedia article originally - though the "inclusive" and "exclusive" variants were just quick tweaks earlier today without too much thought, so there could easily be some woolly thinking in those. It depends who you want to put your faith in - Microsoft or Wikipedia plus my tweaks. Take your pick.
Depending what you're using it for it may not matter at all, of course. I'm just flagging that you shouldn't take my stuff as gospel. If you do want to align it exactly with Excel's definitions it should "just" be a matter of thinking about the boundary conditions carefully.
First off, thanks again for all your help and time on this.
I'll have to be honest - I just don't understand how the functions you wrote work. I read the Wikipedia entry on both quantile and percent rank and the description doesn't make sense, at least not to me. In reviewing the formulas you wrote, they basically seem to compare each value to a prior value. The formula is simple enough - but I don't understand how it works, especially given the situation where I don't have my data sorted in any particular way. I tried to verify your results in Excel by calculating the various quartile break points and then comparing each value to the various quartiles (using Excel's QUARTILE, QUARTILE.INC and QUARTILE.EXC). Your results matched my random sampling of comparison points - I'm sure if I did an exhaustive comparison, I could find exceptions, but your results are certainly close enough for hand grenades.
I did notice your point about as the number of observations increase, the drift from Excel's results differ - when I use the entire set of performance periods to do my quartiling, I found discrepancies between your formula and Excel's pretty quickly while when I limit to within a performance period, the discrepancies drop.
I am also not sure how/if you are using the n-tiles field and the p variable - they don't seem referenced anywhere in your other formulas or used in the views but maybe I just don't understand the overall structure.
Finally, how would I go about creating groups using the quantile rank measure? It does not seem to behave the way I expect it to given that it is a measure. For example, if I wanted to calculate average percent target per quantile rank, I can't seem to figure out how to do this.
Yes, table calculations are definitely difficult to understand.
I'm not sure I'm going to be able to explain everything for you, but here are a couple of pointers which might help.
You say: "especially given the situation where I don't have my data sorted in any particular way". The key point here is that the definition of the table calculation partitioning does include a definition of the sort order. If you look earlier in the thread I included a screenshot of the partitioning definition I was using. That includes an "order along" entry. That is the order in which the rows are processed for the purposes of the table calculation, and bears no relation to the display order.
On the n-tiles and p question: they were just in your original workbook and I didn't touch them, but I'm not referencing either of those in any way in what I have done.
And finally, I don't think you can group by the results of a table calculation like that (but if anyone has figured out a way, please do tell).
Table calculations are definitely more difficult to learn than they could be. I know the Tableau developers are aware of that issue, and I'm personally hoping to see some improvements in future releases, though I have no insight into timeframes, just wishes.
To answer your last question about grouping table calculation results, what you're looking for is the ability for Tableau to partition on the result of table calculations (the quantile rank) in order to calculate an aggregation (average percent target) over the partition. Tableau doesn't do that (it is possible to partition by the results of regular aggregates, but not table calcs), however there is a workaround in some cases where you can build a set of calculations that effectively create a partition and then evaluate that. Joe Mako came up with the idea in these two forum posts:
I also did a flavor of that in this post: http://community.tableau.com/thread/120818, though I made it way more work than really needed to get the desired result, as Richard pointed out.
I've been (slowly) working on a blog post to come up with a more generalized approach, and it's tricky, and of limited use because a) you have to really know table calculations and b) won't work in all situations. I'm not sure if it will work in yours, but if you'd like, I can try.
Gah! I knew I wasn't following this thread in its entirity and understanding everything that was going on.
Richard, unfortunately, the wonderful work you did on my behalf is negated by Tableau's inability to group the results of table calcs into partitions. The main reason I was trying to quantile in Tableau was so I could do dynamic averages by quantile. The inability of Tableau to do this means that the quantile rank, while nice, does not buy me much coffee with my bosses.
Jonathan, I will look through the links you posted. I am loathe to ask people to spend more time on my little problem, especially when it seems that the answer will still not be as stable as I'd like it to be.
I am attending the Tableau conference in San Diego - perhaps I will raise this issue with the developers. The ironic thing is that Tableau has quantiling built in - they have to because they allow me to add reference lines by quantile. But the inability to access that functionality in any other way is quite annoying.
Out of curiousity, @Jonathan and @Richard - how did you guys come by your mastery of table calcs? I cannot even begin to follow a lot of the table calc logic in some of these forum posts. At least with Excel, I can pretty much figure out and re-create most of the stuff I find online and apply it to my specific needs. But with Tableau, I can't seem to do it. Are there courses/web vids that will help?
I'm not even going to begin asking Joe Mako how he achieved his Yoda-like mastery. I'd like to get to pre-padawan status first. :-)
Here's a list of most helpful resources in learning table calcs that I put together recently:
It's like the old joke about how do you get to Carnegie Hall - practice, practice, practice. When I was starting to learn Tableau last year, I would very consciously take time to go through the workbooks that Joe, Richard, and others had put together and figure out how they were done, very often re-building the workbook myself. Joe also did some webex's with me that were tremendously helpful, and I'm happy to do them too if you like. Some people build boats inside glass bottles for a hobby, I dive into thorny Tableau problems.
Beyond that, there's a skill/talent that I don't know how to teach, exactly, in being able to abstractly conceptualize how the calculations are interacting with each other. The way I do that is to start with seeing the "canvas" of the raw data (what you see when you View Data from the data source) in my mind's eye, a final canvas of the desired view, and then I internally visualize the in-between calculations as additional canvas(es). Since the addressing, sorting, and partitioning of each table calculation can be different, this can get complicated.
Finally, the last bit that's taken me to another level with table calcs and Tableau in general (and the two links I posted earlier on partitioning by table calc results were both part of that for me) is to stop thinking about "what pills do I need to make the view I want", and instead to really understand the goal, then think about how the data would best be organized to meet that goal given my knowledge of Tableau (and that's where the learning from practice really comes in to play), and only then start thinking about what combinations of pills, calculated fields, reference lines, annotations, etc. will get me to the goal. When the data is properly set up (either through reshaping or calculations in Tableau), the views come together pretty easily.
I'll also be at TCC and hosting the Community Forum roundtable, I hope to see you there!
PS: I think I have a workable solution for your averaging problem, I'll probably get it up sometime tomorrow.