In your supplied file, you are using a text file as a data source, which doesn't support the Median function (I'd only known Access and Excel didn't support this). By using a data extract of the file, you can use Tableau's Median function on the Duration field and then set up a line chart for the Date. See the attached for details.
If you don't want to use an extract, here are a couple of other options for calculating a median that can be plotted:
connectmedian_jtd_edit.twbx 60.3 KB
Thanks for your reply. The actual data comes from a database so the text file limitations aren't an issue.
However, I can't use the Tableau Median function as you can see if you compare your plot and mine (they're different). I guess you missed one part of my data: It's already aggregated over different durations and the Count measure tells you how many events of given duration exist for each date. Tableau's median cannot handle this case (and this has been a topic of million forum posts here).
The RAWSQL route might work otherwise, but I need to have the median dynamic, based on quick filters. Those can't be done with RAWSQL as far as I know.
Any other ideas?
I've (coincidentally) run into a somewhat related problem where Tableau is not calculating avg, stdev, etc. properly for a counted measure, so I worked on yours for awhile to see if that would give me any insight, and I learned some things, and I'm stuck.
What I can see going on is the following:
The "Median duration" calculation is returning more than one distinct value for each Day (one per Duration). You can see this on the "Calc One Median (by Day)" worksheet, with all the values on the "Calc One Median (Day, Duration)" worksheet. There are a few effects from this:
- In the Basic worksheet Tableau is drawing a line from one Day with a given duration to the next Day (not necessarily sequential days) with the next highest duration. I figured this out by putting Day on the Color shelf and used a Quick Filter to add Days one by one.
- In the With Path worksheet, the Test Set is limiting results to specific Day/Median duration combinations, but there are still multiple results so there are overlapping points plotted (the circles are a little fuzzy).
- On the With Page worksheet, the Duration is still effectively in the level of detail because Tableau returns all values of the dimension on the Pages shelf, but since Duration is not in the view it's not changing anything? I'm confused on this one.
I believe that to get the line properly drawn that in the level of detail in the worksheet, Tableau needs to have each row (that could generate a point) have a combination of X&Y values that leads to the next point. So, if the level of detail is Day & Duration, then each returned Median duration has to go with the Day since Day is X and Median duration is Y.
My thought was then to figure out how to return a single median value for all rows in the given partition. I created a calculation that does that, but got stuck on trying to return that value to all rows. The calc is "Median Duration (return single value)", and you can see this on the Calculating One Median worksheet. When I plotted that on the "Plotting Single Value" worksheet, Tableau draws single, non-overlapping points (unlike the With Path worksheet), and I did get one bit of line where Median Duration was the same from one Day to the next. In Tableau 7.0 (which the revised worksheet is in), Tableau shows >11K nulls on this sheet so I think all those missing values are really needing a Median Duration value.
Hopefully, this might stimulate some ideas!
connectmedian_jtd_edit2.twbx 162.3 KB
See Sheet4 in the attached.
A few things going on here:
1) I made the Columns shelf continuous
2) ATTR(Day) on the Path shelf
3) Added calc field Index to filter down to one row per day for display (whilst still keeping all rows available to the table calculation)
See if that helps. I'll try to explain a bit more later if you want.
connectmedian_rl.twbx 57.9 KB
OK, Richard I'll bite .... 1e30 ????
Nevermind. I get it.
Hi Richard, thanks for this. I'm curious about how the INDEX function reduces rows, I'm guessing that it's because filters on table calcs are performed after everything else?
Also, the link you posted didn't work, was it something from this thread? http://www.tableausoftware.com/tcrl/table-calculation%3A-quantile
That was in Niko's original. Showing my age, that's like "HIGH VALUES" in COBOL.
It would probably be better just to use NULL - the WINDOW_MIN() will ignore NULLs.
BTW, to get a bit more background on what is going on here with cutting down the number of rows to simulate the effect of an aggregate function, have a read of how I described this in the general quantile function I posted to TCRL forum.
Yes exactly. Filters on table calculations are applied in Tableau after all table calculations have been evaluated. The median calculation needs all the rows and works out the median per partition (i.e. per day) and in this case gives the same answer for each row in the partition. The filter just says limit the display to just the first row in each partition (I set the partitioning on the [Index] field to be the same as for the median calculation). That just avoids drawing lots of marks on top of each other, which with lots of rows makes it quicker and also avoids some fuzziness in the display).
The link was supposed to be to version 2 of the quantile function (I fixed it above, too). They both do the same trick to limit the display down to a single row per partition - but the 2nd version is the one to look at - the function is much better and hopefully I explained it a bit better.
Hi Richard, I could use your (or anyone's) help now...I've learned a lot that's enabled me get the views I wanted to calculate close to being complete, and now I'm having a problem with the built-in reference line calculations and, in trying to work around that issue, calculating standard deviation.
The situation is that I'm trying to display the mean and standard deviation for all values of Hgb (hemoglobin count) on a view where the distribution of Hgb across tranfusions is shown. However, the built-in reference line calculations for mean and SD are calculated on the distinct values of Hgb, and not all values of Hgb. To try to get around that, I was working with the Level of Detail shelf, and got the reference line calculation to work, only it was returning overlapping values. Using the Index filter to remove overlapping values then breaks the built-in reference line calculations. So I created a table calculation to calculate the mean and SD. The mean calculation works, and the SD calculation works in one context (for all rows), but I can't figure out how to get it to work on the actual view where the Hgb value and Date are part of the context.
I've identified three questions relating to how means are calculated vis-a-vis standard deviation, which I've put into specific worksheets in the attached workbook. Here they are:
Question 1: I'm not sure why the TC Mean Hgb doesn't need to be calculated along ID to come up with an accurate result, while the TC STdev Hgb does?
#2a: The reference lines for Mean and Standard Deviation are now incorrect, they appear to be calculated after the Index filter is being applied, so instead of calculating across all rows they are only calculating for the displayed values of Hgb. Is it the case that reference lines are calculated after everything else, including filters based on table calculations?
#2b: Is there any way to change the reference line calculations so they ignore that Index filter? There doesn't seem to be any.
#3: Assuming that the built-in reference line calculations won't work, I tried to roll my own. The TC Mean Hgb field works fine, but the TC StDev Hgb calculation doesn't work. Even when I set the Compute along ID that worked in prior MV No Overlap worksheets, it's either showing 0 or no value. I'm guessing this has something to do with Hgb and Date being in the view now, but trying different combinations of Compute along either causes all values for TC Stdev Hgb to be empty or crashes Tableau (6.1). How would the TC StDev Hgb need to be set in order to come up with the correct value of 1.04?
The attached workbook has all the details.
index_stdev_and_ref_line.twbx 95.1 KB
Thanks for all your efforts!
Richard: your solution looks great, but can it be done without making columns continuous? That would make my life a lot easier. I can't understand why the line skips a few points if the columns are discrete.
The point of the index is, as I see it, just to prevent Tableau from drawing too many overlapping line in the figure. Am I right?
I'll have a look - I'm pretty sure it's possible to do everything you want.
> impact of discrete: Yes, I hit that while I was putting the example together, I almost commented on it. There's quite a bit of undocumented black-magic that goes on with table calculations. Joe is the person who understands it best (and he and I have spent many an hour trying to figure some of it out via shared desktop Skype sessions!). I suspect this might be due to something known as "padding the domain" where Tableau "fills in missing entries" in a tabular layout - though I'm not quite convinced about that. I'll have another look and see if I can convince myself what's happening. Better still, I'll see if I can make the lines draw properly with a discrete axis - but no promises.
> Index: Yes, that's exactly what it's for. Depending how much data you have you can actually go a bit further with that technique and only calculate the WINDOW_MIN() for the first row of the partition. That can make a huge difference to the performance of window_xxx() functions, but is a bit tricky - it's not just a matter of wrapping the window function call in an if () statement. I've written that up here if you're interested.
I had a bit more of a look at the discrete question. I haven't absolutely got my head around what is going on - but if you look at the tooltips for the points you'll see that the ones that do join up in the line all have a Duation of 0, whereas the ones that are not joined up have some other values (so the 1st row in the partition for that day wasn't a zero) - which means that Tableau doesn't treat them as being in the same series. There must be some magic to do with how the path shelf works which treats continuous and discrete differently.
I had a bit more of a play and managed to make it work as you want it. Basically I made both [Day] and [Duration] discrete and then moved [Duration] from Level of Detail to the Rows shelf. Having a discrete dimension on the rows shelf like that causes Tableau to "pad the domain" - so every day has a cell for every distinct value of [Duration] - though all the padded ones are NULL, so they don't affect the value of the calculations.
Now when the filter on [Index] retains the first [Duration] value it's the same (-5.000) for every day, so the points which are drawn are all in the same series and get joined up with a line.
Finally I hid the Duration field on the Rows shelf by right-clicking and de-selecting Show Header.
Interestingly, converting this to version 7, just converting [Duration] to discrete on the Level of Detail shelf had the same effect - so the black magic has evidently changed subtly - I hadn't spotted that before.
Not exactly point and click simplicity - but at least you can do it.
connectmedian_rl_v2.twbx 65.1 KB
I have not had a chance to read this thread yet, but after reading the the initial question that started the thread, attached is my initial thought. Is the attached what you are looking for?
here is what I did:
1. wrapped your formula in a
IF FIRST()==0 THEN ... ENDso the filtering of marks happens in the formula, and null is returned for the excess marks.
2. Set "Day" as a Continuous Dimension, (reasoning: Dimension for use as partitioning, and Continuous so Tableau can connect the dots)
3. As you did, I also placed Dimension Duration on the Level of Detail shelf as a discrete dimension, and used it for the Compute using for the calc field
4. placed a pill for ATTR(Day) on the Path shelf to help Tableau connect the dots.
When I get the chance I will read this thread more closely.
When dealing with table calculations, there are a great deal of factors to be aware of to get the desired evaluation and display.
Richard, reading you last comment about version 7, there is an issue I pointed out during the beta (I believe I emailed you on it) where the Line mark type causes very strange mark padding, and there is an issue with filled areas and table calcs, I look for 7.0.1 to address some of these issues. If they are not fixed, I do have workarounds as well.
connectmedian_jm_edit.twbx 18.7 KB