Improving performance by avoiding drawing multiple marks at the same location
Richard Leeke May 2, 2010 2:23 AMI mentioned a technique I use to speed up the rendering of views with large numbers of marks in this thread: http://www.tableausoftware.com/forum/map-issue-lat/lon and promised to explain. Here's a bit more detail about how that works - posted as a new thread to avoid hijacking the original any further.
The basic principle is really simple, though the detail can be quite tricky.
If you have a viz that has very large numbers of marks (by which I mean hundreds of thousands or millions), it is very likely that lots of those marks are going to be drawn on top of each other. So if you can avoid drawing marks that won't be visible anyway, you can speed things up.
You do this by redefining the view such that the database returns less marks at each position. The attached images show the original and optimised versions of one of my views. The original has 1.5 million marks and takes over 2 minutes to refresh; the optimised version has 50,000 marks and takes around 7 seconds. Comparing the images with an image comparison tool there are lots of differences - but by eye they're good enough for me.
There are some limitations and restrictions with doing this - most notably that zooming the display doesn't work very well - and the workarounds I've managed to come up with for that are a bit clunky. But if rendering speed is the main constraint on your analysis, the limitations may be worth it.
So how do you decide which marks are going to be "on top of one another", which ones can you safely suppress without affecting the viz, and how do you do it? For now I'll just outline the answers to each of these - I can fill in more detail if anyone is still interested (Michael!).
Which marks are on top of one another?
If the x and y coordinates for 2 marks are within the bounds of the same pixel, I treat them as being at the same place and so as candidates for suppression. Now of course I don't know the inner workings of the Tableau rendering engine, so I may be truncating when Tableau is rounding, or whatever, but this approach seems to give a good enough result.
Which marks can you safely suppress?
Just because 2 marks are at the same pixel location doesn't mean that they are not both visible. A big mark can peek out from underneath a smaller mark at the same position. A solid circle is visible through the hole in the middle of a hollow circle. etc.
So in general, it's only safe to suppress multiple marks of the same size and shape at the same position.
Even this approach doesn't properly account for transparency, I don't think. I haven't really looked at this in any detail, but I'm pretty sure that multiple identical semi-transparent marks at the same place give a darker result - I think that's the whole point. So if you're using transparency like that you're probably out of luck with this whole approach.
Depending on the structure of the view, just returning a single mark of any given shape/size/colour at any one pixel location may be enough to make a big difference to speed, or may make very little difference. If there are only a few distinct permutations of shape/size/colour it may be enough just to return one of each and leave it to the rendering engine to sort out what's visible. But with lots of mark types, that may not make much difference.
For example, the attached images have about 150 different values on the colour dimension, so just reducing down to one of each at any position only reduced it from 1.5 million to 500,000 marks. So in that case I chose to use just a single size and shape, and take control for myself of the order of display of the different colour series, by putting the colour display order in a table in the database and just returning the top mark of any type at each location. That brought it down from 500,000 to 50,000 marks.
How do you do it?
Basically you need to truncate or round the x and y coordinate values to the nearest pixel. In some cases even just doing one dimension may be enough.
The steps to do this are are as follows.
1) Measure/estimate/guess the pixel dimensions of the view. The most accurate way I've found of doing this is to take a screen print of the viz, copy it in to Paintbrush, crop it to the dimensions of the drawing surface and then look at the image properties in Paintbrush. But actually you don't need to be very precise - just knowing your display resolution and guessing the proportions of the drawing surface to the nearest hundred pixels works just fine. I measured 1250 x 820 for the attached images, but 1200 x 800 looks just about as good. Really it would be nice if Tableau exposed the dimensions in some way - but I don't think that's as easy as it sounds.
2) Round or truncate the continuous x and y dimensions to the nearest pixel.
There are various ways of doing this, depending on your data source. I sometimes do it with calculated fields, sometimes RAWSQL, sometime in a custom SQL connection and I've also done it in a stored procedure - though there are some gotchas with that approach.
Here's the formula for a calculated field for a continuous floating point dimension on the x-axis:
[xPixels] = width of the view in pixels
[xMin] = minimum value of the x-dimension (columns shelf)
[xMax] = maximum value of the x-dimension (columns shelf)
[PixelRatio] = number of x units per pixel
[xRaw] = original x-dimension
[xTrunc] = truncated x-dimension
[PixelRatio] = ([xMax]-[xMin]) / [xPixels]
[xTrunc] = INT( [xRaw] / [PixelRatio] ) * [PixelRatio]
Strictly this should use floor() rather than int(), since int() truncates towards zero, but Tableau doesn't have a floor() function. You can do it by taking account of +ve and -ve values of [xRaw] in the calculated field, but I don't think it makes much difference - just a 1 pixel wide distortion at x = 0.
Depending on the viz, there may be simpler approximations. I quite often have views with a continuous datetime axis with millisecond accuracy, showing data for a 24 hour period. There are (24 * 60) = 1440 minutes in 24 hours, which is more than the pixel width of the view on my display. So replacing my original millisecond resolution datetime with a continuous calculated field defined as DATETRUNC('minute', [datetime]) is all that's needed to speed up a scatter chart by an order of magnitude. I often don't even bother to do anything with the y-axis.
Depending on the mark size, it can even work fine to normalise it down to a much coarser-grained grid - 10 x 10 pixels say. The bigger the marks the less point there is in retaining so many marks.
3) Use the truncated x and y dimensions on the columns and rows shelves, instead of the original dimensions.
That's all that's needed to return one mark of each type at each location. Returning just the "top" mark type needs a bit of tricky SQL, which I'll leave as an exercise for the reader.
And that's all there is to it. ;-)
Actually there's some more complexity around determining your minimum and maximum dimensions, and as I mentioned, zooming the view gets very clunky (and really needs a few of the extra Tableau features I've posted about on the "Wouldn't it be nice if" forum over the years). But I think that should be enough to get you going or more likely put you off.
This originally came up in the context of a posting about maps, and I get the impression that was also what your follow-up question was about Michael. Of course there's an extra complication with mapping which is that the map projection means longitude doesn't map uniformly to pixels. The nearer you get to the poles, the more pixels per degree of longitude - I think. I haven't really thought very hard about this, and I certainly haven't tried it, but I think just using a value for the pixel ratio worked out at about the Arctic Circle, or maybe a bit further South, will probably work just fine - it will just mean that there's a bit more overlap than necessary down near the Equator. I doubt that you have to go to the trouble of calculating the ratio accurately as a function of longitude.
Good luck Michael. You did ask!
-
Response Scatter_0.png 325.6 KB
-
Response Scatter Optimised_0.png 277.1 KB