0 Replies Latest reply on Dec 16, 2010 1:49 AM by Richard Leeke

    Seeking a more efficient way to lookup values

    Richard Leeke

      I'm not really sure if this belongs here, "Calculations", "Maps & Links" or "Wouldn't it be Nice If?".


      A few times recently I've hit examples of a situation where my main data source is a data extract and I need to lookup values from another source - but for one reason or another a simple data blending relationship does not do what I need.  Whilst I have been able to achieve what I want to do functionally using some combination of data blending and table calcs, it's a bit on the slow side.  Or a lot on the slow side if I have a lot of data.


      Most of these examples have been things I would have done with RAWSQL in the past.


      I'll explain the latest case and see if anyone can suggest any alternative ways of coming at the problem.  I'll also post a "Wouldn't it be Nice If" with an idea about a feature that would help.


      I have an extract with 100,000 rows of data (carefully trimmed so it meets the limits for Tableau Public).  The data includes lat/long coordinates.


      I have a separate table of reference locations, also with lat/long (in fact these are PostCodes).


      I want to be able to search my main data source to find all rows located within a selected distance of a chosen reference location.


      Currently I have two parameters defined: Distance and PostCode.  So I need to look up the lat/long coordinates of the selected PostCode and then use them in a calculated field for each row in the main table to work out the distance from the selected location.  Finally a filter selects the rows within the chosen distance.


      I do the lookup with data blending (by defining a calculated field in the primary data source which is just the value of the PostCode parameter) and then looking up the matching row by joining on that to the secondary data source with the post code locations.  A calculated field then gives the distance.


      But as the post code coordinates are retrieved by data blending, that means that the calculated distance field has to be evaluated in Tableau after retrieving all rows from the primary data source, rather than down in the data engine.  Finally the result set is filtered in Tableau.


      With 10,000 rows in the extract it only takes a few seconds.  With 100,000 rows it's over a minute (and too slow to be a viable approach for something posted to Public, I suspect).  And on a really big data source it probably wouldn't even fit in memory.


      The only work-around I've been able to come up with to make it fast enough is to have an initial selection step where the PostCode is entered as a parameter and returns the lat/long coordinates.  I then manually copy those coordinates to separate lat/lon parameters and use them in the distance calculation.  As the parameter values are known when the view is then refreshed the filter condition can be evaluated in the data engine - so the view refreshes instantly even on big data.


      It would be really nice to have a way of passing values to parameters, in much the same way as filter actions pass filter criteria.  That would solve this (and most or all of my other recent examples).



      Any ideas for a different way to come at the problem until version 7 comes out?  ;-)