Thanks Jonathan! I'm tracking set issue with Actian support. Please recommend me something on 2nd use case when you have time. Take care of your kid!
Hi Nikhil, I've started taking a look at this, I'm trying to come up a solution that has a reasonable chance of working within the constraints you've listed so far. I do think contacting Tableau professional services or another consultancy (I know Interworks has done projects with large data sets) would be useful, there are often optimizations that can be found by someone who is sitting with you that aren't obvious when we're just passing workbooks and forum posts back and forth.
For example, one question is about those 250K retailers. Are you proposing that the user have two filters with 250K possible values each, one to do the initial selection and another to do the second selection? Depending on how that's done there could be performance issues in just building a filter with so many possible values, and I don't know your performance constraints.
Another question is about your data source. Since Tableau is running live queries against the data source, Tableau can never be faster than the data source. For simple queries, are you getting the response time that you need? Because if those aren't fast enough, then more complex operations using Tableau features such as Sets certainly won't work for you at the start, and even if you tune the data source may not be fast enough and you'd have to use another approach such as a stored procedure or materialized view.
In any case, I've got some ideas I'm trying out, at this point it'll be another day or two before I have something written up. You're the third person in that last couple of weeks to ask about this, it's definitely not an out-of-the-box click&drag feature in Tableau.
I've attached a workbook with four options however as I'd noted in my post on Friday Re: Facing challenges implementing certain use case given your constraints I'm not sure any of them will be workable for you. This kind of multi-pass aggregation or complex sub-query can sometimes work under specific conditions when we can use Tableau's features like Sets and/or Filter Actions, and/or take advantage of Tableau's order of operations in terms of what filtering is done when. Besides the volume of data needing to use a live connection, this situation is made more complicated in that you want to work with multiple levels of detail - the filtering is happening at the cust_id level for the first selected retailer, then the data is displayed with the month level of detail, and then the 2nd retailer is again selected at the cust_id level and displayed with the month.
One thing I noticed in your workbook was that the first worksheet "Repeated Users at McD" is not returning accurate results. I think you were presuming that the Repeated Users Set conditional filter for COUNT(trans_id) > 1 and the Retailer==McDonald's filter were acting at the same time. Conditional filters in Tableau are calculated as a separate subquery at the level of detail of the dimension of the filter. For example a conditional filter on the customerID dimension using COUNT(transactionID) > 1 will return all customerIDs with more than 1 transaction across the entire data source. Regular quick filters, such as one to only get customers in the past 3 months, are not applied to conditional filters. To make a conditional filter responsive to another filter you have 3 options:
- Use a data source filter. This filter will apply to the entire data source and is added to the WHERE clause of every query.
- Use a context filter. This can be applied only on specific worksheets, and works by creating a temp table in the data source (or in Tableau if that's possible) that is then used for all other queries.
- Embed the other filter in a row-level calculation inside the conditional filter. The conditional filter is calculated as a separate subquery which is inner joined to the main query(ies) for the view.
In your original Transactions at McD worksheet, the regular quick filter for McDonald's is operating *after* the Set conditional filter, and returning incorrect results.
I created a "Repeated Users from Set" worksheet that shows correct results, where "filter" for the retailer is actually a parameter. We can't feed quick filter selections on other dimensions into a conditional filter, and we don't (yet) have dynamic parameters (see the feature request at Dynamic parameters | Tableau Support Community), so this a static list of retailers. I've don't know if you would need a parameter with 250K options, I've never created one and I don't know if it would work. (That's part of why I wrote what I did on Friday).
The four options I came up with are:
1a) Set. This uses the Repeated Users from Set worksheet, and uses the same Set filter in another worksheet for the 2nd retailer, which uses a regular quick filter for that retailer:
If Actian doesn't support Sets, then this won't work. However, the next option might.
1b) Conditional Filter. One way to think of Sets is that they are "pre-configured" filters. Sets can definitely do more than that, so that's a limited metaphor. In any case, we can get the same results as 1a) using the same Conditional filter as the Set on both worksheets. If you can deal with having the parameter for selecting the first worksheet, this might work for you.
1c) Data Source Filter. This is another variation on the first two, however instead of using a Set or a Conditional filter on the Filters Shelf, I've added a data source filter using the same condition. The Data Source Filter is added to the WHERE clause of every query.
2) Filter Action. I set this up because this most directly matches what you had described for requirements, however given your data volumes I doubt it will work, plus the interaction is non-intuitive. There are several problems with this view:
- The first is that to draw the initial view, we have to have every cust_id in the view to be available for the filter action. This makes for thick lines because Tableau does overprinting when there are multiple marks in the same place.
- In order to make sure that all the cust_id's are passed to the next worksheet, we have to pad out the data because not every cust_id has transactions every month. (This is a complicating factor of wanting to filter at one level and display at another). This means the view has to display a whole lot more marks, and that process gets slow. I limited the display to the top 1K customers, if you turn off that Top Filter prepare to wait (or even run out of memory in 32 bit Tableau).
- For the interaction, the user has to lasso (not click) to select all the marks (cust_id's) at that location, then choose the Filter Action. If they only clicked on the location, they'd get the top-most mark (cust_id) and then the lower worksheet would be incorrect. This is a known issue in Tableau.
- The Filter Action itself adds all the selected values to the WHERE clause of the query for the 2nd worksheet. Given your volumes, in this view there could be 10s of thousands of cust_id's or more. This leads to a massive query that can take some time just to move from Tableau to the datasource. This is why I came up with the conditional filter/Set solution in the first place.
Not knowing the desired response times or interaction flow, another possibility is this technique for doing multi-pass aggregations created by Ty Alevizos:
There are also a couple of other Ideas out there that are related to multi-pass processing in Tableau:
- multipass SQL - http://community.tableau.com/ideas/2958
- worksheet as a data source - http://community.tableau.com/ideas/1604
I hope this helps, and please, let us know what you end up using!
[edited to include workbook & screenshot]
complex subquery share of wallet.twbx 1,002.6 KB