3 Replies Latest reply on Mar 4, 2014 12:29 PM by Richard Leeke

    how to filter before joining?

    Zach Leber

      I have a single data source which is an expensive 5-way table join of one million records that takes several minutes. I'm then searching those one million records for just a few records using a custom value list filter.  What I'd really like to do is filter one of the tables before joining it with the rest so I don't waste time joining records I'm going to throw away. Attached is an example packaged workbook where I have two tables in Excel.

       

      Orders = Order ID and Store ID

      Stores = Store ID, city, and state

      I want to search for just a few orders and see the total order counts by store location.

       

      I'm using Tableau 7. I know Tableau 8 can pass parameters into Custom SQL but I don't think parameters can easily contain lists of user-pasted Order IDs. Context filters don't help because the DB still has to do the full join before creating the TEMP table with just the rows I want. I considered blending as well but couldn't see a way to make that work either. Any advice appreciated, thanks.

      filter join.png

        • 1. Re: how to filter before joining?
          Jonathan Drummey

          Hi Zach,

           

          Starting out, a couple of basic questions to set the stage:

          - What's your data source? Actually, this question is more whether you are using a Multiple Tables connection with the proper foreign-keys so Tableau can do join culling? (I think there was some join culling in v7???)

          - What's the cardinality of the Order IDs in your data? Are you using a Show Relevant Values Quick filter? Those can destroy performance.

          - What's the cardinality of the Stores in your data? Is it in the tens, hundreds, thousands, more?

           

          I don't have an exact solution in mind, my general approach would be to separate the data and possibly the flow into smaller more manageable chunks, so we can get the effect you want of filtering before the join. One approach would be data blending, where you have one data source that is the Order & Store, and the second dimensional data source with the store info. Then the user could filter the (much smaller) Orders data set and blend in the store info. Another approach would be to have the separate data sets and a Filter Action on the Order dimension that applies to the store info data source. Alternatively, if the cardinality of the Stores equivalent is small enough to be manageable and you're willing to live with hard-coding values, you could get away with creating a calculated field with [Store ID] for the formula and then assigning Tableau aliases to it.

           

          Jonathan

          1 of 1 people found this helpful
          • 2. Re: how to filter before joining?
            Zach Leber

            Thanks for engaging this question Jonathan. My data source is 8 tables in Oracle, cross-joined in a way that makes the Multiple Tables feature in Tableau almost impossible to use (I spent 3 hours last night trying in both T7 and T8). After the join there's one row per order, with about 10 extra columns from the joined tables. I only use CVL quick filters so am never querying for a list of all values. In my actual case we have 1 million samples stored in about 10,000 different containers. I'm creating a container list given a user-provided sample list. I do use a lookup table and blending in another app and it works great but don't think it would here. After all my debugging I realize that my real problem is simply that I'm unable to take advantage of the index on the sample ID so even finding the samples in a lookup table takes forever. If you have any insight on that problem I would really appreciate it. Link is here:

            how to leverage table indexes when quick filtering on a calculated field based on the indexed column?

            • 3. Re: how to filter before joining?
              Richard Leeke

              This comment in your original question caught my eye:

               

              'I don't waste time joining records I'm going to throw away'

               

              That shouldn't happen. Depending how your query is constructed the optimiser ought to be able to apply the filter early in the process before joining all the rows from other tables. Looks like you may have realised that that isn't the real issue, as per the other question about the prefixes killing the index use.

               

              BTW - there are tricks for achieving cross-joins with a multiple table connection, so you don't necessarily have to resort to custom SQL - which does often result in less efficient queries.

              1 of 1 people found this helpful