1 Reply Latest reply on Feb 17, 2018 5:06 AM by Yuriy Fal

    Single datasource  - multiple sets

    Steve Ware

      I'm using a dataset that looks like:

       

      1. Vendor
      2. StockNumber
      3. Order ID
      4. Sales

       

      I've been working on a viz that has on one axis all products from the selected vendor (quick filter) and on the other axis all products that occur on orders containing the items on the first axis, but not including that item. The measure is sales of the items on the second axis.

       

      I have been able to accomplish this by using a duplicate of the data source with an equality join on Order ID and an inequality join on stocknumber. I can get what I want this way but joining the duplicate datasets is pretty slow with larger vendors selected.

       

      I think there has to be a way to do it with a single dataset, but I haven't happened on how.  I have played around with sets but I don't use them often and am not really competent with them. Is the duplicate joined data sets the best way to accomplish this or can somebody point me in another direction with a single data set approach?

       

      Should look something like this (w\o the redactions)

       

      boom.jpg

        • 1. Re: Single datasource  - multiple sets
          Yuriy Fal

          Hi Steve,

           

          A Cross-Join is a common approach for a Market Basket analysis.

          There would be others as well -- depending on a task (and data).

          For example, if one of the products could be used as an 'anchor',

          then other methods would be preferred (from the performance / simplicity)

           

          As an example please refer to the recent thread on a related subject:

          Emulating a Subquery

           

          So if a cross-join approach is easy to comprehend,

          the (initially decent) performance could be further improved.

          Here are some ways (and it's better to be on version 10.5 for that):

           

          If the cross-join are performed on a Live Connection --

          using a general DBMS or Legacy Excel/Text (MS Jet Engine)

          simply because of the Non-equi Join ON [stocknumber] --

          then bringing a HyPer Engine to the mix (hence 10.5) would help.

           

          There could be two main routes with the HyPer:

           

          1) Export the original table / view (sans Cross-Join) to CSV file(s),

          then bring (them if many) as a datasource to Tableau Desktop 10.5.

          Use a Live Connection -- Tableau would be creating an extract in background.

          The Non-equi Joins are available in HyPer (this is the point).

          And Tableau is smart enough to not create a Cross-Joined extract,

          it simply would be joining the same-table extract ON itself (all in the background).

           

          2) If the performance in above wouldn't be sufficient enough,

          then explicitly creating a single-table extract (in the first initial datasource)

          and then self-join the extract file (in the second 'final' DS) would help.

           

          The subsequent optimisations could be applied via Filters / Actions on a dash.

           

          Yours,

          Yuri