3 Replies Latest reply on Nov 27, 2015 3:05 AM by Jonathan Drummey

    Complicated Indexing

    David Pires

      Hi everyone

       

      I’m in a bit of a pickle. To give you a bit of background, I have been asked to design dashboards that will be used across a number of databases as such I need to avoid the use of parameters(more on that later) as they are not dynamic and I can’t be amending them every time I switch databases. 

       

      The outcome:

       

      I need to be able to calculate an index on other forms of transport vs car. Key to this is that I need to be able to select what method of transport I want to index at a time.

      1. i.e. train vs car or cycle vs car.

       

      Using parameters this is easy peasy, however as I've mentioned before I will need to switch databases and there maybe we have other elements like, airplane vs car or skate vs car. By the way I've mocked up this data, I don’t know anything about transport.

       

      So I started by creating two calculated fields

       

      !Car IF [Transport] = "Car" then [Value] END

       

      !Other Transport IF [Transport] = "Car" then NULL else [Value] END

       

      After I created another two calculated fields to give me a percent distribution

       

      !Car as a % SUM([!Car]) / total(sum([!Car]))

       

      !Other Transport as a % sum([!Other Transport]) / total(sum([!Other Transport]))

       

      Important to note here, I can’t use LOD’s as I want the ability to give my end user filters for them to use.

       

      Finally I calculated the Index

       

      !Index Other on Car [!Other Transport as a %] / [!Car as %] * 100

       

      Now the point is that, I want to give my end users the ability to decide what they are indexing against car and show only that. Car is always the indexer and the indexee is decided by the end user. Remember we cannot use parameters as they might change across databases and adding them all is not an option.

       

      I’ve attached the workbook below

       

      Thanks in advance

       

      David

        • 1. Re: Complicated Indexing
          Vasil Petkov

          Hi David,

           

          Check out the attached solution. Let me know if that works for you.

           

          -V

          • 2. Re: Complicated Indexing
            David Pires

            Hi Vasil

             

            Thanks, unfortunately blending is not an option due to performance. Internally we decided to change the brief and will do something else. I'll put this one to the side for now.

             

            Thanks for taking the time to look into it.

             

            David

            • 3. Re: Complicated Indexing
              Jonathan Drummey

              Hi David,

               

              I found this thread from your link to it at Re: We don't need "Dynamic" Parameters. I know you've gone on to something else, and I like to solve challenges so I'm taking a swing at this.


              If I understand the problem correctly I can see why you were stuck, the approach you had taken is trying to do a form of cell- or variable-based set of operations in Tableau and that can run into problems when the "cells" we want to work with are crossing different rows.

               

              The "crossing different rows"  part is because Tableau is a database-backed application where we aggregate down the rows so doing inter-record comparisons is more difficult than a cell-based application like Excel where comparing across rows or columns is effectively the same operation. In the posted workbook you've worked around this by using a record-level IF statement to get the Values for Car and the selected transportation, then table calculations to work across the marks, that's the most common way. Data blends as @Vasil used, Level of Detail (LOD) expressions, Sets (if you just need a boolean result), and parameter-based calculated fields are the other available techniques in Tableau.

               

              Back to the earlier part of my sentence...the reason why I call this "cell- or variable-based operation" is that Excel and basic programming teach us to work with one thing at a time. In this case the challenge as worded is to "take a given transportation Value and compare it to the Value for Car", and the approach is:

               

              1) select a specific Transportation

              2) get the Value for that transportation

              3) get the Value for Car

              4) compare the selected Value to the Value for Car

               

              And this runs into problems right at #1 with how to select the Transportation because we need a list of values. Parameters don't meet the requirement to have a self-refreshing list, and in a single source solution a filter on Transportation would remove the Car value that we need to finish the computation.

               

              However since Tableau is a database-backed application it doesn't use cell- or variable-based operations. It uses set-based operations where we can work on a whole batch of records at once and we can flip the order of operations around:

               

              1) make the Value for Car available to each Transportation

              2) get the Value for *each* Transportation

              3) compare the Value for each Transportation to the Value for Car

              4) filter for a specific Transportation

               

              In the attached workbook I set up two versions of this using two of the methods I described above, specifically table calculations and LOD expressions.

               

              a) Index TC jtd uses a table calculation SUM([Value])/ WINDOW_MAX(SUM([!Car])) * 100 as the index. The WINDOW_MAX() gets the value of Car available to each Transportation and with Transportation as a dimension in the view the measure is computed for each Transportation. Then another Transport TC table calc with the formula LOOKUP(MIN([Transport]),0) is used to filter the view *after* other computations have been completed.

               

              b) You had stated "Important to note here, I can’t use LOD’s as I want the ability to give my end user filters for them to use." I'm not understanding this sentence because one of the things FIXED LODs enable is for us to have aggregations that don't change results based on user filtering. In other words, the users can filter what we want them to filter on while we can have feeder values that aren't affected. In this case the Index LOD jtd calc uses the formula  SUM([Value])/ MIN({FIXED : SUM([!Car])}) * 100. The FIXED LOD expression returns the Value of Car to each Transportation, with Transportation as a dimension in the view the measure is computed for each Transportation. Then the Transportation can be filtered using a regular dimension filter that is applied *after* the FIXED LOD is computed. If you need filters to apply to some values for Car then you can make those context filters.

               

              Whether I'd use a) or b) in practice depends on the situation, mostly it would come down to whichever meets performance needs.

               

              Let me know if you have any questions,

               

              Jonathan