7 Replies Latest reply on Oct 5, 2012 10:30 AM by Jonathan Drummey

    Lookup using two data sources

    Cole Pacak

      Hi,

       

      I'm trying to figure out how to do the equivalent of an Excel Index-Match or VLookup calculated field in Tableau.

       

      I have two tables: one with test results and one that corresponds percentiles to test scores.

       

      Ultimately, I'd like to display an average score for each subject along with the corresponding percentile, where the average score is used as the lookup value in the 2nd table so that the percentile is returned.

       

      Subject
      Average Score
      Corresponding Percentile
      Math44?
      Reading64?

       

      The workbook and Excel file with the two tables are attached.

       

      Thanks!

       

      Cole

        • 1. Re: Lookup using two data sources
          Jonathan Drummey

          Hi Cole,

           

          There's something a bit strange in your data - the scores in the "score table" range from ~4 to 91, while the scores in the "percentile table" go from 151 to 250 (Reading) and 121 to 220 (Math). So there doesn't appear to be any way to crosswalk from one to the other.

           

          Also, right now the percentiles exactly match up with the 100 point range of each score, so in that case you could get a percentile simply by subtracting 150 (Reading) or 120 (Math) from the score. Assuming that the percentiles are actually different, the only way I can think of to get the view you want in Tableau is to use Custom SQL where you would calculate the aggregation in a subquery and then join on the integer or rounded result to the percentile table. The difficulty here is that you want to use an aggregated measure as the joining field to the other table, and things like data blends require non-aggregated values, and RAWSQL throws errors.

           

          Jonathan

          • 2. Re: Lookup using two data sources
            Cole Pacak

            Sorry, Jonathan. I did a lousy job creating a mock file! I changed the values in the Score table so that they would correspond with the scores field in the Percentile table.

             

            Do you have any thoughts on how I could join these tables? Do I really have to use Custom SQL? This kind of lookup is so simple in Excel, so I'm having a hard time understanding why it's so complex in Tableau.

             

            Thanks for your help!

            • 3. Re: Lookup using two data sources
              Jonathan Drummey

              Hi Cole,

               

              The best metaphor I've come up with is that Excel is like an empty canvas that you "paint" your numbers onto, where those numbers can be most anything and plucked from anywhere. Note that the upper left bit of the default Excel UI is dominated by formatting options. Tableau has a different approach, where the UI starts from the data, and uses attributes of the data to define formatting options like size, color, etc. This makes a lot of possible-but-time-consuming-and-difficult things in Excel incredibly fast and easy in Tableau, and conversely makes a few things in Excel much more difficult in Tableau.

               

              Excel doesn't care about aggregate or non-aggregate values, discrete or continuous, or dimensions or measures. Excel doesn't really know anything about data types, instead it lets you format the cell. In Excel, you build the chart by choosing the chart type (more formatting). By knowing about your data, Tableau has built in rules based on best practices to choose the right view for your chosen dimensions and measures, and Tableau can optimize for performance.

               

              The situation you are running into is that you want to join the (rounded) results of an aggregated calculation - the average score - to get the appropriate percentile from another table, and that's something that Tableau doesn't support in any way that I know of (and I'd be happy to find out that Tableau does support this).

               

              I hadn't mentioned this first because it's not a very dynamic option - besides Custom SQL, you could always create a long CASE statement to replace the table:

               

              CASE ATTR([Subject])

                  WHEN "Reading" THEN

                      CASE [Rounded Avg Score]

                          WHEN 151 THEN 1

                           … (other values)

                          WHEN 184 THEN 34

                          WHEN 185 THEN 35

                      END

                  WHEN "Math" THEN

                      CASE [Rounded Avg Score]

                          WHEN 121 THEN 1

                           …. (other values)

                          WHEN 184 THEN 64

                      END

              END

               

              I've attached a workbook with the start of a CASE statement, and two Custom SQL options. In one, Custom SQL is used to generate the Avg Score, and then that can get blended with the results from the percentile table. In the other, the Custom SQL both generates the Avg Score and joins to the percentile table.

               

              Does that help explain things and give you some workarounds?

               

              Jonathan

              • 4. Re: Lookup using two data sources
                Cole Pacak

                Hi Jonathan,

                 

                Unfortunately the CASE solution wouldn't work because the percentile table I posted is a drastically watered down version of the actual percentile table I need to use. But I appreciate the idea.

                 

                However, the two Custom SQL options are very intriguing! Could you provide more information on how you went about that OR at least point me in the right direction of where I could learn more about the two approaches?

                 

                Thanks,

                 

                Cole

                • 5. Re: Lookup using two data sources
                  Jonathan Drummey

                  Hi Cole,

                   

                  You ask a really good question, and I'm having a hard time answering it, mostly because I learned SQL (and techniques for manipulating data sets) on the job a long time ago, and I haven't had to teach it and therefore come up with ways to explain all the how's and why's. You'd be best served by doing a web search for "learning SQL" and seeing what comes up (there are a number of online tutorials), or picking up a book or two.

                   

                  Here's a couple of paragraphs to help you get started:

                   

                  In Excel-land, there are worksheets that are in a tabular format, but there's not much structure beyond that. There can be arbitrary ranges, references from one cell to another (or on a totally different worksheet), etc. Information is often formatted for a human to read, which is different from what's easy for a computer to read. Data in relational data sources (i.e. data that's easy for computers to read and manipulate) has columns (fields) and rows (sets of fields) in a table. A column in one table can relate to a column in another table in a variety of ways - inner join, left join, right join, cartesian join, and union are the five most common. A VLOOKUP in Excel is much like a form of left join, where if the VLOOKUP can get data from the other source it can, otherwise it returns nothing. The problem you were having was that the data you wanted to use to relate (the average score) doesn't actually exist in a column that can be easily related, since it's a calculated aggregate measure.

                   

                  Therefore, the need was to generate the calculation for the average score and turn that into a column that can be related, and then connect that result to the score column on the percentile table. In Excel, you'd be doing much the same process: You'd designate a cell to have the results of the average calculation, and then have another cell to do the VLOOKUP or Index Match on the percentile table, and then maybe do a fill->down or something similar to get the results for the different subjects.

                   

                  In the case of the first option, I set up the Custom SQL was to perform the aggregation to create the average score inside the query that defines the Tableau data source. That way, when the average score arrives in Tableau, it's already aggregated and Tableau can treat that as a column that can relate to other data. Then I used Tableau's data blending feature to link to the percentile values through the score and subject. Data blending is special Tableau variation on a left join that can connect data across disparate data sources. You can get more information about data blending in the Tutorials section of the Tableau website: http://www.tableausoftware.com/learn/training.

                   

                  In the second option, I did the same query to perform the aggregation, and used that as a subquery within the larger query that joined the average score result and Subject to the percentile table. This all happens within the query, so the data source has all the necessary data.

                   

                  Hopefully that all makes sense!

                   

                  To sum up, Tableau is a fantastic tool for displaying and visualizing data, however it does require the data to be organized in a suitable manner. I believe that learning the basics of database structure and queries is necessary to get the most out of Tableau, and it improves productivity and capability to handle more varieties of data.

                   

                  Cheers,

                   

                  Jonathan

                  • 6. Re: Lookup using two data sources
                    Cole Pacak

                    Hi Jonathan,

                     

                    Thanks for taking so much time to explain all that. It's a good push to continue the quest to learn SQL!

                     

                    FYI - I figured out a workaround by creating the table of data I needed outside of Tableau and bringing it in as a secondary data source. So I was glad to buy myself some time as I look into the Custom SQL joins that you recommended.

                     

                    Thanks again!

                     

                    Cole

                    • 7. Re: Lookup using two data sources
                      Jonathan Drummey

                      You're welcome!