7 Replies Latest reply on Feb 22, 2013 1:20 PM by Nayana Wagle

    Running a select distinct from extract after applying filters

    Nayana Wagle


      Is there a way of running a "select distinct" statement on an extract depending on the filters attached to a graph?


      Detailed example:

      I have a portfolio of assets and we are running some rules on them to check for appropriate actions to be taken on them.


      The results_table has the following columns -

      1. Unique identifier of asset

      2. Rule id run on the asset

      3. Result of running the rule (boolean)


      Now each rule can be aggregated at various levels, such as

      1. multiple rules to one element

      2. multiple elements to Line of business

      3. multiple lines of businesses to business category

      This info is stored in another table, named rule_info.


      Also each asset has some other characteristics attached to it such as value, etc. This info is stored in assets_info


      What I would like to do is run a command in Tableau that allows me to run an equivalent of this SQL query


      Select distinct "unique identifier of asset"

      from results_table

      join rule_info

      join assets_info

      group by "line of business" - (this phrase here could be "Line of business", "element", "business category",etc. which depend on the filter that is attached to the graph)


      Then I want to take this list, go into assets_info and get sums of various characteristics of the assets that fall in this list.


      Is there any way of getting that distinct list in Tableau?



        • 1. Re: Running a select distinct from extract after applying filters
          Jonathan Drummey



          If you have a given dimension like an Asset ID and you drag that onto the Rows Shelf, Tableau will give you a distinct list of Asset ID's and then you can do whatever you need to with the data from assets_info.


          One thing you will need to be cognizant of is what your level of detail in the data is, what the level of detail in the view is, and what aggregations you are using. Without knowing more about your data such as whether assets_info and rule_info are one to one or one to many on the results table, it's hard to give more specifics. A case where this comes into play is in situations when there is something like a "sale amount" that is applied at the level of an order, but the data is coming in at an "order item" level so when showing total sales for each order SUM(sale amount) would be too large, and instead an aggregation like AVG, MIN, MAX, or ATTR needs to be used instead, and to get a sum of sales across orders then a table calculation would be required.



          • 2. Re: Running a select distinct from extract after applying filters
            Nayana Wagle

            Hi Jonathan,

            I'm not completely following what you said. So I'm attaching a mock-up of the results extract - this is the extract after joining the results table, asset_info and rule_info. I have kept it simple and the only measure I want in the graph, is sum(asset_value).


            Now I want the rows to be replaceable with

            1. Business_category

            2. Element

            3. Rule



            But the sum of asset value of distinct assets should always be the same - namely 10K + 25K + 42K.


            As you can see, the sum here is a direct sum of all the values which fulfill the filter and not a sum values of distinct assets that fulfill the filter.


            Does that help explain my question?

            • 3. Re: Running a select distinct from extract after applying filters
              Nayana Wagle

              Sorry - should have attached packaged workbook. Here it is.

              • 4. Re: Running a select distinct from extract after applying filters
                Jonathan Drummey

                In the view you set up, you can just use ATTR(), MIN(), AVG(), or MAX() as your aggregation and it will provide the desired results because you have asset_id on Rows. Though it is slower than other aggregations, I prefer ATTR() in cases like this because if for some reason there are multiple different values returned for a given asset_id then the measure will display *.


                If you don't want the view broken down by asset_id and just want a sum of each distinct asset_id's asset_values per business category, then you'd need to use a table calculation, I set that up in the attached, let me know if you need an explanation of how that works.

                • 5. Re: Running a select distinct from extract after applying filters
                  Nayana Wagle

                  Hi Jonathan,

                  Continuing this old topic, I need some more advice. The explanation you gave works quite well as long as I want to see the asset value in the graph. However, I have multiple graphs built on top of the asset value. How would I tackle those?


                  For example, I have a pie chart that shows what is the % contribution of each business category to the overall asset values where the result failed. How would I handle that?


                  A second value  I have (Quality), is the calculation of the asset values where the result is fail compared to the total asset values of the organization. I have to create trend charts for Quality. But I am not able to understand how I would define Quality in the calculated fields.


                  Please let me know. Thanks!

                  • 6. Re: Running a select distinct from extract after applying filters
                    Jonathan Drummey



                    I'm not totally clear on what you mean, and the data set that you posted in your initial workbook only has one business category so it's not possible to show you a % of total calc in your data. Here are a couple of pointers, though:


                    The basic % of total calc you get from a quick table calc is SUM([measure])/TOTAL(SUM([measure])).


                    For the Quality measure, since the Result is a dimension you can create a new measure that is SUM(IF [Result]=="Fail" THEN [Asset_value] END), then you could wrap that in a % of Total like SUM([fail asset value])/TOTAL(SUM([Asset_value])).


                    If you'd like more help, I suggest you post a packaged workbook with some sample data, and maybe a mockup of the kind of charts you are looking for.



                    • 7. Re: Running a select distinct from extract after applying filters
                      Nayana Wagle

                      Hi Jonathan,

                      I am working with some confidential client data - so it is rather difficult to put in the workbook. So please bear with me as I try to explain the issue.


                      The client had created a workbook where the input data (as seen in the second tab of the Excel sheet) was rule based. They ran these rules on their asset portfolio and only captured the results of the rules. For example, if two assets failed, there was no way to know which two assets. But the aggregate asset value would be stored for assets that failed a rule. So if an asset failed three tests, its asset value would be counted as defective in three places. There are about 9 dashboard with multiple graphs that were based on this type of input data.


                      Now, they have a new result structure to which they want to migrate the entire workbook. Please refer the first sheet of the data. The main difference here is that the results are stored at a asset_id + rule_id level so that the particular asset can be tracked. Now the added condition is this - even if an asset fails more than one rule, unlike the previous version, they do not want to count the asset as defective multiple times. So no matter how many rules it fails, it is to be counted only once.


                      To give you an example of the graphs in question, one of the graphs is a pie chart where we can see what business category has what % of the total defective portfolio. The original formula is as you said above,



                      But in the new format, I need to calculate the asset value failed by checking the result (whether pass or fail), taking a distinct on asset id and then calculating the asset value failed as the sum of amounts of these distinct asset ids.


                      Referring the excel, the old % for BPRNA was (17K+17K)/ (17K+17K+5*60K)

                      In the new format, % for BPRNA = 17K/ 60K

                      This is because the 17K belongs to the same asset, asset_id = 2. And overall in the portfolio, all but asset_id 3 have failed.



                      Again, as I mentioned before, I have a trend chart for Quality. Quality was initially defined as

                      ROUND(([Asset_value_ tested]-[Asset_value_failed])*100/([Asset_value_ tested]),2)


                      But the updated expectation is that Quality will be calculated as

                      (Portfolio value - Asset value of assets where at least one test failed)/ Portfolio value. Where portfolio value is the asset value of all the distinct assets.


                      Referring the excel, the old Quality percent was (648K - 334K)/ 648K

                      In the new format, the Quality percent will be defined as (72K -  60K) / 72K


                      I know this is a very long and involved question. I thank you in advance for your time. Any pointers you provide will be extremely useful.