5 Replies Latest reply on Jun 3, 2013 9:33 AM by Imran Akbar

    slow GROUP BY in query

    Imran Akbar

      One of our scientists is trying to plot some data, but the query is taking > 20 minutes to execute, at which point he just gives up. I tried to save an extract, which downloaded 1 million rows fairly quickly (with a filter), but then Tableau keeps issuing additional queries (to get unique values for each column - none of which have indexes) which each take > 20 minutes each, so I can never finish the extract and create a packaged workbook to send to Tableau support.

       

      The query it is trying to execute is:

      SELECT `epi_flash_meas_values`.`I` AS `none_I_qk`,

        `epi_flash_meas_values`.`V` AS `none_V_qk`,

        `epi_flash_meas`.`corrected_irradiance_wcm_sq` AS `none_corrected_irradiance_wcm_sq_qk`,

        `epi_flash_meas`.`fill_factor` AS `none_fill_factor_qk`,

        `epi_flash_meas`.`flash_meas_id` AS `none_flash_meas_id_ok`,

        `epi_flash_meas`.`flash_setup` AS `none_flash_setup_nk`,

        `epi_flash_meas`.`wafer_name` AS `none_wafer_name_nk`,

        `epi_flash_meas`.`x_coord` AS `none_x_coord_ok`,

        `epi_flash_meas`.`y_coord` AS `none_y_coord_ok`

      FROM `epi_flash_meas`

        INNER JOIN `epi_flash_meas_values` ON (`epi_flash_meas`.`flash_meas_id` = `epi_flash_meas_values`.`flash_meas_id`)

      WHERE (`epi_flash_meas`.`wafer_name` IN ('D-70-1-A', 'D-70-2-A', 'D-70-3-A', 'D-70-4-A'))

      GROUP BY 1,2,3,4,5,6,7,8,9

       

      Indexes exist for wafer_name and flash_meas_id.

       

      The database EXPLAIN for this query is:

      *************************** 1. row ***************************

        id: 1

        select_type: SIMPLE

        table: epi_flash_meas

        type: range

      possible_keys: PRIMARY,wafer_name

        key: wafer_name

        key_len: 767

        ref: NULL

        rows: 2362

        Extra: Using index condition; Using temporary; Using filesort

      *************************** 2. row ***************************

        id: 1

        select_type: SIMPLE

        table: epi_flash_meas_values

        type: ref

      possible_keys: flash_meas_id

        key: flash_meas_id

        key_len: 4

        ref: epi.epi_flash_meas.flash_meas_id

        rows: 130

        Extra: NULL

      2 rows in set (0.00 sec)

       

      While the query is running, the database reports that it is "creating sort index"

       

      I believe the slowdown is due to the GROUP BY at the end of the query, for which no index exists.  When I remove the GROUP BY I get results back (about a million rows) within seconds.

       

      The workbook has 'x_coord' and 'V' as the columns, and 'y_coord' and 'I' as the rows.

       

      Any suggestions?

       

      thanks,

      imran

        • 1. Re: slow GROUP BY in query
          Robert Morton

          Hi Imran,

           

          Since you are not aggregating any of the underlying data, I suggest you un-check the menu item Analysis >> Aggregate Measures.

           

          This is an uncommon operation for Tableau users. The reason is that a person's visual perception system is unable to digest millions of individual data points, especially when attempting to identify outliers or interesting trends. Consider the challenge of trying to determine if a solar cell fill factor correlates with its x/y position on a wafer. Some semiconductor fabrication processes produce some distortion at the edges of a wafer due to the lithographic process. It would be very time consuming for a person to examine millions of records to find such a pattern. Instead you could investigate this possible effect by looking at the average and variance of the fill factor across all wafers produced, broken down by the location of each cell in the wafer. This might look like the following query:

           

          SELECT

            `epi_flash_meas`.`x_coord` AS `none_x_coord_ok`,

            `epi_flash_meas`.`y_coord` AS `none_y_coord_ok`,

            AVG(`epi_flash_meas`.`fill_factor`) AS `avg_fill_factor_qk`,

            VARIANCE(`epi_flash_meas`.`fill_factor`) AS `var_fill_factor_qk`

          FROM `epi_flash_meas`

            INNER JOIN `epi_flash_meas_values` ON (`epi_flash_meas`.`flash_meas_id` = `epi_flash_meas_values`.`flash_meas_id`)

          WHERE (`epi_flash_meas`.`wafer_name` IN ('D-70-1-A', 'D-70-2-A', 'D-70-3-A', 'D-70-4-A'))

          GROUP BY 1,2

           

          As you identify interesting outliers or patterns, you can drill down to the details by filtering your visualization to keep only the data points which collectively comprise the data in each wafer region you find interesting. Once filtered to your region of interest, you can look for further patterns, such as fill factor variations by wafer name, or perhaps by the configuration parameters of the flash / irradiation experiments themselves. These queries should also produce results aggregated to some coarse granularity that the human perception system can quickly reason about. When you finally drill down (filter) to a very specific criteria, only then does it make sense to display the complete, unaggregated view of each record of data matching that criteria.

           

          Does this make sense?

          -Robert

          1 of 1 people found this helpful
          • 2. Re: slow GROUP BY in query
            Toby Erkson

            Robert, ya lost me (not on the wafer stuff, I've experienced that work before).  My recommendation to the OP would be to remove the GROUP BY clause and let Tableau do that heavy lifting when creating a viz.  Typically there should be no need for pre-aggregation, GROUP BY, ORDER BY, etc. as Tableau is going to scramble it anyways when the consumer (end user) creates visualizations upon the data.

            • 3. Re: slow GROUP BY in query
              Robert Morton

              Hi Toby,

              I read the original question as having to do with the query Tableau generates. It does not look like Custom SQL, so the query must have been generated by a visualization that contained nothing but dimensions. In that case, disabling aggregation will result in queries with no GROUP BY clauses.

              -Robert

              • 4. Re: slow GROUP BY in query
                Toby Erkson

                Ah, gotcha, now I understand

                • 5. Re: slow GROUP BY in query
                  Imran Akbar

                  Robert,

                      thank you for the AMAZING reply.  It was super helpful (much more so than the 'official' Tableau support).

                  Yes, the query was generated by Tableau; it wasn't custom SQL.

                  As soon as I unchecked "Aggregate Measures" I got the results within seconds.  I still need to see if that's how the scientist wanted to visualize the data, though - I'll suggest the alternate query to him and see if that'll be useful.

                   

                  yours,

                  imran