6 Replies Latest reply on Sep 27, 2016 2:15 PM by Wes Hunt

    Can you choose which calculated fields get baked into an extract query?

    Wes Hunt

      Say I create an extract directly from a SQL data source (Redshift for instance). When I look at the logs, the query performed to do the extract is exactly what I set up in my data source. Over time, I add calculated fields to the workbook. Then I re-extract again later. At this time, it appears that many (most? all?) the calculated fields I have created get added to the query used to refresh the extract. This can cause problems because sometimes the calculated fields don't play nice with the underlying data source. Is there a way to have Tableau refresh the extract using ONLY by data source configuration and not add any calculated fields to it (or select which calculated fields to bake into the query)?

       

      As an example, connecting to a Redshift data source, I'm reading a column of integers that is stored as a string. The data is dirty and some of the rows are not integers (they are blank strings). I can extract that into Tableau and create a calculate field which converts that column to an integer, and Tableau handles that fine (rows that cannot be converted are turned to NULL). But when I refresh the extract, Tableau bakes this cast to an integer into the query (lists as Calculation_######), at which point Redshift errors when it encounters a row with a blank value and tries to CAST that to an integer. The workbook and extract worked fine originally, but the mere act of a adding a calculated field post-extract causes the refresh to fail because that CAST gets injected into the refresh query.

       

      I have some creative workarounds to this that involve custom queries or using two extracts, but it seems you shouldn't have to do this. It feels weird that a first extract would work, I don't touch the data source configuration, and then subsequent refreshes fail due to calculated fields I don't even want in the refresh query.

       

      Any options here?

      Thanks,

      -Wes

        • 1. Re: Can you choose which calculated fields get baked into an extract query?
          Dmitry Chirkov

          Short answer: no.

           

          There's non-trivial logic that goes into picking which fields should be materialized (i.e. saved) in the extract vs kept as calculation definitions which in reality boils down to "if it is not tied to TODAY() or NOW() formula - stick it in".

           

          Unfortunately, we do not provide control over this.

           

          Have you tried creating a calculation that explicitly checks if value is NULL first?

          2 of 2 people found this helpful
          • 2. Re: Can you choose which calculated fields get baked into an extract query?
            Wes Hunt

            Thanks for the info! Ok, so let me go over my workarounds, I'm not sure which ones have the best tradeoffs.

             

            1. Use Custom SQL to handle the field. - This works "ok", but requires complex logic of the underlying SQL backend by users (for Redshift example, understanding of how to force the casts to work properly). Even then, it's pretty easy to think you have it all set up then create a calc field that manipulates some other random column in a non-kosher way without realizing it until the next extract, after which you're forces to edit the data source and find some other way around the issue and fix up the references to the calc field. It's all rather advanced for users.

             

            2. Create a double-extract. One extract from a "master" workbook that simply extracts the raw data with no calculated fields. Then create a new workbook that simply opens the TDE file directly as a data source and adds calculated fields on top of it. This is what I've been using, but doubles the extract storage and is nigh impossible to easily automate the update with Tableau Server because I have to update an extract then point another one to the raw TDE. This also makes it difficult to hide unused fields, and easy to mess up such that you end up with unmapped fields in the secondary workbook. It's also rather difficult to explain this concept to new users who otherwise can't understand why their workbook "that used to work fine" is suddenly failing.

             

            Do you have any other suggestions that might be less circuitous, and work withe Tableau Server with scheduled refreshes (maybe using a TDS somehow instead of just the TDE?)? Is there any feature work being done to help protect workbooks from pedantic SQL implementations like this? It seems that the extract query engine will always be more flexible than most backing DBs, so this would be a common problem as people add calc fields over time...

             

            Thanks!

            -Wes

            • 3. Re: Can you choose which calculated fields get baked into an extract query?
              Dmitry Chirkov

              Can you give me an example of calculation that errors out?

              I tried simple ones (changing type of the column directly or "INT(mycol) + 1) and it's handled without any issues via following cast: CAST(TRUNC(CAST("mytable"."mycol" as DOUBLE PRECISION)) AS BIGINT). No errors for nulls or non-number strings.

              • 4. Re: Can you choose which calculated fields get baked into an extract query?
                Wes Hunt

                  SELECT

                    StringColWithDoublePrecisionData,

                    CAST(StringColWithDoublePrecisionData AS DOUBLE PRECISION) as TimeDouble

                  FROM wh.TestData

                 

                returns this from Redshift:

                Error: [Amazon](500310) Invalid operation: Missing data for not-null field

                Details:

                -----------------------------------------------

                  error:  Missing data for not-null field

                  code:      1213

                  context:   2

                  query:     734302

                  location:  :0

                  process:   query0_74 [pid=0]

                  -----------------------------------------------;

                SQLState:  XX000

                ErrorCode: 500310

                This is due to rows with blank values in them. Adding this WHERE clause allows the query to complete, but skips the empty rows:

                  SELECT

                    StringColWithDoublePrecisionData,

                    CAST(StringColWithDoublePrecisionData AS DOUBLE PRECISION) as TimeDouble

                  FROM wh.TestData

                  WHERE LEN(StringColWithDoublePrecisionData) > 0

                By revising to a custom SQL, you can get it to work correctly:

                SELECT

                    StringColWithDoublePrecisionData,

                    CAST(NULLIF(StringColWithDoublePrecisionData,'') as DOUBLE PRECISION) as TimeDouble

                  FROM wh.TestData

                This is from a table created like so:

                 

                CREATE TABLE wh.TestData AS SELECT '1.0' as StringColWithDoublePrecisionData

                INSERT INTO wh.TestData VALUES ('2.0'), ('')

                 

                Then create a workbook that uses it. Drag the fields over to see the column with number of records on columns:

                Then create a calc field that casts to a double:

                Then drag DoubleVal to the columns tray:

                And I get this from Redshift:

                Simple filter all the empty strings from that column:

                and the chart shows correctly:

                But if you create an extract of the data, you don't need the filter -- until you refresh it, then it fails. This is the user experience. They initially create an extract, create the charts and it works fine. Then they refresh their "working views" and it fails mysteriously.

                 

                -Wes

                • 5. Re: Can you choose which calculated fields get baked into an extract query?
                  Dylan Kiernan

                  I was having the same issue so based on Dmitry's response I added "+ IF TODAY() = TODAY()then "" END" to the end of my custom string calculation to force it out of the extract refresh. Seemed to work pretty well.

                  1 of 1 people found this helpful
                  • 6. Re: Can you choose which calculated fields get baked into an extract query?
                    Wes Hunt

                    That's actually a pretty useful trick. Kind of weird, but as long as it doesn't hurt the calc performance, then seems like it would be just as good as a toggle per calc field.

                     

                    Thanks!