6 Replies Latest reply on Jul 26, 2018 2:00 PM by Steph Swierenga

    Incorrect results in Tableau 2018.1 extract data source




      I'm using a view from a MySQL database containing about 150 million rows as my data source. Because of the size of the number of rows and the fact that the data will never be updated, I'd like to create an extract in Tableau 2018.1 to improve performance. In the extract, I've selected all rows and the aggregate data for visible dimensions option, and I have set my default aggregation for all measures of interest to average (the data do not have a time dimension). When I create the extract, I am warned the extracted data may cause the views to be different than they are under the live connection because I'm using the average for aggregation and have a few calculated fields and one bin dimension.


      I'm not getting the expected results when I use the extract. If I run the extract and live data sources for a smaller subset of the data, I do not get the same results. If I write a query to obtain equivalent aggregations directly in SQL, I get the same result as given by the live data and not the extract. Interestingly, the incorrect values only appear to affect the values of one measure and not the others (I've not compared the exact numbers, but they make sense). I've tried deleting the extract file and starting over several times, but this doesn't work. I've also tried recreating the view in the SQL database, but this doesn't help, either.


      Am I having issues related to the warning I was presented when creating my extract? If so, is there anyway to create an aggregated extract using the average without issue?


      Any help would be tremendously appreciated.


      Have a nice day!


        • 1. Re: Incorrect results in Tableau 2018.1 extract data source
          Deepak Rai

          Difficult to Guess What is going on without some sample.

          • 2. Re: Incorrect results in Tableau 2018.1 extract data source



            I'm sorry I'm unable to provide a sample. If you have a suggestion for replicating this setup so I can share it, I would be happy to hear them.



            • 3. Re: Incorrect results in Tableau 2018.1 extract data source
              Chris McClellan

              If you create the extract without aggregation can you do the right calculation ?


              I've never seen this issue, but I think it might be a case of what you expect to happen is not what Tableau does.

              • 4. Re: Incorrect results in Tableau 2018.1 extract data source



                I just finished creating an extract without the aggregation option. The views are exactly as expected, so there must be something about this checkbox I don't understand.


                Thanks for the suggestion!



                • 5. Re: Incorrect results in Tableau 2018.1 extract data source
                  Chris McClellan

                  Awesome    at least it's working in some regard.


                  Now the average and aggregation option means that it will summarise your data and average the values on aggregation while it's creating the extract.


                  Without seeing both the SQL and the extract, I'd guess that your SQL is not aggregating the data and then averaging whereas your extract is.


                  Depending on performance, you might just leave it with no aggregation or play with it more to fully understand what's really going on.

                  • 6. Re: Incorrect results in Tableau 2018.1 extract data source
                    Steph Swierenga

                    so here's what i think may be going on - i have a similar problem. The default aggregation set for a measure can't actually be overridden by going in and changing the default, i.e., if you change the default measure of a numeric field from SUM, to AVG, it won't be respected by the extract process and you'll get the message warning you that you might get strange results. Indeed, i've noticed that the aggregation reverts to the true default, and if you change to some non-additive measure like AVG, you'll just get the average of the post-extract loaded rows, if that makes sense.  Everything works correctly if you don't check the 'aggregate data for visible dimensions' box since you're getting the entire set of rows, so all forms of aggregation can be performed safely. sadly, this also means you're not getting any compression in the extract file. 


                    I'm still a little unclear as to why this wouldn't work if say, every permutation of dimension intersections could be pre-calculated and added to the extract, meaning one could safely arrange any set of dimensions and get proper answers for non-additive aggregations like the AVG. I guess that would quickly become combinatorially  explosive to do this. it's more conceivable that i there's some other constraint i'm not considering. In any case, it seems like the safe thing to do is either go with a full extract, or, if you are reasonably assured that your workbook users aren't going to use anything other than the default aggregations, then go with the aggregated extract.