4 Replies Latest reply on Jan 15, 2015 2:43 PM by john ripper

    Tableau+Greenplum and the 0 <> 0 queries

    Tamas Foldi

      My problem is that the guys who wrote the Greenplum support for Tableau are not really aware of the Greenplum's optimizer, therefore in case when Tableau does not expect results from the database, it puts "0 <> 0" at the end of query's where section. The problem is, that greenplum will perform some parts of the queries and applies the 0 <> 0 checks after some sequential scans. Thus, I have an action filter operated dashboards which takes 20 seconds to load and provides 0 records (due to this 0 <> 0 condition).

       

      I would suggest one of these options:

      1. Do not execute queries if Tableau does not expect return rows This would be the best, but yes, maybe due to generalization or metadata handling tableau guys decided this behavior and want to keep it.
      2. Check the underlying database's optimizer to define how to "skip all rows", since the 0 <> 0 not working on all databases.

       

      Nevertheless, if in the next releases it is possible, please review how the underlying database engines are working, since sometimes the optimizers are not aware of Tableau's will.

       

      Anyway, is there any workaround that I could try?

        • 1. Re: Tableau+Greenplum and the 0 <> 0 queries
          Robert Morton

          Hi Tamas,

           

          Thanks for this feedback. Your observation about general metadata support is correct, Tableau needs to receive a result set of some form, even if empty, in order to determine the metadata for the fields in play. This is most commonly seen with connections which use Custom SQL, but it may be necessary for certain other types of metadata retrieval. In the future we may be able to retrieve metadata through a prepared statement alone, avoiding the partial execution that some databases seem to perform. However I cannot commit to a specific timeframe for this change.

           

          As for tailoring to the limitations of the underlying database optimizer, this is a challenge given the numerous databases (and different versions of each) that Tableau supports and it would require that we develop a detailed model of the behavior of systems which we do not control, and which are mostly closed-source. Furthermore, the database itself is the best suited to performing optimization of queries, since it can reason about the shape of the underlying data (cardinality, statistics, etc.).

           

          A good workaround for a Master / Detail view such as your dashboard driven by action filters is to create Data Engine extracts for the summary data, and use a live connection for the filtered detail data. This is especially useful for quickly loading the dashboard when no filter criteria have been selected yet. Be sure to make your filter actions default to setting the target worksheet to an empty viz when nothing is selected.

           

          I hope this helps,

          Robert

          1 of 1 people found this helpful
          • 2. Re: Tableau+Greenplum and the 0 <> 0 queries
            Tamas Foldi

            ableau needs to receive a result set of some form, even if empty, in order to determine the metadata for the fields in play. This is most commonly seen with connections which use Custom SQL, but it may be necessary for certain other types of metadata retrieval. In the future we may be able to retrieve metadata through a prepared statement alone, avoiding the partial execution that some databases seem to perform.

             

            There are two problems with this approach:

             

            1. I suppose the metadata retrieval (at least those type of metadata which is accessible thru executed/prepared statements) are available during design time. During "presentation" time the tool should focus only on data retrieval for performance reasons.
            2. The current filter and metadata caching mechanism as I understood are limited to the vizql sessions, thus, if I use exactly the same data query, then the metadata and list of values (all db values) are queried again and again.

             

            As for tailoring to the limitations of the underlying database optimizer, this is a challenge given the numerous databases (and different versions of each) that Tableau supports and it would require that we develop a detailed model of the behavior of systems which we do not control, and which are mostly closed-source.

             

            It is true, however, Tableau already has a database template engine for generating database dependent queries. The date handling, on the fly temporary table creation and many other things are implemented per database type basis, so I guess this one could fit into the existing framework. On the other hand I understand the risks and the extra efforts with this.

             

            A good workaround for a Master / Detail view such as your dashboard driven by action filters is to create Data Engine extracts for the summary data, and use a live connection for the filtered detail data.

             

            In my case all dashboards and sheets are using the same data connection, which consists about 6 billion rows / month, and users only see their own data. There were attempts to use tableau extracts but to be honest usually there were two results: the extraction died due to the amount of data or the visualization results with extracts were different with live data.

             

            My customer decided to purchase tableau server due to performance reasons. But after we just ran into several limitations we just started to use more custom code and less tableau, like to rewrite the whole filtering to provide cached list of values across dashboards, custom queries for quick filters, provide scrollable, foldable custom filter sections with up to 30-40 filters and so on.

             

            If Tableau really wants be a good player in big data, these kind of optimizations should be available in the further releases.

            • 3. Re: Tableau+Greenplum and the 0 <> 0 queries
              Robert Morton

              Thanks Tamas, I appreciate this feedback!

              -Robert

              • 4. Re: Tableau+Greenplum and the 0 <> 0 queries
                john ripper

                Hi Guys, I'm currently connecting my tableau workbooks to a green plum database. the problem here is, it keeps going on forever in loading metadata. Should I be asking my data team about it? I'm quite not sure where to start with.

                Any lead here shall be much appreciated.

                thanks!