7 Replies Latest reply on Sep 23, 2013 11:48 PM by pooja khandelwal

    Loading metadata in Hive because of custom queries?

    .Ryan Christo

      We run our Hadoop Hive configuration in strict mode, which requires us to specify a partition predicate. Because of this, all Hive connections we make in Tableau must be run using a custom HQL query. Every time we refresh an extract, Tableau wraps the query in its own query to get the metadata first (the "Loading metadata" window), then runs the custom query second, which is extremely inefficient for us. Is this happening because we are running in strict mode and forced to use custom queries? If we remove strict mode and connect to tables from the list, does this remove the need for Tableau to run its own query first to get the metadata?
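
      For readers unfamiliar with strict mode: on a partitioned table, Hive rejects full scans unless the query filters on a partition column. A minimal sketch of what strict mode enforces (the table and column names here are hypothetical):

      ```sql
      -- Strict mode is enabled with:
      -- SET hive.mapred.mode=strict;

      -- Rejected in strict mode: no partition predicate, so Hive would
      -- have to scan every partition of the (hypothetical) sales table.
      SELECT user_id, amount FROM sales;

      -- Accepted: filters on the partition column (assumed here to be `ds`),
      -- so only one partition is read.
      SELECT user_id, amount
      FROM sales
      WHERE ds = '2012-11-01';
      ```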

        • 1. Re: Loading metadata in Hive because of custom queries?
          Shawn Wallwork

          What, you didn't participate in Movember? Great 'stache, Rchristo! I'll make sure your post gets attention after the holidays.



          • 2. Re: Loading metadata in Hive because of custom queries?
            Robert Morton

            Hi Ryan,


            Tableau should be querying for metadata using a predicate that makes the query fast to execute, such as 'WHERE 1=0', so that we quickly get back complete metadata for a zero-row result set. However, the Hive query compiler is still a work in progress, and many optimizations are not completely implemented or completely understood in terms of their interaction with other query transformations. For example, Hive may perform poorly with the metadata technique Tableau uses if Predicate Pushdown is disabled (via hive.optimize.ppd).
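
            To illustrate the wrapping described above (the exact query Tableau generates may differ; this is a sketch with a hypothetical custom query): column names and types can be derived from a zero-row result set by nesting the custom HQL as a subquery:

            ```sql
            -- Suppose the custom HQL supplied in the connection dialog is:
            --   SELECT user_id, SUM(amount) AS total
            --   FROM sales WHERE ds = '2012-11-01' GROUP BY user_id
            --
            -- A metadata probe wraps it so that zero rows come back, but the
            -- result-set schema (column names and types) is still returned:
            SELECT *
            FROM (
              SELECT user_id, SUM(amount) AS total
              FROM sales
              WHERE ds = '2012-11-01'
              GROUP BY user_id
            ) t
            WHERE 1 = 0;
            ```

            If predicate pushdown is disabled, Hive may fail to short-circuit on `1 = 0` and execute the inner query in full before discarding the rows, which would explain the slow "Loading metadata" step.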


            Consider reviewing your default Hive settings, which you can override using the Initial SQL option in the Hive connection dialog. If you continue to experience performance problems, please contact Tableau Support so they can work with you to understand your configuration, collect logs, and determine if there is a bug and if there are any workarounds.
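
            As a concrete (hypothetical) example of the Initial SQL approach, settings like these could be pasted into the connection dialog so they run once when the session opens; verify each property name and default against your Hive version:

            ```sql
            -- Initial SQL: executed when the Hive session is established.
            SET hive.optimize.ppd=true;     -- enable predicate pushdown
            SET hive.mapred.mode=nonstrict; -- only if your policies allow relaxing strict mode
            ```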


            I hope this helps,


            • 3. Re: Loading metadata in Hive because of custom queries?
              .Ryan Christo

              Thanks for the explanation, Robert. I think our issue is that we mostly use Hive connections for our bigger and/or more complex queries that MySQL can't handle well, so it is not easy for Tableau to insert a 'WHERE 1=0' and have it work well. Usually these consist of nested tables and lots of joins. Since there doesn't seem to be an easy answer, we'll revisit this in the new year and possibly contact Tableau Support like you suggested.

              • 4. Re: Loading metadata in Hive because of custom queries?
                Venkat Jayavardhan

                Hi Ryan,
                Were you able to find a solution for this? I am using a live data connection to pull data into my workbook, and the back-end DB is Cassandra, accessed through the Hive ODBC connector.


                While trying to add filters, it shows the following error:


                [DataStax][Hardy] (22) Error from ThriftHiveClient: Query returned non-zero code: 10, cause: FAILED: Error in semantic analysis: 4:9 In strict mode, if ORDER BY is specified, LIMIT must also be specified. Error encountered near token

                Can you help us eliminate this error?
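
                For context, the semantic-analysis error above is triggered by any query of this shape: in strict mode, Hive requires a LIMIT clause whenever ORDER BY is present. A sketch with hypothetical names:

                ```sql
                -- Fails in strict mode (hive.mapred.mode=strict):
                -- ORDER BY forces a single-reducer global sort, so Hive
                -- refuses it without a bound on the result size.
                SELECT user_id, amount FROM sales WHERE ds = '2012-11-01'
                ORDER BY amount DESC;

                -- Accepted: ORDER BY is accompanied by LIMIT.
                SELECT user_id, amount FROM sales WHERE ds = '2012-11-01'
                ORDER BY amount DESC
                LIMIT 1000;
                ```

                Since the ORDER BY here is generated by the driver/Tableau rather than hand-written, the query text cannot simply be edited; relaxing strict mode for the session (e.g. via Initial SQL, if policy permits) is the usual workaround.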


                • 5. Re: Loading metadata in Hive because of custom queries?
                  pooja khandelwal

                  Hi Venkat,


                  I am facing the same issue. If you have found the solution, please share.




                  • 6. Re: Loading metadata in Hive because of custom queries?
                    Venkat Jayavardhan

                    Hi Pooja,


                    This error is due to a restriction on the table you are trying to access. When we were using the table, it had been set up with a LIMIT clause, which does not allow us to query the whole table, due to performance issues. We have also learned from the Tableau consultants that Cassandra is not a good DB to pair with Tableau.


                    There are two things I can suggest from my experience:

                    1. Create a server-level extract and schedule its refresh based on your requirements, then use the data from the server-level extract for your reporting/visualization.

                    -- We followed the above approach, and the performance was good compared to the live connection.

                    2. There is news that DataStax is going to release a new version of the DB connector to improve interaction between Cassandra and Tableau.