1 2 Previous Next 17 Replies Latest reply on Mar 3, 2014 6:30 PM by Landy Reyes

    Tableau and Infobright

    John Smith

      Hi Guys,

       

      I'm using Tableau desktop trial version against Infobright database and I'm finding big difference in performance comparing to the queries execute execute directly (using Tad for instance).

      I have a single table with aprox 250 Millions records.

       

      A simple select count(*) from T took 2min when direct query against Infobright took  lest than millisecond. 

      I experience similar results with many simple queries and group by's

      Can anybody explain this?

      Thanks

        • 1. Re: Tableau and Infobright
          Cristian Vasile

          John,

           

          Please try to track the issue with performance recording option.

          Open Tableau and then form main menu activate it.

          Create a graph, close the worksheet and then stop perf. recording, a  new viz will be created displaying a lot of useful details.

           

           

          Regards,

          Cristian.

          • 2. Re: Tableau and Infobright
            Cristian Vasile

            John,

             

            Check this thread related to Infobright and Tableau

            Infobright

             

            Regards,

            Cristian.

            • 3. Re: Tableau and Infobright
              Cristian Vasile

              John,

               

              The sql query issued by Tableau is written in Tableau's logs.

              C:\Documents and Settings\user name here\My Documents\My Tableau Repository\Logs\log.txt

               

              You could take a look and search for SELECT or QUERY words and compare Tableau's query with the one issued by your db admin tool.

               

              Regards,

              Cristian.

              • 4. Re: Tableau and Infobright
                Robert Morton

                Hi John,

                 

                Are you experiencing this performance problem with a direct connection to a single table in Tableau? Or did you paste a SQL query into the Custom SQL connection option? The latter requires Tableau to use subqueries, which MySQL and Infobright both seem to struggle with.

                 

                -Robert

                • 5. Re: Tableau and Infobright
                  John Smith

                  I did issue a direct connection to a single table and I simply drag the "# of Records" "rows" field on the top.

                  This should simply bring the total number of records in the table.

                  I used suggested tracing performance function and I discover that Tableau instead issuing a simple:

                   

                  select count(*) as sumall from T1

                   

                  it issue very convoluted query:

                   

                  select sum(1) as sumall from T1 having count(1) > 0

                   

                  The first query runs in miliseconds while the second runs 118sec

                   

                  I also noticed that simple count and group by are substituted with sum(1).

                  Can anybody explain why?

                  BTW I tried Tableau Datasource Customization (TDC) by copying tdc files but it doesn't seem to have any efect.

                  Any help would be really appreciated.

                   

                  John

                  • 6. Re: Tableau and Infobright
                    Robert Morton

                    Hi John,

                     

                    Did you try using the TDC file that I attached to the following thread?  http://community.tableau.com/thread/105047#128143

                     

                    After restarting Tableau, the TDC file should influence how Tableau generates aggregation queries when no grouping fields (dimensions) are present in the visualization. The query you posted earlier may perform poorly if it is routed through the unoptimized "compatibility mode" that Infobright offers for queries which it cannot natively handle. In contrast, Infobright does seem to handle an even more convoluted query rather efficiently which Tableau will generate once you use the TDC file, e.g. as follows:

                    select
                     1 as constexpr,
                     sum(1) as sumall
                    from T1
                    group by constexpr
                    having count(1) > 0
                    

                     

                    The reason that Tableau expresses the query in this fashion (with the having clause) is to ensure that an empty table returns zero rows, rather than a single row containing NULL. Additionally, Tableau prefers SUM(1) vs. COUNT(1) or COUNT(*) because the semantics for the COUNT aggregation vary across database platforms due to idiosyncratic shortcuts (typically for performance) that give inconsistent results in the presence of duplicate rows, unconstrained table joins (e.g. cross join), and inconsistent handling of NULL values or entirely NULL rows.

                     

                    I hope this helps,

                    Robert

                    • 7. Re: Tableau and Infobright
                      John Smith

                      Hi Robert,

                       

                      I did try to use your TDC file but after second reading of the thread, I realized I may copy the file into a wrong directory.

                      I wont be able to verify this until Monday.

                      In meantime (if you reading this before Monday), can you explain what the TDC file is suppose to achieve?

                      Thanks

                      John

                       

                      BTW The class of queries that Infobright passes to the MySQL optimizer when it cannot resolve natively shrinks constantly with any new releases.

                      Particularly it is handling well sub-queries. So I would be interested in seeing cases that you suggested were not well handled. 

                      J.

                      • 8. Re: Tableau and Infobright
                        Robert Morton

                        Hi John,

                         

                        The TDC file is intended to influence Tableau's behavior when working with databases such as Infobright which are not supported natively as first-class connections in Tableau. In this case the TDC file I linked to will force Tableau to use a different query syntax which works better with Infobright in certain circumstances.

                         

                        -Robert

                        • 9. Re: Tableau and Infobright
                          John Smith

                          Hi Robert,

                           

                          I finally copied the TDC file into the right directory and it helped with the total count when I drop "# of records" to the Rows. The answer was immediate now.

                          However when I drop a column b_name to the Columns field, tableau generated the following query:

                           

                          select b_name, sum(1)  from T1 group by b_name

                          it run 7sec

                          If Tableau had generated query:

                           

                          select b_name, COUNT(*)  from T1 group by b_name;

                           

                          it would have run in milliseconds.

                           

                          Any suggestions?

                          I'm working on the PoC for a big client and Tableau would be a great companion to Infobright if we could further effect generation of queries.

                           

                          John

                          • 10. Re: Tableau and Infobright
                            Robert Morton

                            Hi John,

                             

                            I'm glad that TDC file helped! Aside from that one option, there is nothing more you can do to control the queries issued by Tableau through its MySQL connector.

                             

                            When it comes to counting the number of records, Tableau avoids using syntax tricks such as COUNT(*) because such tricks do not reliably compute the expected value in all circumstances (table joins, nullable columns, stale indexes after record insert/delete). Additionally, COUNT(*) does not always yield the best performance, e.g. with InnoDB on MySQL. Instead, Tableau relies on the database query optimizer to safely transform SUM(1) into a semantically equivalent expression with better performance. In the example you provided, it appears that the Infobright query optimizer is unable to do so.

                             

                            As an alternative, you could write a calculated field in Tableau with an expression like the following:

                            RAWSQL_INT("COUNT(*)")

                             

                            This calculated field could be named something like "Number of Records (fast approximation)" to encourage users to take advantage of that field for quick data exploration tasks, but those users must remember to double-check their work against the original "Number of Records" field before sharing the results or drawing any substantial conclusions.

                             

                            I hope this helps,

                            Robert

                            • 11. Re: Tableau and Infobright
                              John Smith

                              It is not a matter of Infobright not being able to optimize - since it optimizes this query better than other engines. But at the same time it optimizes even better counts which are usually an Achilles' heel of other engines .

                               

                              In my example, the counts were more appropriate because it was exactly case of counting records for every occurrence of a value. In this case SUM(1) is rather artificial construct.

                               

                              But I will definitely test your suggestion of a calculated field and will let you know the result.

                              John

                              • 13. Re: Tableau and Infobright
                                John Smith

                                Thanks Guys,

                                 

                                Robert's suggestion with with calculated field actually worked very well (except I needed to use count(1) rather than count(*)).

                                Thanks

                                 

                                John

                                • 14. Re: Tableau and Infobright
                                  Suhrid Ghosh

                                  Hello Robert,

                                   

                                  I'm using Tableau desktop trial version and my data source is Infobright database . While generating report on Tableau desktop i am finding a big time gap in performance comparing to the queries executed by tableau desktop. I have a single table with 70 Millions records. I have also checked the logs in C:\Users\Administrator\Documents\My Tableau Repository\Logs to find the query generated by tableau and the same query i have executed in Infobright console and both of them (Tableau desktop and Infobright console) take equal time to fetch the data with the query generated by tableau. It takes approx. 32 seconds which i really feel is too long for 70 million records. I have tried TDC and find no difference in time by doing that.

                                   

                                   

                                  The query generated by tableau is :

                                   

                                  SELECT YEAR(`TAB_1`.`REQ_DATETIME`) AS `yr_REQ_DATETIME_ok`, COUNT(`TAB_1`.`UNIQ_ID`) AS `cnt_Calculation_9090603105745843_qk`, SUM(`TAB_1`.`AMOUNT`) AS `sum_AMOUNT_qk` FROM `TAB_1` INNER JOIN `ACT_TYPE` ON (`TAB_1`.`ACT_ID` = `ACT_TYPE`.`ACT_ID`) INNER JOIN `CD_TYPE` ON (`TAB_1`.`CD_ID` = `CD_TYPE`.`CD_ID`) INNER JOIN `SER_PROV` ON (`TAB_1`.`SP_ID` = `SER_PROV`.`SP_ID`) GROUP BY 1

                                   

                                  Can you please advice on the same

                                  1 2 Previous Next