-
-
-
3. Re: Tableau and Infobright
Cristian Vasile May 23, 2013 10:36 AM (in response to John Smith)John,
The sql query issued by Tableau is written in Tableau's logs.
C:\Documents and Settings\user name here\My Documents\My Tableau Repository\Logs\log.txt
You could take a look and search for SELECT or QUERY words and compare Tableau's query with the one issued by your db admin tool.
Regards,
Cristian.
-
4. Re: Tableau and Infobright
Robert MortonMay 23, 2013 5:59 PM (in response to John Smith)
Hi John,
Are you experiencing this performance problem with a direct connection to a single table in Tableau? Or did you paste a SQL query into the Custom SQL connection option? The latter requires Tableau to use subqueries, which MySQL and Infobright both seem to struggle with.
-Robert
-
5. Re: Tableau and Infobright
John Smith May 24, 2013 1:17 PM (in response to Robert Morton)I did issue a direct connection to a single table and I simply drag the "# of Records" "rows" field on the top.
This should simply bring the total number of records in the table.
I used suggested tracing performance function and I discover that Tableau instead issuing a simple:
select count(*) as sumall from T1
it issue very convoluted query:
select sum(1) as sumall from T1 having count(1) > 0
The first query runs in miliseconds while the second runs 118sec
I also noticed that simple count and group by are substituted with sum(1).
Can anybody explain why?
BTW I tried Tableau Datasource Customization (TDC) by copying tdc files but it doesn't seem to have any efect.
Any help would be really appreciated.
John
-
6. Re: Tableau and Infobright
Robert MortonMay 24, 2013 1:49 PM (in response to John Smith)
Hi John,
Did you try using the TDC file that I attached to the following thread? http://community.tableau.com/thread/105047#128143
After restarting Tableau, the TDC file should influence how Tableau generates aggregation queries when no grouping fields (dimensions) are present in the visualization. The query you posted earlier may perform poorly if it is routed through the unoptimized "compatibility mode" that Infobright offers for queries which it cannot natively handle. In contrast, Infobright does seem to handle an even more convoluted query rather efficiently which Tableau will generate once you use the TDC file, e.g. as follows:
select 1 as constexpr, sum(1) as sumall from T1 group by constexpr having count(1) > 0
The reason that Tableau expresses the query in this fashion (with the having clause) is to ensure that an empty table returns zero rows, rather than a single row containing NULL. Additionally, Tableau prefers SUM(1) vs. COUNT(1) or COUNT(*) because the semantics for the COUNT aggregation vary across database platforms due to idiosyncratic shortcuts (typically for performance) that give inconsistent results in the presence of duplicate rows, unconstrained table joins (e.g. cross join), and inconsistent handling of NULL values or entirely NULL rows.
I hope this helps,
Robert
-
7. Re: Tableau and Infobright
John Smith May 24, 2013 8:23 PM (in response to Robert Morton)Hi Robert,
I did try to use your TDC file but after second reading of the thread, I realized I may copy the file into a wrong directory.
I wont be able to verify this until Monday.
In meantime (if you reading this before Monday), can you explain what the TDC file is suppose to achieve?
Thanks
John
BTW The class of queries that Infobright passes to the MySQL optimizer when it cannot resolve natively shrinks constantly with any new releases.
Particularly it is handling well sub-queries. So I would be interested in seeing cases that you suggested were not well handled.
J.
-
8. Re: Tableau and Infobright
Robert MortonMay 24, 2013 8:24 PM (in response to John Smith)
Hi John,
The TDC file is intended to influence Tableau's behavior when working with databases such as Infobright which are not supported natively as first-class connections in Tableau. In this case the TDC file I linked to will force Tableau to use a different query syntax which works better with Infobright in certain circumstances.
-Robert
-
9. Re: Tableau and Infobright
John Smith May 27, 2013 8:07 AM (in response to Robert Morton)Hi Robert,
I finally copied the TDC file into the right directory and it helped with the total count when I drop "# of records" to the Rows. The answer was immediate now.
However when I drop a column b_name to the Columns field, tableau generated the following query:
select b_name, sum(1) from T1 group by b_name
it run 7sec
If Tableau had generated query:
select b_name, COUNT(*) from T1 group by b_name;
it would have run in milliseconds.
Any suggestions?
I'm working on the PoC for a big client and Tableau would be a great companion to Infobright if we could further effect generation of queries.
John
-
10. Re: Tableau and Infobright
Robert MortonMay 27, 2013 9:57 AM (in response to John Smith)
Hi John,
I'm glad that TDC file helped! Aside from that one option, there is nothing more you can do to control the queries issued by Tableau through its MySQL connector.
When it comes to counting the number of records, Tableau avoids using syntax tricks such as COUNT(*) because such tricks do not reliably compute the expected value in all circumstances (table joins, nullable columns, stale indexes after record insert/delete). Additionally, COUNT(*) does not always yield the best performance, e.g. with InnoDB on MySQL. Instead, Tableau relies on the database query optimizer to safely transform SUM(1) into a semantically equivalent expression with better performance. In the example you provided, it appears that the Infobright query optimizer is unable to do so.
As an alternative, you could write a calculated field in Tableau with an expression like the following:
RAWSQL_INT("COUNT(*)")
This calculated field could be named something like "Number of Records (fast approximation)" to encourage users to take advantage of that field for quick data exploration tasks, but those users must remember to double-check their work against the original "Number of Records" field before sharing the results or drawing any substantial conclusions.
I hope this helps,
Robert
-
11. Re: Tableau and Infobright
John Smith May 27, 2013 10:29 AM (in response to Robert Morton)It is not a matter of Infobright not being able to optimize - since it optimizes this query better than other engines. But at the same time it optimizes even better counts which are usually an Achilles' heel of other engines .
In my example, the counts were more appropriate because it was exactly case of counting records for every occurrence of a value. In this case SUM(1) is rather artificial construct.
But I will definitely test your suggestion of a calculated field and will let you know the result.
John
-
-
13. Re: Tableau and Infobright
John Smith May 27, 2013 12:03 PM (in response to Cristian Vasile)Thanks Guys,
Robert's suggestion with with calculated field actually worked very well (except I needed to use count(1) rather than count(*)).
Thanks
John
-
14. Re: Tableau and Infobright
Suhrid Ghosh Jun 4, 2013 10:01 PM (in response to Robert Morton)Hello Robert,
I'm using Tableau desktop trial version and my data source is Infobright database . While generating report on Tableau desktop i am finding a big time gap in performance comparing to the queries executed by tableau desktop. I have a single table with 70 Millions records. I have also checked the logs in C:\Users\Administrator\Documents\My Tableau Repository\Logs to find the query generated by tableau and the same query i have executed in Infobright console and both of them (Tableau desktop and Infobright console) take equal time to fetch the data with the query generated by tableau. It takes approx. 32 seconds which i really feel is too long for 70 million records. I have tried TDC and find no difference in time by doing that.
The query generated by tableau is :
SELECT YEAR(`TAB_1`.`REQ_DATETIME`) AS `yr_REQ_DATETIME_ok`, COUNT(`TAB_1`.`UNIQ_ID`) AS `cnt_Calculation_9090603105745843_qk`, SUM(`TAB_1`.`AMOUNT`) AS `sum_AMOUNT_qk` FROM `TAB_1` INNER JOIN `ACT_TYPE` ON (`TAB_1`.`ACT_ID` = `ACT_TYPE`.`ACT_ID`) INNER JOIN `CD_TYPE` ON (`TAB_1`.`CD_ID` = `CD_TYPE`.`CD_ID`) INNER JOIN `SER_PROV` ON (`TAB_1`.`SP_ID` = `SER_PROV`.`SP_ID`) GROUP BY 1
Can you please advice on the same
-
PerQueryInfoBright.JPG.jpg 135.0 KB
-