4 Replies Latest reply on Nov 30, 2018 6:23 AM by Michael Gillespie

    Tableau performance with multiple live connections?

    Blaine Elliott

      I'm trying to understand how Tableau performs when using multiple live connections.

       

      Consider a workbook that uses two live connections: one to a large Hadoop cluster and another to a small MySQL instance.  The workbook does something like:

       

      SELECT mysql_table.username, SUM(hadoop_table.revenue) AS sum_revenue
      FROM hadoop_table -- 100M rows
      INNER JOIN mysql_table -- 1M rows
      ON mysql_table.userid = hadoop_table.userid
      GROUP BY mysql_table.username

       

      What is the performance of this operation for Tableau, Hadoop, and MySQL?  Is Tableau going to load hadoop_table and mysql_table into Tableau and then do the JOIN itself?  Is Tableau smart enough to use predicate pushdown and table scans?  Something else?
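
      To make the two alternatives I have in mind concrete, here is a minimal sketch. It says nothing about what Tableau actually does. Two in-memory SQLite databases stand in for the Hadoop and MySQL connections, the sample rows are made up, and the function names join_after_full_load and join_after_pushdown are hypothetical. It also assumes each userid appears only once in mysql_table.

```python
import sqlite3

# Two separate in-memory databases stand in for the two live connections.
big = sqlite3.connect(":memory:")    # plays the role of the Hadoop source
small = sqlite3.connect(":memory:")  # plays the role of the MySQL source

big.execute("CREATE TABLE hadoop_table (userid INTEGER, revenue REAL)")
big.executemany("INSERT INTO hadoop_table VALUES (?, ?)",
                [(1, 10.0), (1, 5.0), (2, 7.0), (3, 1.0)])  # made-up rows
small.execute("CREATE TABLE mysql_table (userid INTEGER, username TEXT)")
small.executemany("INSERT INTO mysql_table VALUES (?, ?)",
                  [(1, "ann"), (2, "bob")])  # made-up rows


def join_after_full_load():
    """Strategy A: fetch every row from both sources, then join locally."""
    names = dict(small.execute("SELECT userid, username FROM mysql_table"))
    totals = {}
    for userid, revenue in big.execute("SELECT userid, revenue FROM hadoop_table"):
        if userid in names:  # inner join: drop rows with no matching user
            totals[names[userid]] = totals.get(names[userid], 0.0) + revenue
    return totals


def join_after_pushdown():
    """Strategy B: the big source aggregates per userid first, then the
    much smaller result is joined locally against the user table."""
    names = dict(small.execute("SELECT userid, username FROM mysql_table"))
    totals = {}
    pre_aggregated = big.execute(
        "SELECT userid, SUM(revenue) FROM hadoop_table GROUP BY userid")
    for userid, revenue in pre_aggregated:
        if userid in names:
            totals[names[userid]] = totals.get(names[userid], 0.0) + revenue
    return totals


print(join_after_full_load())
print(join_after_pushdown())
```

      Both functions return the same totals per username. The difference is how many rows leave the large source: strategy A transfers every row of hadoop_table, while strategy B transfers one row per userid. Which of these (if either) Tableau does is exactly what I'm asking.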