2 Replies Latest reply on Nov 26, 2013 12:15 PM by bharathwaj.nandakumar

    Importing several gigs of data from mysql


      I have been trying to analyse a massive amount of data stored in a mysql database. Each of the tables (can have more than 500k rows) have a reference to an id which is a primary key of one of tables. As I try joining more than 3 of the tables, the extracting of data is taking forever i.e. 2 hours and running.


      I was wondering if there is any best practices for importing huge amount data through Tableau desktop. I saw the post (http://community.tableau.com/thread/133936) which mentioned that using custom sql can be slower than importing entire tables.


      Also, I just wished that the import of the data can happen in batches or in the background so that I can start doing some analysis as the data comes in.Is that possible in Tableau?

        • 1. Re: Importing several gigs of data from mysql
          Joshua Milligan



          The major concern in using Custom SQL is when you are connecting live.  Then every time you drag a field to a new location in the view, a new query is gernated and run against the live connection -- but your custom SQL had to wrapped in that query vs. Tableau knowing the metadata and being able to optimize the queries.


          When performing an extract, there really won't be any performance difference between custom SQL and joining tables in Tableau when building views.  The custom SQL might cause an extract to take longer to build if it is more complex than simple joins.



          If you were to run your query in a query editor, how many rows does it return?  How long does it take?  Is it possible that your joins are turning 500k rows into much more than you expect?




          • 2. Re: Importing several gigs of data from mysql



            When I run through the query browser, I think it does go beyond a few million rows and mysql query browser just runs out of memory for its results. So since I get too many results, I can always put a limit on the total results returned. So now I have the following questions.


            If I m limiting the results to say 500k rows and use a live connection, if I have a date range quick filter, will the quick filter just work on the 500k rows which I have got through the query, or will it use the quick filter's settings and try to get me 500k results. I want to ensure that I have a maximum data set for my analysis.