11 Replies Latest reply on Aug 19, 2013 12:14 AM by Ville Tyrväinen

    Import a large TSV dataset into Tableau

    alessandro.scocciapappagallo

      I have a problem: I need to visualise data stored in a single 2GB .tsv file. Unfortunately, Tableau does not seem to read this format. I tried changing the extension to .txt, and this way Tableau can apparently read the file; however, the import goes on ad infinitum. Is there a quicker way to work with TSV files in Tableau, so that I can visualise the data without parsing it and putting it into a database (an operation I would rather avoid)?

        • 1. Re: Import a large TSV dataset into Tableau
          Dan Cory

          Tableau should be able to create an extract from your file. I don't know why it's failing, but figuring that out is your best bet. Does it work with a small fraction of the file? Can you post some of the file if not?
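One way to try the "small fraction of the file" test suggested above is to copy the first lines of the big file into a small one and connect Tableau to that. This is only a minimal sketch: the function name `write_sample` and the file names in the usage note are hypothetical, and it assumes the file uses ordinary line breaks.

```python
from itertools import islice


def write_sample(src_path, dst_path, n_lines=100_000):
    """Copy the first n_lines lines of src_path to dst_path.

    Returns the number of lines actually written.
    """
    written = 0
    # Binary mode avoids guessing the encoding: lines are copied byte for byte.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in islice(src, n_lines):
            dst.write(line)
            written += 1
    return written
```

For example, `write_sample("big.tsv", "sample.txt")` (hypothetical names) would produce a 100,000-line sample without loading the 2 GB file into memory.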

           

          Thanks,

          Dan

          • 2. Re: Import a large TSV dataset into Tableau
            Allan Walker

            Is this a tab separated value file?  If it is, it's probably a limitation of the Microsoft Jet Engine.

             

            If I remember correctly, Jet also has issues around internationalization/ASCII.

             

            I'd suggest importing into a database (MySQL, PostgreSQL) and using an alternative driver.

            • 3. Re: Import a large TSV dataset into Tableau
              Cristian Vasile

              Alessandro,

               

              You could try some tools to check your huge and possibly faulty text file, such as:

              Google Refine
              Data Wrangler
              csvkit
              Microsoft Data Explorer
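Before reaching for a dedicated tool, a quick structural check is to count how many tab-separated fields each line has; a clean file should show a single field count. The sketch below is not part of any of the tools listed above. The function name is hypothetical, UTF-8 is an assumed encoding, and it counts raw tabs without handling quoted fields.

```python
from collections import Counter


def field_count_histogram(path, delimiter="\t", encoding="utf-8"):
    """Return a Counter mapping number-of-fields -> number-of-lines."""
    counts = Counter()
    # errors="replace" keeps the scan going past undecodable bytes.
    with open(path, "r", encoding=encoding, errors="replace", newline="") as f:
        for line in f:
            # Strip the line terminator, then count delimiters on the line.
            counts[line.rstrip("\r\n").count(delimiter) + 1] += 1
    return counts
```

If the result has more than one key, the lines with the rarer field counts are the first candidates to inspect.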

               

              Regards,

              Cristian

              • 4. Re: Import a large TSV dataset into Tableau
                alessandro.scocciapappagallo

                The dataset is available from here.

                 

                I thought the same, Dan Cory, but after three hours of processing I realised it was not going to happen, and unfortunately I have no idea why. Allan Walker, I would have to parse the file in order to put it into a database; I would rather not, as it is quite a time-consuming operation. Cristian Vasile, thank you for the list; I did not know most of those tools. I will have a look at them.

                 

                If you have other advice, suggestions or solutions, please, feel free to share.

                • 5. Re: Import a large TSV dataset into Tableau
                  Richard Leeke

                  Tableau has two different mechanisms that it may use when importing a text file.

                   

                  The way it always used to work is to use the Microsoft JET engine for reading the file. There are a number of limitations with that, including that it is slow, it can't read files larger than 2 GB and it has some nasty habits like silently ignoring (nulling) data that doesn't comply with its guess about data types.

                   

                  Back around version 6.1, Tableau introduced a new, fast data loader for reading text files. As the name suggests, it is a lot faster than JET, and it also doesn't suffer from the 2 GB limitation. But there are various restrictions on the circumstances in which it can be used. Off the top of my head, the ones I remember are that the datasource must be a single file (no support for joins or custom SQL) and the datasource can't have had any calculated fields added. I suspect that you can't even override data types, but I might be wrong on that.

                   

                  The easiest way to tell whether it is using the fast data loader is to look in the log files in your Tableau repository. With a bit of hunting through the file to find the time when you started the import, you should find some lines that say which loader it's using. It's also worth looking to see if there are any error messages or messages giving any indication of progress. It's a year or two since I looked at this in any detail, but I think I recall that it makes encouraging noises as it goes (n thousand rows loaded in so many seconds, or some such). I think!
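Hunting through a long log by hand is tedious, so a small filter can help narrow it down. The sketch below simply lists lines that contain any of a few keywords. The keywords are guesses (the exact wording of Tableau's log messages is not given in this thread), and the function name and log path are hypothetical.

```python
def find_log_lines(log_path, keywords=("loader", "error", "rows")):
    """Return (line_number, text) pairs for lines containing any keyword.

    Matching is case-insensitive. The default keywords are guesses,
    not known Tableau log messages.
    """
    needles = [k.lower() for k in keywords]
    hits = []
    with open(log_path, "r", encoding="utf-8", errors="replace") as f:
        for number, line in enumerate(f, start=1):
            lowered = line.lower()
            if any(needle in lowered for needle in needles):
                hits.append((number, line.rstrip("\n")))
    return hits
```

Printing the hits together with their line numbers makes it easier to jump to the right part of the log and read the surrounding context.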

                   

                  If it says that it is using the fast data loader but it fails to load 20 million simple rows of 6 columns in 3 hours then I'd say there is certainly something wrong.

                  • 6. Re: Re: Import a large TSV dataset into Tableau
                    Dan Cory

                    Looking at the logs while connecting to the file shows that Tableau is running a query to determine the width of the column. The most direct solution is to create a schema.ini that defines the columns in the file. I created one and attached it.
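The attached schema.ini is not reproduced in this thread, so the following is only a sketch of how such a file could be generated, not the attachment itself. It uses the standard schema.ini keys of the Microsoft text driver (`Format`, `ColNameHeader`, `ColN`). The function name is hypothetical, the column names, types and widths in the usage note are placeholders rather than the real layout of the dataset, and column names are assumed to contain no spaces.

```python
def build_schema_ini(data_file_name, columns, has_header=False):
    """Return schema.ini text describing a tab-delimited file.

    columns is a list of (name, jet_type, width_or_None) tuples;
    names are assumed to contain no spaces.
    """
    lines = [
        f"[{data_file_name}]",  # section name must match the data file name
        "Format=TabDelimited",
        f"ColNameHeader={'True' if has_header else 'False'}",
    ]
    for index, (name, jet_type, width) in enumerate(columns, start=1):
        spec = f"Col{index}={name} {jet_type}"
        if width is not None:
            spec += f" Width {width}"
        lines.append(spec)
    # Windows-style line endings, since the file is read by a Windows driver.
    return "\r\n".join(lines) + "\r\n"
```

For example, `build_schema_ini("data.txt", [("user_id", "Text", 50), ("played_at", "DateTime", None)])` yields a two-column definition (placeholder names). The resulting text should be saved as `schema.ini` in the same folder as the data file, opened with `newline=""` so the line endings are written unchanged.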

                     

                    Dan

                    • 7. Re: Import a large TSV dataset into Tableau
                      Ville Tyrväinen

                      I tested that .tsv file on my laptop, and although it took many hours (maybe 6-10), Tableau was able to read the file. I then used the schema file that Dan Cory posted earlier and let Tableau read the file again (it was faster, but I don't know by how much). Then I imported all the data into Tableau and tested how much my laptop is able to calculate.

                       

                      Data from: www.last.fm

                      Test.JPG.jpg

                      • 8. Re: Import a large TSV dataset into Tableau
                        Ville Tyrväinen

                        Tested again, and with that schema file it actually took only 15 min. Nice job, Dan Cory!

                         

                        Edit:

                         

                        Easy solution:

                        1. Change .tsv to .csv

                        2. Connect to the text file

                        3. Change "The first row has field names in it" to "Automatically generate names for the fields"

                        4. Field separator: "Tab"

                        5. Instant connection to the data source and about 15 min to import all data

                        • 9. Re: Import a large TSV dataset into Tableau
                          alessandro.scocciapappagallo

                          This is fantastic, Ville Tyrväinen! With your procedure I was able to connect in a few minutes, exactly what I was looking for! Only one problem: did you try to query the data? How long did it take? My machine, an ASUS quad-core i7 with 16GB of RAM, is taking ages to run the simplest queries possible. Is there any way to speed up this process? I have to run 5-6 queries on the data, so I definitely need something quicker.

                          • 10. Re: Re: Import a large TSV dataset into Tableau
                            Cristian Vasile

                            Alessandro,

                             

                            Your computer is extremely powerful, more than adequate to query 19M records. I hope that you imported the data into a .TDE (Tableau Data Extract) file and did not choose to connect live.

                             

                            If you want to live on the edge (I don't recommend it in your case), try creating a RAM disk and working from it, but you must be very rigorous: before shutting down, don't forget to save your work to a real disk.

                             

                            Regards,

                            Cristian

                            • 11. Re: Import a large TSV dataset into Tableau
                              Ville Tyrväinen

                              Yeah, I tried. Some queries took 5 s, some 7 min, and of course there were cases when my laptop wasn't able to calculate a query in less than an hour. I also tried with a faster computer (i7 and 8 GB), and, for example, calculating Number of records for each weekday took 4.5 min. For some reason Tableau ran out of memory when I tried to show all track names, after calculating for only a couple of seconds. So I think it depends on the level of detail that you want to show...
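As a cross-check on an aggregate such as the number of records per weekday, the same count can be computed by streaming the file once outside Tableau. This is only a sketch under assumptions: the function name is hypothetical, and both the position of the timestamp column (second field) and its format (ISO 8601 with a trailing `Z`) are guesses that should be adjusted to the actual file.

```python
import csv
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def records_per_weekday(path, timestamp_column=1,
                        timestamp_format="%Y-%m-%dT%H:%M:%SZ"):
    """Count rows per weekday, reading the TSV one row at a time.

    timestamp_column and timestamp_format are assumptions about the file.
    Rows whose timestamp is missing or unparsable are counted as "unparsed".
    """
    counts = Counter()
    with open(path, "r", encoding="utf-8", errors="replace", newline="") as f:
        # QUOTE_NONE treats quote characters as ordinary data.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            try:
                stamp = datetime.strptime(row[timestamp_column], timestamp_format)
            except (IndexError, ValueError):
                counts["unparsed"] += 1
                continue
            counts[WEEKDAYS[stamp.weekday()]] += 1
    return counts
```

Because it keeps only seven counters in memory, this kind of pass is bounded by disk read speed rather than RAM, which gives a rough baseline for judging the query times reported above.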