10 Replies Latest reply on Jul 19, 2013 1:20 AM by Jim Wahl

    Dealing with large amounts of data is becoming painful

    Tim Uckun

      I have several reports dealing with millions of rows of data, and it's becoming very painful to work in Tableau.  The data is extracted into Tableau once a day from a view I have created for these reports.  Every time I want to make a modification to a report I have to download a 30-megabyte report.  If I make a modification to my view (adding a field, in most cases) I have to edit the data for the report, which causes the report to re-import millions of rows of data.

       

      Is there any way to tell Tableau to work with a subset of the data for design purposes?  For example, just last month's or last week's data?  The data size is only going to grow, so this problem will only get worse as time goes on.

       

      What's a good strategy for dealing with large(r) amounts of data?

        • 1. Re: Dealing with large amounts of data is becoming painful
          Joshua Milligan

          Tim,

           

          You can always limit an extract to the top N rows.  That can limit the amount of data for test and design purposes.  Or, as you mentioned, you might add a filter when extracting to get a limited range of data.  Hiding all the unused fields can also dramatically reduce the size of extracts.

           

          Other options might include having a test version of the view which limits the data set.  Ultimately, if your data is sufficiently large, you might want to consider how it is structured.  You can get great performance using a live connection with properly indexed, partitioned, and referenced star-schema structures where you allow Tableau to do the joins (and thus utilize the metadata to optimize the SQL).
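
          For example, a cut-down test version of the view might look something like this (just a rough sketch in Postgres-style SQL; the view and column names are placeholders, not anything from your actual schema):

              CREATE OR REPLACE VIEW report_data_test AS
              SELECT *
              FROM   report_data                                        -- the full production view
              WHERE  order_date >= CURRENT_DATE - INTERVAL '30 days';   -- keep roughly a month of data for design work

          Point the workbook at the test view while you design, then swap back to the full view (or the full extract) before publishing.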

           

          Regards,

          Joshua

           

           

          [Attachment: Extract.png]

          • 2. Re: Dealing with large amounts of data is becoming painful
            Tim Uckun

            Is it possible to change the extract rules for "design" and "production" environments? Obviously I want to use the full data for reporting.

             

            Also, how can I get a list of fields that are not being used in any of the charts in the workbook?

            • 3. Re: Dealing with large amounts of data is becoming painful
              Tim Uckun

              I tried mocking up a new report with a live connection, and it's cumbersome to try to design reports while connected to the live database.

               

              Is there a way to connect to a copy of the database on my machine for design work, but have the workbook connect to the production database when it's published?

               

              I suppose DNS trickery might work but hopefully there is a more elegant solution.

              • 4. Re: Dealing with large amounts of data is becoming painful
                Jim Wahl

                What about editing the data connection (right-click > Edit Connection) to point at either your local DB or the production DB?

                 

                You've probably tried this already, but I'm curious whether it works for you.

                 

                Jim

                • 5. Re: Dealing with large amounts of data is becoming painful
                  Dana Withers

                  Hi Tim,

                   

                  We have a Live site for users and a Test site for development, where the workbooks are identical copies (easier for proper testing anyway).  We connect each data source to a view in the database.  In the live database the views can return a lot of rows; in the test database it is easy to add an extra clause that limits the results to a manageable set.

                  When going live, all you have to do is edit the connection on the server to switch from test to live (which, as long as the view names are the same, is not a problem).  That way we can always work with smaller copies, and/or adjust the dataset limits when we need to investigate a specific issue.
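
                  For example, the two definitions might differ only by the limiting clause (a rough sketch in Postgres-style SQL; all the names are made up):

                      -- Live database
                      CREATE VIEW reporting.sales_v AS
                      SELECT order_id, customer_id, order_date, amount
                      FROM   sales;

                      -- Test database: same name, same columns, plus a clause to keep it small
                      CREATE VIEW reporting.sales_v AS
                      SELECT order_id, customer_id, order_date, amount
                      FROM   sales
                      WHERE  order_date >= CURRENT_DATE - INTERVAL '7 days';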

                   

                  Hope that helps,

                   

                  Dana

                  • 6. Re: Re: Dealing with large amounts of data is becoming painful
                    Toby Erkson

                    Replace a data source: http://onlinehelp.tableausoftware.com/current/pro/online/en-us/help.htm#connect_basic_replace.html

                    Performance tips for relational databases: http://onlinehelp.tableausoftware.com/current/pro/online/en-us/help.htm#performance_relational.html

                    The Hide All Unused Fields button can be seen here: http://onlinehelp.tableausoftware.com/current/pro/online/en-us/help.htm#extracting_create.html

                    Manually un/hiding fields: http://onlinehelp.tableausoftware.com/current/pro/online/en-us/help.htm#datafields_dwfeatures_hide.html

                     

                    Next would be to analyse the data pull itself if you're using Custom SQL.  It could be a bad SQL statement if you're not experienced (we don't know your skill level).
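
                    To illustrate the difference (a rough sketch with made-up table and column names, Postgres-style syntax): the first statement drags every row into the extract, while the second pushes the filter and aggregation down to the database so far fewer rows come back.

                        -- Pulls every row across the wire
                        SELECT * FROM sales;

                        -- Filters and aggregates in the database before anything reaches Tableau
                        SELECT customer_id,
                               DATE_TRUNC('month', order_date) AS order_month,
                               SUM(amount)                     AS total_amount
                        FROM   sales
                        WHERE  order_date >= DATE '2013-01-01'
                        GROUP  BY customer_id, DATE_TRUNC('month', order_date);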

                     

                    I agree with the others that using a subset of data makes sense.

                     

                    Also, what exactly is your pain point?  You haven't explained what you're having issues with.  Is it the download time of the extract?  Is the report updating every time you change a field getting annoying?  Why does the report have to change every week?

                    • 7. Re: Dealing with large amounts of data is becoming painful
                      Matt Lutton

                      I am hopeful Tableau will focus on big data as they move forward.  The majority of tutorials and training materials focus on small data sets, and I too find that working with large data sets is difficult at times.  One of my main issues is the inability to change the primary table in a data source or data blend.  The "Replace Data Source" option is helpful, but not nearly as dynamic as I wish it were.  Stored procedure support and more SQL-like functionality would be helpful for large data as well, in my opinion.  A "group by" capability would be nice for some scenarios too, rather than relying on complex Table Calculations (as an example, I would like the ability to find the MAX of a measure by ID # and use that result in calculations outside the current view, etc.).  It can (sort of) be done, but it's beyond me at this point.
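
                      In SQL terms, the kind of thing I mean is roughly this (a sketch with made-up names), which is trivial in the database but awkward as a Table Calculation:

                          -- MAX of a measure per ID, attached to every row so it can be reused in other calculations
                          SELECT order_id,
                                 customer_id,
                                 amount,
                                 MAX(amount) OVER (PARTITION BY customer_id) AS max_amount_for_customer
                          FROM   sales;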

                       

                      Apologies for my rant--if performance is your issue, there are certainly good ways to deal with that, as suggested by other forum members here.

                      • 8. Re: Dealing with large amounts of data is becoming painful
                        Tony Loxton

                        Jim Wahl

                         

                        I am not sure what you mean.  Do you mean I set up two different data connections, one for local and one for production, and switch back and forth?  I am using Tableau 7, and it is not able to separate a dataset (recordset) from its connection.  I can't say "use this exact set of tables and joins but connect to this DB instead".

                         

                        Ideally I would like to have a config file someplace where I can specify which databases to connect to based on either host name or environment variables, the way my applications do.

                        • 9. Re: Dealing with large amounts of data is becoming painful
                          Tony Loxton

                          The view is fairly well optimized.  It's basically a star schema wrapped in a view.  I also do various minor things like renaming fields, creating fields based on other fields, etc.  No big deal.
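
                          Roughly this shape, just to illustrate (the table and column names here are made up, not my actual schema):

                              CREATE VIEW report_data AS
                              SELECT f.order_id,
                                     f.amount              AS sale_amount,       -- renamed field
                                     f.amount * f.quantity AS extended_amount,   -- field derived from other fields
                                     d.calendar_date       AS order_date,
                                     c.customer_name
                              FROM   sales_fact   f
                              JOIN   date_dim     d ON d.date_key     = f.date_key
                              JOIN   customer_dim c ON c.customer_key = f.customer_key;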

                           

                          I use data extracts, and the extracts are getting huge.  If I want to make a modification to the report I have to pull the entire extract from the report server to my desktop.  In my case the report server is in the cloud and I am roaming, so it becomes very cumbersome to download 25 megs, change the report, and publish 25 megs back up to the server.  I have to do this even if I just want to make a minor change.  The alternative is to install Tableau Desktop on the same machine as the server and use RDP, but that's not very pleasant either.

                           

                          Also, in my case I am running Tableau in a virtual machine (I am on a Mac, so I have to run Windows in a VM), which makes everything even slower than normal.

                           

                           

                          Regarding hiding/unhiding: can I get a list of the fields which are unused, instead of just hiding them?  I might want to delete a bunch of them.

                          • 10. Re: Dealing with large amounts of data is becoming painful
                            Jim Wahl
                            Do you mean I set up two different data connections, one for local and one for production, and switch back and forth?

                            If you wanted to try the performance of a local DB vs production DB, I thought you could just change the the server location / IP in the data connection dialog box (right-click on the connection in the top left Data box and select edit connection).