2 Replies Latest reply on Feb 15, 2012 6:28 AM by Wendy Foslien

    What causes huge temp tables to be created?

    Wendy Foslien

      We've recently made a change to our data connection for a workbook, and some tables that were previously set up as a multiple table connection are now hidden from Tableau through a view that does the join. So, the data connection now connects to the view as a single table connection. We're doing this because we are doing calculations in the join that we'd like to maintain centrally, and trying to do those calculations in Tableau would require some very complex connections through data blends that I'm not sure would be possible.


      The view has the potential to be extremely large, on the order of 10s of millions of rows, if it is not limited by a where clause. We do this limiting through actions, so that contents from this data connection should always be limited to just a few thousand rows. I say *should*, because it seems that something else is happening.


      Here's the issue that we've noted. It appears that when we attempt to use a filter on the data that is in this connection, a really huge temp table gets created. Note the the filter isn't a context filter -- in that case I would expect a temporary table. We see table names along the lines of "

      #Tableau_1 or 2_Connect or _Filter" when we examine the contents of the temp db (this is on SQL Server). There are global actions defined in this workbook -- could they be the culprits? I didn't think that an action created a temporary table, so I'm puzzled as to where these temp tables are being generated.


      I'd like to understand the cases where these tempdb tables get created. It is bringing our performance to a complete halt, filling up the disk on the SQL Server. With all of the care we've taken to limit the queries, it seems that something is joyfully creating a large temporary data set, and I can't see what it is that is doing this. Any ideas would be greatly appreciated.


      Thanks, Wendy

        • 1. Re: What causes huge temp tables to be created?
          Robert Morton

          Hi Wendy,


          First, you can learn more about why Tableau generates the kinds of queries it does (including those involving temp tables) from a talk presented at our 2011 Customer Conference. View the recorded talk or the slides for "Tuning Tableau and Your Database for Great Performance", in the Advanced section here:  http://www.tableausoftware.com/tcc11-sessions#advanced


          Complex filters can lead Tableau to stage queries in a temporary table for functionality or performance. In some cases, the query expression in the WHERE clause simply becomes too large for the database query processor to handle. In other cases, a large WHERE clause simply leads to poor performance. Tableau will instead join against a temporary table containing the values to be filtered. As described in the talk I linked, you should examine your action filters to determine if they are expressing a collection of discrete values instead of a more succinct range of values. While temporary tables may help improve performance for filtering discrete items, they introduce overhead for populating the tables, they may not have good (if any) indexes, and they tend to be poorly supported by query optimizers of MPP databases. You may find immediate improvements by looking for opportunities to use range filters if possible.


          Does this help?


          • 2. Re: What causes huge temp tables to be created?
            Wendy Foslien

            Hi Robert -- yes this helps, thanks very much for the link, great information there. One question I have -- is the workbook that Matt showed in the video available? I wasn't sure exactly how the embedded list was added, I think that might be helpful to use in cases where we know the list of aggregations that we might encounter ahead of time.


            In our case, we are working with a very large pool of time series data that is currently accessed via a SQL Server view in the data connection as I described in the first post. Our action filtering is around identifiers for points that we want to view, and then our selection of the time series data is also limited by a range of dates selected through a quick filter (relative date), and then we add quick filters for other attributes, like the day of week and time of day. So, we have a mix of ranges in a quick filter (the actual dates) and discrete values (point IDs) in actions, and discrete values in additional quick filters. It seems to be when we add the additional quick filters, that is when we start getting the temp database growth, and that is consistent with conditions you describe above in generating a complex WHERE clause. So, I think I understand what drives the temp table creation now -- thank you.