You can always limit an extract to the top N rows. That can limit the amount of data for test and design purposes. Or, as you mentioned, you might add a filter when extracting to get a limited range of data. Hiding all the unused fields can also dramatically reduce the size of extracts.
Other options might include having a test version of the view which limits the data set. Ultimately, if your data is sufficiently large, you might want to consider how it is structured. You can get great performance using a live connection with properly indexed, partitioned, and referenced star-schema structures where you allow Tableau to do the joins (and thus utilize the metadata to optimize the SQL).
Is it possible to change the extract rules for "design" and "production" environments? Obviously I want to use the full data for reporting.
Also how can I get a list of fields which are not being used in any of the charts in the workbook?
I tried mocking creating a new report with live connection and it's cumbersome to try and design reports while connected to the live database.
Is there a way to connect to a copy of the database on my machine for designing but have the database connect to production database when it's published?
I suppose DNS trickery might work but hopefully there is a more elegant solution.
What about editing the data connection (right-click > edit connection) to point to your local DB / production DB.
You've probably tried this, but I'm curious if it doesn't work.
We have a Live site for users and a Test site for development where copies of the workbooks are the same (easier for proper testing anyway). We connect each data source to a view in the database. In the live database the views can have a lot of rows, in the test database it is easy to put in an extra clause to limit the results to a manageable set. When going live, all you have to do is edit the connection on the server (which if the view names are the same is not a problem) to go from test to live. That way we can always work with smaller copies and/or specify the dataset limits to work with in case we need to investigate a specific issue.
Hope that helps,
Performance tips for relational databases: http://onlinehelp.tableausoftware.com/current/pro/online/en-us/help.htm#performance_relational.html
The Hide All Unused Fields button can be seen here: http://onlinehelp.tableausoftware.com/current/pro/online/en-us/help.htm#extracting_create.html
Manually un/hiding fields: http://onlinehelp.tableausoftware.com/current/pro/online/en-us/help.htm#datafields_dwfeatures_hide.html
Next would be to analyse the data pull itself if you're using Custom SQL. It could be a bad SQL statement if you're not experienced (we don't know your skill level).
I agree with the others that using a subset of data makes sense.
Also, what is your pain? You haven't explained exactly what you're having issues with. Download time of the extract? The report updating everytime you change a field is getting annoying? Why does the report have to change every week?
I am hopeful Tableau will focus on big data as they move forward. The majority of tutorials and training materials focus on small data sets, and I too find that working with large data sets is difficult at times. One of my main issues is the inability to change the primary table in a data source or data blend. The "replace data source" option is helpful, but not nearly as dynamic as I wish it was. Stored procedure support and more SQL-like functionality would be helpful for large data as well, in my opinion. A "group by" functionality would be nice for some scenarios too, rather than relying on complex Table Calculations (as an example, I would like the ability to find the MAX of a measure, by ID # and use that result in calculations, outside the current view, etc.) It can (sort of) be done, but its beyond me at this point.
Apologies for my rant--if performance is your issue, there are certainly good ways to deal with that, as suggested by other forum members here.
I am not sure what you mean. Do you mean I set up two different data connections one for local and one for production and switch back and forth? I am using Tableau 7 and it is not able to separate a dataset (recordset) from it's connection. I can't say "use this exact set of tables and joins but connect to this DB instead".
Ideally I would like to have a config file someplace where I can specify which databases to connect to based on either host name or environment variables like my applications do.
The view is faily well optimized. It's basically a star schema but in a view. I also do various minor things like rename fields, create fields based on other fields etc. Not big deal.
I use data extracts and the extracts are getting huge. If I want to make a modification to the report I have to pull the entire extract from the report server to my desktop. In my case the report server is in the cloud and I am roaming so it becomes very cumbesome to download 25 megs, change the report, and publish 25 megs back up the server. I have to do this even if I just want to make a minor change. The alternative is to install tableau desktop on the same machine as the server and use RDP but that's not very pleasurable either.
Also in my case I am running tableau in a virtual machine (I am on a mac so I have to run windows in a virtual machine) so that's even slower than normal.
Regarding hiding unhiding. Can I get a list of fields which are unused instead of hiding the ones that are not used? I might want to delete a bunch of them.
Do you mean I set up two different data connections one for local and one for production and switch back and forth?
If you wanted to try the performance of a local DB vs production DB, I thought you could just change the the server location / IP in the data connection dialog box (right-click on the connection in the top left Data box and select edit connection).