I assume you've already extracted the data.
Firstly, data source filters are the most obvious. If there's any way to limit the scope - start there.
Secondly, context filters. If there are any high-level categorical filters that could initially (and dynamically) limit the scope of your data, add those dimension filters to context (right-click, Add to Context).
There are a few performance tuning options you could consider. "Aggregate data for visible dimensions" will restrict the size of the extract once you have completed your project.
If you are using calculations you can tune those calculations.
Much as I'd like to start by filtering everything beforehand, I need to run most of my searches against the entire dataset first, then look at the rest of the visualizations.
I only have 4 calculated fields and there's nothing complex in there either that would slow it down.
I will explore the tuning option and see where it gets me.
But, between linking tables in Tableau vs. importing one giant table already linked - any thoughts on this?
If you're using extracts, which is best practice for large datasets such as yours, it won't make a difference from a performance perspective.
At TC16 Tableau announced the Hyper data engine, which will increase query performance against extracts by as much as 10x.
Are you running against an extract or live connection?
Live connection on 50M cells with the DB on a LAN is suicide
Do you mean extract as in TDE or extract as in CSV? You talk about exporting to CSV, but don't actually say you're using a TDE.
I don't get the 50M rows - are you joining the data properly, or not joining and getting a Cartesian product (every record in TableA matched against every record in TableB)?
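A quick way to sanity-check this: compare the row count of a keyed join against the product of the two table sizes. A rough sketch in Python (the table and column names here are invented for illustration):

```python
# Two toy tables: orders keyed by customer_id, and customers.
orders = [{"customer_id": c, "amount": a} for c, a in [(1, 10), (1, 20), (2, 5)]]
customers = [{"customer_id": 1, "region": "East"}, {"customer_id": 2, "region": "West"}]

# Proper join: rows match on the key -> one result row per matching pair.
joined = [
    {**o, **c}
    for o in orders
    for c in customers
    if o["customer_id"] == c["customer_id"]
]

# Accidental Cartesian product: no join condition -> rows multiply.
cartesian = [{**o, **c} for o in orders for c in customers]

print(len(joined))     # 3 rows, one per order
print(len(cartesian))  # 6 rows = 3 orders x 2 customers
```

If your row count looks like `len(TableA) * len(TableB)` rather than roughly the size of the larger table, a join condition is missing somewhere.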
So I'd definitely say that joining your data in Access before extracting would be a good idea. A better idea would be to use something better than Access (SQL Server Express, maybe? It's free) and create a view to join the tables within the database before extracting. When you extract, don't bother with CSV - create a TDE (Tableau Data Extract) and there will be speed improvements immediately. Although I think your biggest win is to reduce the row count, because it sounds too high compared to how you describe the base tables.
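For illustration, here's what the "create a view that pre-joins the tables" approach looks like, using Python's built-in sqlite3 as a stand-in for the database (the table and column names are made up; the same SQL idea applies in Access or SQL Server):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 2, 5.0);
    INSERT INTO customers VALUES (1, 'East'), (2, 'West');

    -- The view does the join once, inside the database;
    -- Tableau would then extract from this flattened view.
    CREATE VIEW orders_flat AS
    SELECT o.order_id, o.amount, c.region
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id;
""")

rows = conn.execute("SELECT * FROM orders_flat ORDER BY order_id").fetchall()
print(rows)  # [(1, 10.0, 'East'), (2, 20.0, 'East'), (3, 5.0, 'West')]
```

Pointing Tableau at a view like `orders_flat` means the database does the join work, and the extract only sees the flattened result.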
Yes, I meant the TDE extract. (I used the .csv to calculate the number of cells.) The 50M would be from the 550K rows x 90 columns (from all linked tables - I guess that's how Tableau combines it).
I'll try using an Access query to join first. Unfortunately I cannot install SQL server express as IT doesn't allow it so Access is my only option.
The row count is fixed (550K), only the columns vary based on the joins.
In either case, when the data gets loaded into the extract, it's consuming a flattened resultset and loading into a columnar database (the extract). If you flatten it in Access and record those results to a table then load the extract from the flattened table, the loading itself may be faster, but the interaction with the workbook/extract will be the same.
More columns make the data source wider (i.e. more dimensions and measures), but shouldn't make it longer (i.e. increase the number of records) - though it all depends on the join.
Tableau will definitely count records as rows, not as columns.
You will see records manifested as rows on the front end, but when you create an extract, Tableau actually splits each column up and writes it to disk separately (the extract is a columnar database). It's [generally speaking] much easier to get performance mileage by reducing the number of columns than by reducing the number of rows. Similarly, high-cardinality columns are expensive: columns with a limited number of distinct values are preferable to, for example, an ID column.
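To see why cardinality matters in a columnar store, here's a toy sketch of dictionary encoding - the general compression idea is along these lines, though Tableau's actual extract format is more sophisticated:

```python
def dict_encode(column):
    """Store each distinct value once, plus a small integer code per row."""
    codes, lookup = [], {}
    for value in column:
        if value not in lookup:
            lookup[value] = len(lookup)
        codes.append(lookup[value])
    return lookup, codes

# Low cardinality: 6 rows, but only 2 distinct values to store.
region = ["East", "West", "East", "East", "West", "East"]
lookup, codes = dict_encode(region)
print(len(lookup))  # 2 dictionary entries

# High cardinality (e.g. an ID column): every row adds a new entry.
ids = ["A1", "B2", "C3", "D4", "E5", "F6"]
lookup_ids, _ = dict_encode(ids)
print(len(lookup_ids))  # 6 dictionary entries -- no compression win
```

A categorical column with a handful of distinct values compresses to almost nothing, while a unique ID column gets essentially no benefit.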
The high number of records is a function of the joins, it sounds like.
In an extract this isn't really the case, as only the needed columns are unpacked into RAM. If you're using 1 or 2 columns from a 10,000-column extract, it will take very long to generate, but it should have the same run-time performance as a 10-column extract. In any case, 550K rows is nothing in terms of data volume, so then I'd start to look at calculation design - making sure as much as possible can be materialized - as well as adhering to dashboard and view best practices.
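A minimal sketch of why unused columns cost little at query time in a columnar layout (a hypothetical structure for illustration, not Tableau's actual format):

```python
# Columnar layout: one list per column, each stored independently.
extract = {
    "region": ["East", "West", "East"],
    "amount": [10.0, 5.0, 20.0],
    # ...imagine 88 more columns here; the query below never touches them.
}

# A viz using only 'amount' reads just that one column from storage.
total = sum(extract["amount"])
print(total)  # 35.0
```

In a row-oriented store, every extra column widens each row you scan; in a columnar store, a query only pays for the columns it actually uses.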
Beyond optimizing the extract to materialize as many fields as possible, can you hide as many unused fields as possible (most people wouldn't use anywhere close to 90, though you may be an outlier) and aggregate for visible dimensions? That might reduce the row count.