But one thing I've noticed is that when you pull in a data source, Tableau reads ALL the columns - every time. Even though you can "hide unused fields" and then extract the data, the columns are actually still available to Tableau, because you can still unhide them in the data connection window and refresh the extract.
I'm not sure this is true. If I hide fields and then create an extract, the SQL generated against the database doesn't fetch the hidden columns, and the "show all unused columns" option no longer appears. If I switch back to a live connection, the option comes back.
My question is: does this hinder performance versus being able to send only specific columns to Tableau, so that Tableau never touches the columns you don't need?
I personally don't find that it does; row counts are what increase my extract times. This will vary depending on how wide your tables are, though. Some of the examples I've seen on here with hundreds of columns would certainly be more efficient if, where only 3 of those columns are actually in use, the other 120-odd unused ones were turned off.
It would be easy enough to test this theory with some custom SQL, comparing extract / load times.
Additionally, if the columns aren't being used then I would call for more specific / optimized views in the data warehouse to facilitate more efficient reporting.
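To make the "easy enough to test" idea concrete, here's a rough sketch in Python with SQLite, not Tableau's actual extract engine; the table size and column names are invented for illustration. It times fetching all columns versus only the three you need from the same table:

```python
import sqlite3
import time

# Hypothetical wide table for illustration; names and sizes are made up.
conn = sqlite3.connect(":memory:")
n_cols, n_rows = 50, 20_000
cols = ", ".join(f"col{i} REAL" for i in range(n_cols))
conn.execute(f"CREATE TABLE facts ({cols})")
row = tuple(float(i) for i in range(n_cols))
conn.executemany(
    f"INSERT INTO facts VALUES ({', '.join('?' * n_cols)})",
    (row for _ in range(n_rows)),
)

def fetch(sql):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

wide, t_wide = fetch("SELECT * FROM facts")                      # all 50 columns
narrow, t_narrow = fetch("SELECT col0, col1, col2 FROM facts")   # only 3 columns

print(f"wide:   {len(wide[0])} cols in {t_wide:.3f}s")
print(f"narrow: {len(narrow[0])} cols in {t_narrow:.3f}s")
```

Run against your actual warehouse instead of an in-memory SQLite table, the same comparison would tell you whether hiding unused fields before extracting is worth it for your table widths.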
I guess my question to all of you is: assuming all I have is a SQL database but I don't know how to write any SQL (so joins, etc. all need to be done within Tableau), is Tableau alone actually sufficient, or do I still need additional ETL tools (including SQL statements), and why?
The answer to this 100% depends. Is your SQL database well formed / well structured to provide the best experience for Tableau? Personally, I transform survey data in a SQL Server database into star schemas to expose to Tableau, and it's a positively delightful experience. I put placeholders in spots where users expect blanks, and the way I've set up my star schemas means they know exactly what a field is / does based on its name. They don't need to do tricks or anything in the front end.
Not everyone has that luxury, though, so if I assume your situation is a little more vanilla, then I would say Tableau alone is sufficient to a certain extent. It really depends on what you're doing and how advanced it is. I tried to solve a question on RANKING in here earlier today, and it would have required the data to be ranked before coming into Tableau so the user could then apply a rank on top of that aggregation. Not possible through Tableau alone, but a piece of cake with some SQL first followed by Tableau second.
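I don't have the original ranking thread handy, but the general pattern looks like this: compute the rank in SQL with a window function before the data reaches Tableau, so the front end only sees a pre-computed rank column it can then aggregate or re-rank. A minimal sketch with made-up table and column names (SQLite via Python, standing in for your warehouse):

```python
import sqlite3

# Hypothetical sales data; table and column names are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("East", "Ann", 500.0), ("East", "Bob", 300.0),
    ("West", "Cam", 400.0), ("West", "Dee", 700.0),
])

# Rank each rep within their region BEFORE the data hits the BI layer;
# Tableau then just consumes the sales_rank column like any other field.
ranked = conn.execute("""
    SELECT region, rep, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS sales_rank
    FROM sales
    ORDER BY region, sales_rank
""").fetchall()

for row in ranked:
    print(row)
```

The same `RANK() OVER (...)` syntax works in SQL Server, Postgres, and most warehouses; wrapping it in a view gives Tableau users the ranked data without writing any SQL themselves.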
I was also surprised to find that when I performed a data extract and selected "hide unused fields", it hides them in the dimensions/measures pane, but if you go to edit data source, you can still unhide them. Having said that, Tableau could be disregarding those fields in subsequent queries, thereby improving performance.
I'm surprised that row count has more impact than column count. If I have a table with 10 columns and 100 rows and I add a row, that's only 10 new data points, but if I add a column, it's 100 new data points. I'll have to play around and test my theory that the impact is multiplicative.
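To put numbers on that intuition (a toy calculation, not a Tableau benchmark): the total number of data points is rows times columns, so the cost of adding one row or one column depends on the size of the other dimension. The growth is multiplicative rather than exponential:

```python
rows, cols = 100, 10

points = rows * cols                        # 100 x 10 = 1,000 data points
added_by_row = (rows + 1) * cols - points   # one more row adds 10 points
added_by_col = rows * (cols + 1) - points   # one more column adds 100 points

print(points, added_by_row, added_by_col)
```

Which is why, on a million-row table, one extra column costs a million cells while one extra row costs almost nothing.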
I can see how the capabilities of Tableau would vary depending on what you're trying to accomplish. I just had a user call and ask why their dashboard, connected live to 3 joined DB tables, was taking 20 seconds per dashboard to load. I need to dig deeper but it appears we already have use cases where direct connection is not the best way to go.
Thanks for your input! I'm hesitant to mark the question as answered, as I'd love to get more input. I'll leave it open for another day or so.
Don't mark this as answered yet, it's an open discussion. My answer is far from definitive.
Regarding row versus column count, I wouldn't look at it as adding one row versus adding one column. I'm speaking more from experience, where people have data at a row level (disaggregated) in Tableau with millions of rows, but the reality is they only ever look at it at an aggregated level. In that context, one column doesn't make a lot of difference, because you're still fetching the same number of rows.
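One way to see why aggregating first matters so much more than column width (a toy sketch with invented data): the same row-level pattern collapses to a handful of rows once it's grouped before it ever reaches the workbook.

```python
import sqlite3

# Toy disaggregated data: many rows, only a few distinct categories.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (category TEXT, value REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    ((f"cat{i % 4}", 1.0) for i in range(100_000)),
)

raw_rows = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

# Aggregating in the database first means the BI tool fetches
# 4 rows instead of 100,000.
agg = conn.execute(
    "SELECT category, SUM(value) FROM events GROUP BY category"
).fetchall()

print(raw_rows, len(agg))
```

A pre-aggregated view (or extract) built this way is often the fix for the 20-second live-connection dashboards described above.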
Ahh...gotcha. Great point! I think that's precisely the issue my user is running into. Just way too much data.