A one-time dump of data without a process vs. quality data over time with a continuous, preferably automated, process; data that serves one very specific use case vs. data that is broader, generic, and reusable. Having to recreate the data models over and over again because of different granularities or slightly different sets of data, vs. exposing our warehouse dimensions as Tableau data sources and joining them in Tableau rather than building pre-compiled tables. We're running into so many of these issues. I feel like most of the more visible work showcased in the Tableau community is based on one-time data prep of a large data set: one product (dashboard) delivered, and then forgotten. The real data prep we need is building the backend that supports continuously updated data sources in Tableau, generic enough to support widely differing uses. That's the kind of data prep I'd like to see.
I offer Tableau trainings myself and spend a lot of time teaching this subject, and an awful lot of my blog posts are about refining my own languaging for describing what Tableau is doing and how and why we need to set things up to get the results we want. There are huge differences in incoming skill level that drastically change what's required for training. The skill areas I've identified are:
- casual spreadsheet users who barely know how to write a formula and mostly build things with a lot of copy & paste and/or manual typing
- spreadsheet users who know how to write simple formulas
- spreadsheet power users who are confident using functions like VLOOKUP(), OFFSET(), IF(), etc.
- casual SQL users who know how to write basic queries
- SQL users who are confident using joins, subqueries & aggregate queries (people with a background building reports in other software like SSRS are often in this group)
- SQL power users/DBAs who know about database design, normalization, etc.
- statistical software package users who have some skills in manipulating data
- hard-core stats users (often coming out of R or Python) with deeper data-manipulation skills who are used to working with vectors/sets
I can tell a SQL power user or a hard-core stats user "Tableau likes a flat table" and they get the concept without much further detail, and even if they don't know how to do much in Tableau yet, they have a skill set that can generate a flat table. So the training for those users is more about how Tableau relates to the concepts they already know and about identifying Tableau features that let them get where they want to go faster/easier than they could by writing SQL or statistical code. Some examples include talking about joins and join culling, Tableau's pivot feature, teaching Sets & LOD expressions as a way to create cohorts, using join calculations to avoid some data prep, table calcs for more advanced aggregations, etc.
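For readers who don't already have that mental model, here's a minimal sketch of what "Tableau likes a flat table" means, using pandas with made-up fact and dimension tables (the table names and columns are purely illustrative, not from any real warehouse):

```python
import pandas as pd

# Hypothetical normalized source: a fact table of sales rows
# and a dimension table describing each product.
sales = pd.DataFrame({
    "product_id": [1, 1, 2],
    "amount": [100, 150, 200],
})
products = pd.DataFrame({
    "product_id": [1, 2],
    "product_name": ["Widget", "Gadget"],
})

# Joining the dimension onto the fact table produces one "flat" table:
# one row per record, every attribute as its own column, which is the
# shape Tableau works with most naturally.
flat = sales.merge(products, on="product_id", how="left")
print(flat)
```

The same flattening could of course be done with a SQL join or in Tableau's own data source pane; the point is simply that someone fluent in joins already knows how to produce this shape.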
Whereas with the casual spreadsheet users we need to cover data types, data type conversion, the difference between records (rows) and attributes (columns), what defines an attribute (getting to Hadley Wickham's notion of Tidy Data), case (in)sensitivity, how joins work, basic formula writing including working with dates and string manipulation, how to build & validate calculations in Tableau, and lots more just to get them going. One example I use in my trainings is a data blend on two data sets where a name is written as Alice Toklas in one and Alice B. Toklas in the other. People with a SQL background immediately get that these are different strings; people who haven't worked much with data need the middle initial pointed out as the reason the data blend fails.
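The Alice Toklas example can be sketched in a few lines of pandas (the data here is invented; an inner join on the name column stands in for the exact-match linking a Tableau blend does):

```python
import pandas as pd

# Two hypothetical sources where the same person's name is entered differently.
people = pd.DataFrame({"name": ["Alice Toklas"], "city": ["Paris"]})
orders = pd.DataFrame({"name": ["Alice B. Toklas"], "total": [500]})

# Matching is on the exact string, so "Alice Toklas" != "Alice B. Toklas"
# and the join returns no rows -- the middle initial breaks the match.
matched = people.merge(orders, on="name", how="inner")
print(len(matched))  # 0
```

Experienced data people see the mismatch instantly; for everyone else, running something like this makes the failure concrete instead of mysterious.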