First of all: would CrossTab data work?
Second: a normalised data source would work well, however it could take some time to load. I would prefer your first option here.
I'm thinking "tall" w/15B records, here's why:
If the properties are created as columns, then calculations on subsets of properties are going to be a pain to create, e.g. [Property1] + [Property2] + [Property3]... And if there can be Nulls, that means you'll have to wrap every single column in IFNULL([Property1], 0) and so on. This will be slow to build, will potentially impact performance more than having a pivoted "tall" source, and will certainly be painful to maintain.
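To make that concrete, here's a minimal pandas sketch (property and column names invented): summing wide columns forces null handling on every single column, while a tall layout lets SUM skip missing rows for free.

```python
import pandas as pd

# Hypothetical "wide" table: one column per property, with gaps (NaN).
wide = pd.DataFrame({
    "Date": ["2016-01-01", "2016-01-02"],
    "Property1": [10.0, None],
    "Property2": [None, 5.0],
    "Property3": [2.0, 3.0],
})

# Wide: every column must be null-wrapped -- the equivalent of the
# Tableau calc IFNULL([Property1], 0) + IFNULL([Property2], 0) + ...
prop_cols = ["Property1", "Property2", "Property3"]
wide["Total"] = wide[prop_cols].fillna(0).sum(axis=1)

# Tall: one (Date, Property, Value) row per measurement; aggregation
# simply skips missing rows, no per-column IFNULL needed.
tall = wide.melt(id_vars="Date", value_vars=prop_cols,
                 var_name="Property", value_name="Value").dropna()
total_tall = tall.groupby("Date")["Value"].sum()
```

Adding a fourth property to the tall version costs nothing; in the wide version every calculation touching the properties has to be edited.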
Another thing to keep in mind if you're building dashboards: the "one data source to rule them all" approach often doesn't work for Tableau. For performance and speed of interactivity we'll often need to create aggregated views and/or extracts of the data. For example, if there's an overview view showing the last 5 years at a monthly level per property, then a data source with 5 years * 12 months * 100 properties = 6,000 rows will be fast to load and work with. From there the user can use Filter Actions that apply to other detail views/dashboards, which access the larger data set in a filtered way that keeps them fast.
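A sketch of that pre-aggregation step, assuming a hypothetical tall detail table with Date/Property/Value columns: collapse the daily rows to one row per property per month before handing the extract to Tableau.

```python
import pandas as pd

# Hypothetical daily detail data: 2 properties x 60 days.
dates = pd.date_range("2016-01-01", periods=60, freq="D")
detail = pd.DataFrame({
    "Date": dates.tolist() * 2,
    "Property": ["P1"] * 60 + ["P2"] * 60,
    "Value": range(120),
})

# Overview extract: one row per property per month -- tiny and fast to
# load, while Filter Actions drill into the full detail source.
overview = (detail
            .assign(Month=detail["Date"].dt.to_period("M"))
            .groupby(["Month", "Property"], as_index=False)["Value"]
            .sum())
```

Here 120 detail rows collapse to 4 overview rows (2 months x 2 properties); the same ratio is what turns a 15B-row source into a dashboard-friendly extract.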
FYI the best reference on improving Tableau performance is the Designing Efficient Workbooks white paper by Alan Eldridge: http://www.tableau.com/learn/whitepapers/designing-efficient-workbooks, be sure to check that out as well.
May I add my 2c please?
"Organizing" would mean different things (all of them matters):
-- loading data from original sources in raw format (into a "staging area")
-- storing data effectively (typically it means sorting / compressing / cleaning)
-- making (derived) data structures useful for different kinds of analyses.
The latter point is typically underestimated (even when doing the former two "right").
What kind of analyses are you expecting? The two main groups would be ML and OLAP.
The former -- applying Machine Learning algorithms --
is better served by a flat table with Items-Dates as Rows ("instances")
and all Attributes (and their derivatives) as Columns ("features").
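A quick pandas sketch of that ML-friendly shape (table and attribute names invented): pivot a tall Item/Date/Attribute/Value table into one instance row per Item-Date with attributes as feature columns.

```python
import pandas as pd

# Hypothetical tall snapshot: one row per Item per Date per Attribute.
tall = pd.DataFrame({
    "Item": ["A", "A", "B", "B"],
    "Date": ["2016-01-01"] * 4,
    "Attribute": ["Attr1", "Attr2", "Attr1", "Attr2"],
    "Value": [1.0, 2.0, 3.0, 4.0],
})

# Feature matrix: one row ("instance") per Item-Date,
# one column ("feature") per Attribute.
features = tall.pivot_table(index=["Item", "Date"],
                            columns="Attribute", values="Value")
```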
The latter (OLAP) is what Tableau is best at.
As Jonathan mentioned above (and rightfully so),
for most OLAP (aggregate & filter) scenarios
the better shape is a so-called "tall" dataset --
comprising facts & dimensions, which in your case
would be a "periodic snapshot" (a daily one)
of Items' Attribute Names & Values (current as of date).
How wide / tall this datasource should be would depend on many things:
-- the data types of the Attribute Values: are they all one data type
(INT, FLOAT, BOOLEAN, DATE/TIME, STRING) or more? If the latter,
then at least a column per data type is recommended for a "tall" DS
-- do filters on different Attributes mostly combine with an AND clause? For example:
( [Attribute1] = 'Value1' OR [Attribute1] = 'Value2' ) AND ( [Attribute2] < 0.5 )
If mostly yes, then a somewhat "wider" table --
with [Attribute1] and [Attribute2] as distinct columns --
would be more performant than its "taller" cousin.
... and it could go as far as reconsidering the "wide-flat-table" approach :-)
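To make that trade-off concrete (a pandas sketch, attribute names invented): on a wide table the AND predicate is a single boolean mask in one pass; on a tall table the same predicate needs a separate pass per attribute plus an intersection of the qualifying Items -- effectively a self-join. The mixed-type Value column below also shows why a column per data type helps in a tall DS.

```python
import pandas as pd

# Wide: one row per Item, attributes as columns.
wide = pd.DataFrame({
    "Item": ["A", "B", "C"],
    "Attribute1": ["Value1", "Value2", "Value3"],
    "Attribute2": [0.3, 0.7, 0.4],
})

# Wide AND filter: one pass, one boolean mask.
hits_wide = wide[wide["Attribute1"].isin(["Value1", "Value2"])
                 & (wide["Attribute2"] < 0.5)]["Item"]

# Tall: one row per (Item, Attribute) pair; Value is now a mixed-type
# (object) column, since strings and floats share it.
tall = wide.melt(id_vars="Item", var_name="Attribute", value_name="Value")

# Tall AND filter: evaluate each attribute's predicate separately,
# then intersect the qualifying Items.
a1 = tall[tall["Attribute"] == "Attribute1"]
items1 = set(a1.loc[a1["Value"].isin(["Value1", "Value2"]), "Item"])
a2 = tall[tall["Attribute"] == "Attribute2"]
items2 = set(a2.loc[a2["Value"].astype(float) < 0.5, "Item"])
hits_tall = items1 & items2
```

Note the `astype(float)` cast forced by the shared Value column -- with a value column per data type, each predicate would hit a properly typed column directly.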
Besides, the most restricting factors would typically be these:
-- How easy is it to transform the original raw data into the "right" shape?
-- Can these transformations be automated / scheduled / set & forgotten?
'Feeling that's enough for a 2c :-)
Hello Lenaic, that's also my current intention - although we are still deciding...
Hi Jonathan, your point is very valid and the only reason why I am considering the tall version at all. Luckily the data is rather clean and the properties (columns) are rather stable, so I think we will risk the small version :-)
Hi Yuri, thanks for the great comments. The use case is the OLAP kind and the attributes are floats. Filters would usually combine with AND, and the transformation of the data is easy, as most of it is fed via an OLAP layer that has been built especially for this project.