Thanks, that helps. So if I export the data to Excel and check each column for duplicates, and one column has none, does that mean that unique column is my lowest level of granularity? In the Superstore example, is it Product Name?
And any help with my second question?
2. What are the first few things someone would do, as a best practice, when handed a data set and asked to analyze it and provide insights?
Yeah, I wouldn't necessarily export to Excel to check for duplicates. Here's a good thread on identifying duplicates in your data source:
I would say that the lowest level of granularity in the Superstore data set is the transaction level. Each row represents an individual transaction. Product is the lowest level of the product hierarchy, which is Category > Sub-Category > Manufacturer > Product.
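If you do want to test granularity outside Tableau, one rough way is to check whether a column, or a combination of columns, identifies every row exactly once. Here is a minimal sketch in plain Python. The rows are made-up sample data that only borrow Superstore-style column names, and `duplicate_keys` and `is_unique` are hypothetical helper names, not anything built into Tableau.

```python
from collections import Counter

# Made-up sample rows; the column names mimic Superstore, the values do not.
rows = [
    {"Order ID": "A-1", "Product Name": "Stapler", "Quantity": 2},
    {"Order ID": "A-1", "Product Name": "Binder", "Quantity": 3},
    {"Order ID": "A-2", "Product Name": "Stapler", "Quantity": 2},
    {"Order ID": "A-3", "Product Name": "Desk", "Quantity": 1},
]


def duplicate_keys(rows, columns):
    """Return the key values (as tuples) that appear on more than one row."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def is_unique(rows, columns):
    """True when the given column combination identifies each row exactly once."""
    return not duplicate_keys(rows, columns)


# Try single columns first, then combinations.
for candidate in (["Order ID"], ["Product Name"], ["Order ID", "Product Name"]):
    print(candidate, "unique:", is_unique(rows, candidate))
```

A combination that comes back unique is a candidate for the row-level grain. Whether it is the real grain still depends on what the rows mean, so treat the result as a hint rather than proof.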
As for your second question, it depends on the type of data you're looking at and the audience you are presenting your analysis to. What is the subject matter? For sales data like the Superstore, I would look at top-selling products, top customers, sales by region, and year-over-year comparisons, and try to identify trends such as where sales are growing and where they are falling.
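To make those first-pass questions concrete, here is a small sketch of the same summaries in plain Python: totals by product, totals by region, and a year-over-year change. The data is invented for illustration, and the field names and the helpers `total_by`, `top_n`, and `yoy_growth` are assumptions of mine. In Tableau you would get the same views by dragging the fields onto the canvas.

```python
from collections import defaultdict

# Invented sample data; field names are assumptions, not the real Superstore schema.
sales = [
    {"year": 2022, "region": "East", "product": "Stapler", "sales": 100.0},
    {"year": 2022, "region": "West", "product": "Desk", "sales": 400.0},
    {"year": 2023, "region": "East", "product": "Stapler", "sales": 150.0},
    {"year": 2023, "region": "West", "product": "Desk", "sales": 300.0},
    {"year": 2023, "region": "East", "product": "Binder", "sales": 100.0},
]


def total_by(rows, field):
    """Sum the 'sales' value for each distinct value of `field`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[field]] += row["sales"]
    return dict(totals)


def top_n(totals, n):
    """Return the n largest (name, total) pairs, biggest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def yoy_growth(totals_by_year):
    """Fractional change versus the previous year present in the data.

    Assumes every prior-year total is non-zero.
    """
    years = sorted(totals_by_year)
    return {
        year: (totals_by_year[year] - totals_by_year[prev]) / totals_by_year[prev]
        for prev, year in zip(years, years[1:])
    }


print("Top products:", top_n(total_by(sales, "product"), 2))
print("Sales by region:", total_by(sales, "region"))
print("Year-over-year growth:", yoy_growth(total_by(sales, "year")))
```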
The data discovery process is more of an art than a science. I couldn't give you a step-by-step guide on where to start. Luckily, Tableau makes it simple to look at your data across multiple dimensions very quickly, so it's typically pretty easy to identify some interesting features of your data. Try adding some of your dimensions and measures to the canvas and using the 'Show Me' feature to sample some of the different chart types. This might provoke some ideas. Hope this helps!
When I check, the transaction ID, which is 'Order ID', has multiple rows; it is split at the quantity level. So I believe the combination of 'Order ID + Quantity' gives a unique row in the Superstore data. And yes, you are right that data discovery is an art. Sure, I will play with different measures and different combinations of dimensions. I think charts like scatter plots, box plots, and histograms show how the data is distributed, so we can quickly identify outliers. That's just my opinion.
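On the outlier point, a box plot usually flags points that fall more than 1.5 times the interquartile range beyond the quartiles, and the same rule is easy to try by hand. This is only a sketch on invented numbers: `iqr_outliers` is a hypothetical helper, and the quartile method used here (Python's "inclusive" method) is my assumption and may differ slightly from how a given charting tool computes quartiles.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the first or third quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Invented order totals with one obviously large value.
order_sales = [20, 22, 25, 27, 30, 31, 35, 400]

print("Outliers:", iqr_outliers(order_sales))
```

A flagged value is not automatically an error. It is just a row worth a closer look before it drives an average or a trend line.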