Not much progress, no. There are a few issues: 1- the complexity of the
calculation(s), 2 - the additional complexity of fiscal year (with a 4-5-4
retail calendar) that's not a 'date' variable and finally, 3-
I'm still struggling to understand if best practice is to pre-calculate
outside of Tableau or do the calculation within it.
Susan if your data is structured more columnar, instead of horizontally, this can be done with table calculations. However, the fiscal date makes it very difficult. If you had an example data set I could show you how to do this, but not accounting for fiscal year.
End of the line though, the number of table calcs needed will be a big hit to performance. If there is a way to do it in the back-end, I would encourage you pursue that route.
So Zach - let's say the end goal here is to get counts of customers in each segment - will this still be a big performance hit? And should this still be done with table calcs? Each customer's classification isn't important, just at the aggregate level.
Ideally customer segments would be a stacked bar by year, i.e. in 2014 we had 35% New, 35% Lapsed, 10% Reactivated, 20% Retained customers; in 2013...etc etc.
I don't have a data set too similar to this. If you could provide a sample, I could try to mock up an example.
Zach - Sorry, I meant to add that my data is structured very similarly to
the Superstore data in that it's down to the order level of detail (that's
why I'd used this as example data in the first post). Here is some
attached dummy data - one row for each transaction, with several
dimensions (the real data has many more). Customer segmentation is to be
based off of Brand ID. This only has about 2600 lines, real table is much
Cust Seg Ex Data.xlsx 115.5 KB
...I'm a bit late to the party! but the problem of customer segmentation within Tableau is one I've wrestled with for a while. The problem being that you need to keep the customer level in the view to allow the aggregated calculations to run. I'm a bit short of time now, but Bethany Lyons Let's Talk About Set's Baby Webinar helped me out hugely. It's an hour long, but I must have got that hour back 100 times over!!
Look specifically at the Cohort Analysis bit, the formula structured [MAX( IIF([Category] = [Category Parameter], 1, 0 ) ) = 0] and combining of sets.
Hope this helps, it really helped me.
Thank you so very much for your help with this, I really appreciate it.
The pass-through sequel might be the way to go, I will try this and see
what happens with performance. The data source is HP Vertica tables and
they are very very large. Perhaps one thing that would help is to limit
the look-back period for segmentation. Another solution might be a
back-end append, like you suggest with Excel. I agree with Simon that the
challenge is to have the customer level detail in the view, especially
when we are dealing with millions of data points.
Thanks again for your help. These forums are always so helpful-
Thank you for the response, I will definitely watch this video. It's
really timely because I had another question about sets, hoping this will
give some insight. Basically trying to use a flag to define a set of
customers without filtering for other dimensions. Odd thing is it works
great in Superstore data otherwise I would have posted to the forum.
I see from your bio that you have experience in retail analytics... very
interesting. Just out of curiosity what have you found that Tableau
provides that other BI tools do not?
Yes the use of Sets is very handy, although I have had a quick play with your problem using sets...and sets doesn't seem the be the answer here! You can do it but you'll need 3 sets for every year!...so I'd stick with kettan solution. To do this in Tableau directly it might be worth voting up the following idea http://community.tableau.com/ideas/2177 (don't want to turn Tableau into Excel! but this would be very useful)
btw you can still keep Customer Level in the detail, and produce good Vizes, when running aggregated calculations, but there are limitations. I've attached an example of a question I was asked a while back. I have used the SUM[Invoice Amount]) to create the Groups, and by bringing Customer Name (in the attached example from SuperStore Sales) into the detail shelf the calculation will perform at this level. If you click in the Bar Charts, you'll see that they are actually made up from stacking each customer. This is fine for Counts and Sums, but breaks down if you want to visualize a ratio calculation (say Avg Basket Spend) as with the requirement to have the Customer in the ID shelf, as any Visual will stack the individual Av Basket Spend, and so not show you the average of that group. You can get round this with Table Calcs, but it gets a quite complicated/convoluted!
My current way of handling this, is to use Tableau to create the segments as in the attached, use the Export data to create a table of customerID, Customer Group, and then join this back in (for me Via SQL) to use as a proper dimension. I could create the Bin/Customer Segment rule in the database itself, but I like to use Tableau to visualize where the best 'segment breaks' should be (I'm thinking RFM Customer segment models here).
Below is an image (unfortunately the data it too sensitive (and around 20M rows!) to share the workbook), but you can see that I use Parameters to change the 'breaks' between groups, so that I get 'sensible' proportions in each of my final RFM groups (...I split the Recency into 3 sections, where the bottom and top [and thus the middle group] are controlled by the parameters, and do the same for Frequency and Value, there is then a final RFM segment calculation that puts people into their final group, say IF Recency = High AND Frequency = High AND Spend = High then 'Gold Customer'...etc.) . As I change the parameters then numbers in each group changes, so is a really nice way to ensure sensible group sizes...Once happy I then, usually, get our amazing DevOps team to hard code this into the database (appreciate this is a bit of a luxury!) to use as proper dimensions. The Viz below is, again, individuals stacked, so the aggregate calculations work.
...Now for the actual question you asked!! I've not used a huge number of BI systems prior to Tableau (mainly a relatively unheard of one called 'Omniscope'), and then the ones that Statistics software provide (and my old friend Excel!).The main thing for me with Tableau is the flexibility and speed (speed of making Visualisations), this means I can play/investigate the data in lots of different ways before finding the optimum solution. The connections to many datasources (although the lack of direct connection to Google Sheets seems an oversight!), the speedy handling of large(ish) datasets (seems to be fine with upto 70-80 Million rows), and integration with R are also big draws for me.
By flexibility I find it offers the best of both worlds, you can (using the Show Me icons) use it like a traditional BI tool, where you choose the 'Chart Type'. But if you look what it's doing, it's actually creating the Viz from the components of Marks, Size, Colour...etc. (if you choose a Tree/Tile Chart, and then change the Marks from Squares to Text you'll see what I mean). Once I'd got that you can start creating some really powerful Vizes (Even create your own)...for example on a line chart, try dragging a measure on the size tile, and the thickness of the line reflects the measure...I found that I can get an extra dimension on view which is still intuitive to view. Or for some real 'free form'/'info-graphics' check this one out!
So you can go from traditional Vizes, Stephen Few would be proud of (hopefully!), but also full info-graphicy type
There are quite a few comparisons on the Web, and in the Community (this one has some good links, to some in depth discussions)
Of course nothings perfect, so a lack of a Front End ETL (would be so good to have Altryx front end), and although Table Calcs give you a lot of flexibility, it is sometimes necessary to 're-shape' data outside Tableau (which can be a problem if you don't have decent SQL skills...or access to them), to make life easier in the long run (as you are having to do here) are slight negatives.
Hope this all helps
Customer Bins.twbx 584.8 KB
1. Omniscope rocks!
2. For Google spreadsheet try this driver Google Spreadsheet ODBC Driver - Overview
3. "lack of a Front End ETL" check this out third parties tools able to export/import data to/from Tableau Data Engine
I think you should be summarising the data in the database as much as possible before throwing it into Tableau, especially if you need it at such a summarised level. Let the database do the crunching, and let Tableau display the end-result. I've never used Vertica either, but hopefully you can create a view or another table to summarise the data in the way you need it and then let Tableau query that object.
While Vertica was built to do this kind of thing, it might not be too bad in Tableau :).
A table calc approach isn't all that difficult. Table calcs are done inside Tableau, which makes for a big internal table, but all you need is the customer ID level of detail for each time period.
Here's the path I started down:
Customer Status =
CASE PREVIOUS_VALUE("") WHEN "New" THEN IF ISNULL(SUM(Sales)) THEN "Lapsed" ELSE "Retained" END WHEN "Retained" THEN IF ISNULL(SUM(Sales)) THEN "Lapsed" ELSE "Retained" END WHEN "Reactivated" THEN IF ISNULL(SUM(Sales)) THEN "Lapsed" ELSE "Retained" END WHEN "Lapsed" THEN IF ISNULL(SUM(Sales)) THEN "" ELSE "Reactivated" END WHEN "" THEN IF ISNULL(SUM(Sales)) THEN "" ELSE "New" END END
PREVIOUS_VALUE returns the customer status for the previous year, when addressing is set to Order Date / Year. For the first year / row in the partition, "" is returned.
In a table view this looks like this:
Now you can sort of turn this into a graph using another calculated field to sum the customer IDs (we already have distinct customer IDs, because the view is at the customer ID level of detail).
Count of Customers =
// Normally you could just use COUNT([Customer ID]) or SUM[Number of Records] // but these won't count null values, which is required for the "Lapsed" count. // The workaround is to use the table calc function LOOKUP(1, 0) to trigger a value on // every cell including the non-existent Lapsed cells. If the cell exists LOOKUP(1,0) // will return 1. If it does not exist, it will return NULL and IFNULL returns 1. // These values are then summed. WINDOW_SUM(IFNULL(LOOKUP(1, 0), 1))
The problem is that the bar chart below is really a stacked bar with one sliver for each customer ID. Tooltips and clicking on the bars are annoying, since you see just that sliver, but you can disable tooltips and use reference lines to put a the total value on each bar.
If you're thinking that WINDOW_SUM() should fix this if the addressing / compute using is set to Customer ID, you'd be right except that Tableau doesn't let you partition a table calc (Count of Customer) using a field that is also a table calc (Customer Status).
While it might be cleaner and faster to take the WINDOW_SUM() out, I've left it in, because I believe there is a workaround that requires some voodoo zen.
In addition to the clicking / tooltip issues, this means that you can't create a stacked bar chart or show the percentage of total.
Again, I agree with the advice to do as much as you can in the database. But sometimes you want to play with something before digging into the SQL. And I get a kick out of finding a use for PREVIOUS_VALUE().
Customer Bins.twbx 692.1 KB
Thank you, Jim. I will try this - totally makes sense to test it before creating additional tables/bank-end work.
I've used the stacked bar view with individual records in the past... The only issue I have with this is that if you want to create a true stack by year (i.e. new, retained, react on top of one another then color by segment) there isn't a way to label the segments or % of total. But I appreciate that this gets almost all of the way there without SQL - hugely helpful.