Should be doable with table calcs - I might have a go later unless Joe gets in first.
"(date of analysis - 30 days)" and "over a 30 day period starting on each of the individual days" seem contradictory. Shouldn't that be "ending on each of the individual days"?
Sounds great, if you would.
Yes... a bit contradictory. It should be "ending on each of the individual days"!
For each individual day I want to count the unique number of users over the 30-day period ending on the day of analysis.
Looking forward to seeing and understanding your solution.
Thinking about this a little more, I think the easiest way to do it would be with custom SQL. But that would only work against a proper database, as opposed to a text file or spreadsheet, since the Microsoft JET engine used for those doesn't support the "count distinct" operation. Normally when you want count distinct with a JET datasource you just create a data extract, since the Tableau data engine does support that - but unfortunately the data engine doesn't support custom SQL, so I can't see a way to achieve what you want with that.
I'll have more of a think about how to do it with table calculations - but let me know what your database is and whether a custom SQL approach would work for you. If so I'll elaborate on that approach.
There is no database up and running yet. I'm exploring the possibilities with Tableau against the database model we're currently designing. The final solution will probably run against a MySQL database.
Either solution will be very helpful.
My guess is that a table calculations solution will be quite heavy to run with millions of users with daily or weekly activity.
Good point about the volumes - I usually ask people that when discussing table calculation solutions.
As far as I can see you would need a row per user per session start date to be returned to Tableau to be able to do your MAU calculation in a table calculation. I generally reckon that the practical limit for table calculations is somewhere in the tens of thousands of rows or sometimes hundreds of thousands of rows. Beyond that things either get too slow or you simply run out of memory.
If you have millions of users with daily or weekly activity you would need to process tens of millions of rows. You won't do that with table calculations, I don't think.
I would definitely look at the SQL approach - either a custom SQL connection or perhaps a RAWSQL calculation will be able to do the calculation. With those sorts of data volumes it's worth prototyping to see how fast it's going to be and probably worth experimenting with some different approaches to the SQL - plus making sure you have the necessary indexes to support it efficiently.
I'll experiment later and post some more detail about what the SQL would need to look like - I'm pretty sure it will work, less sure how fast I can make it.
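In the meantime, here's a minimal sketch of the shape the custom SQL could take: a rolling 30-day distinct-user count per activity day. The table and column names (sessions, user_id, start_date) are my assumptions, and I've illustrated it with Python's sqlite3 just so it runs anywhere; on MySQL the window start would be DATE_SUB(d.day, INTERVAL 29 DAY) rather than SQLite's date(d.day, '-29 days').

```python
import sqlite3

# Toy sessions table: one row per session start (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (user_id INTEGER, start_date TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [(1, "2013-01-01"), (1, "2013-01-15"), (2, "2013-01-15"),
     (3, "2013-02-20"), (1, "2013-02-20")],
)

# One MAU value per activity day: count distinct users whose sessions
# fall in the 30-day window ending on that day (BETWEEN is inclusive).
rows = conn.execute("""
    SELECT d.day,
           COUNT(DISTINCT s.user_id) AS mau
    FROM (SELECT DISTINCT start_date AS day FROM sessions) d
    JOIN sessions s
      ON s.start_date BETWEEN date(d.day, '-29 days') AND d.day
    GROUP BY d.day
    ORDER BY d.day
""").fetchall()
for day, mau in rows:
    print(day, mau)
```

The self-join is the expensive part at scale - every session row can match up to 30 analysis days - which is why indexing and pre-aggregation matter so much here.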
I had a quick play with this using the RAWSQL approach which I think is likely to be the best. I created a calculated field using RAWSQL to calculate MAU and tried that out on a 20 million row database I happened to have lying around. The query took 750 seconds to run.
The database happened to be an Infobright database (a column-store engine that runs under MySQL). Thinking about it briefly, I don't think the column-store format is very helpful for this particular query - you would very probably be better off with the regular MyISAM engine (the default MySQL engine), especially if you define an appropriate index. I didn't bother to experiment, but I'll give you my first guess at a useful index below.
Assuming you have fields [User ID] and [Date of Start] (where [Date of Start] is a date, not a datetime), you just need to create a calculated field like this for MAU:

RAWSQLAGG_INT("(
SELECT COUNT(DISTINCT `t2`.`User ID`) AS `ctd_usr_id_last_30`
FROM `your_table` AS `t2`
WHERE `t2`.`Date of Start` BETWEEN (%1 - 29) AND %1
)", [Date of Start])
1) BETWEEN is inclusive of the end dates, so (%1 - 29) to %1 covers a range of 30 days. (On MySQL, DATE_SUB(%1, INTERVAL 29 DAY) is a more robust way to write the start of the range than plain date arithmetic.) Note that you could feed the range in from a Tableau parameter if you wanted.
2) The table and field name format shown (enclosed in backticks (`)) works with MySQL. Other databases need a different format. I always just look in the Tableau logs to see what Tableau is doing and copy that.
3) My first guess for an index to make that work quickly for a regular database would be a composite index on [Date of Start] and then [User ID] - but different databases may behave differently, so you'd definitely want to experiment.
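To make note 3 concrete, here's that composite index sketched against SQLite (the names are mine; the same shape applies on MyISAM). Putting the date column first lets the BETWEEN predicate range-scan the index, and including the user id makes it a covering index, so the distinct count never has to touch the table itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (user_id INTEGER, start_date TEXT)")

# Composite index: date first (drives the BETWEEN range scan),
# user id second (covers the COUNT DISTINCT).
conn.execute("CREATE INDEX idx_date_user ON sessions (start_date, user_id)")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT COUNT(DISTINCT user_id)
    FROM sessions
    WHERE start_date BETWEEN '2013-01-02' AND '2013-01-31'
""").fetchall()
print(plan[0][-1])  # the plan text names idx_date_user as the access path
```

As I said, different engines may pick differently - check the actual query plan on your database before trusting any guess of mine.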
Out of interest I copied that table from Infobright to the MySQL MyISAM engine and tried again, using the index I suggested. That came down to just under five minutes (rather than the twelve and a half minutes with Infobright). Your mileage will vary, depending hugely on the real data distribution - but it gives you an idea of the order of magnitude to expect.
Depending on how often you need to run it, how responsive and interactive you want the analysis to be, and how your data is distributed, you might want to look at pre-aggregating the data as you load it.
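On the pre-aggregation idea: you can't simply sum daily distinct counts to get MAU, but you can collapse the raw sessions down to one row per user per active day and run the rolling distinct count over that much smaller table. A toy illustration (SQLite again, table and column names assumed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (user_id INTEGER, start_ts TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [(1, "2013-01-01 09:00"), (1, "2013-01-01 17:30"),  # two sessions, one active day
     (2, "2013-01-01 10:00"), (1, "2013-01-02 08:00")],
)

# Pre-aggregate to one row per user per active day. Daily distinct
# counts can't be summed into MAU, but the rolling distinct count can
# be run over this table instead of the raw sessions.
conn.execute("""
    CREATE TABLE daily_activity AS
    SELECT DISTINCT user_id, date(start_ts) AS activity_date
    FROM sessions
""")
n = conn.execute("SELECT COUNT(*) FROM daily_activity").fetchone()[0]
print(n)  # 3 rows, down from 4 session rows
```

This caps the table at (users x active days) rows, no matter how many sessions each user starts per day.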
Thanks a million for all your thinking and testing.
SQL seems like the only way to go and, as you mentioned in your last post, probably pre-aggregated.
Further analysis of the DAU/MAU (such as breakdowns by user demographics: age, country, registration date, behavior profile) would probably be too slow.
But pre-aggregating with user demographics would make this complicated - or is there a smart way to do it?
I'm also performing a similar calculation, and I would like to know: what is t2 in this case?