Table calculation performance when calculating an average of a rank
Peter Quackenbush May 13, 2017 3:22 PMHello,
I'm trying to construct a relatively complex table calculation that I can construct, but the performance is very poor.
Here's the actual challenge (but to have a shareable analogue, I'll provide an example with superstore data.).
I have a database with a bunch of auto insurance quotes. There's 1 record per quote_ID, vehicle_number, and quoting company. What I'd like is to first, get the SUM(Premium) for the data by quote_id and quoting company. Then, rank the quote_id's by the quoting company from cheapest to most expensive. Then, provide the average of the rank by things like postal code, number of drivers, or other dimensions. I'd like a plot of the average of the rank for each of the quoting companies, but I'd settle for just 1.
Overall the database has around 250,000 records with around 20 columns and around 10 different quoting companies. Policies can have 1 2 or 3 vehicles on it. For example, a 2 vehicle policy that 5 companies provide a quote for will have 10 records.
I've accomplished this using table calculations, but the performance is not ideal. It can take 12 minutes to process the table. That's far too long. I think the main problem is that all these table calculations result in a cartesian product of the table calculation's dimensions.
I've built a very similar analogue to what I want using the superstore sample data. Essentially, for each order I want calculate the RANK_PERCENTILE for the SUM of Sales for every product's SubCategory. In other words, group the data by Order ID and the product's SubCategory, and find the SUM of Sales. Then, use a WINDOW_AVG to find the average rank by state for one one particular product subcategory.
Phew.
Anyways, attached is the worksheet that I've put together as an example. The Demo tab shows the whole aggregation that I'm doing, and the Demo2 tab shows a line graph. The values of that graph are what I want. You'll notice that there's a bunch of marksthough as I've said they are all basically the same because doing this calculation on a table calculation seems to create an unnecessaryand slowtable calculation.
I'd appreciate if anyone has an idea on how to make this more performant.
Thanks!

Avg Of Rank.twbx 1.2 MB