7 Replies Latest reply on Jul 13, 2016 6:01 AM by Jay Parikh

# Percentile of a dataset having a lot of similar values: RFM Analysis

I'm trying to do an RFM analysis and I am confused about how to divide the percentiles for frequency.

I have a dataset of about 50,000 rows.

For frequency, I have:

54% of customers with 1 order

20% of customers with 2 orders

and the rest spread across values of 3 through 54 orders.

If I divide the data as it is into groups of 20%, customers with 1 order get 3 different weights.

Should I just put all 54% of one-time buyers into the first percentile, i.e. give them a weight of 1? Would this be a fair way to do it?

Or should I put the 1's and 2's into the first percentile, i.e. give them a weight of 1?

• ###### 2. Re: Percentile of a dataset having a lot of similar values: RFM Analysis

Hi Jay,

So I think what you are saying here (and correct me if I'm wrong) is that in the Frequency part of your RFM model you are unable to split the customers into the number you'd want in each percentile, because there are very few distinct groups (ignoring the few customers who have purchased more than 2 times, who get put into 3 further groups)?

Without knowing which business you are in (so I don't know the percentage of customers that would ideally be in each group), this is absolutely fine, as it's only when you add Recency (R) and Spend/Monetary value (M) that customers fit into their final segment.

So for example (to make things nice and easy, say we have 1000 customers), you might well have 40% of customers (400) in your top group on Frequency (F), with, say, 5+ purchases. However, you might only have 10% of these in your most recent Recency (R) group, so 40 customers (or 4% of the total). Then of these 40 you might only have 50% in your most valuable/high-spend (M) classification (so 20 customers, or 2%). So in your top group of, let's call them 'Gold', customers you only have 2% (or 20), despite having 400 in the top F group.
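As a quick check of the arithmetic in that example, here is a minimal sketch in Python. The variable names are made up for illustration, and the percentages are the hypothetical ones from the example, not real data.

```python
# Figures taken from the worked example above; the names are illustrative.
# Integer percentages are used so no floating-point rounding is involved.
total_customers = 1000

top_f = total_customers * 40 // 100   # 400 customers in the top Frequency group
top_f_and_r = top_f * 10 // 100       # 40 of those are also in the top Recency group
gold = top_f_and_r * 50 // 100        # 20 of those are also in the top Monetary group

gold_share_pct = gold * 100 / total_customers
print(top_f, top_f_and_r, gold, gold_share_pct)  # 400 40 20 2.0
```

The point is that a large F group shrinks quickly once the R and M conditions are stacked on top of it.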

Hopefully I've understood your question, and the above helps, but if not let me know (ideally with some dummy data) and I can take another look.

• ###### 3. Re: Percentile of a dataset having a lot of similar values: RFM Analysis

Simon Runc

Hello

Thank you for the reply. To be more clear:

I'm doing RFM analysis on my dataset. I'm using a 1-5 scale (20% per score).

Someone who bought the product once and is high up in the list gets F=1, since he falls in the 1st percentile group (the first 20%). Another customer who also bought the product once but is lower in the list gets F=3, since he falls in the 3rd percentile group (up to 60%). This happens because there are so many 1's that they stretch up to 60%.

It really isn't fair to score them this way, irrespective of their R and M values.
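The effect described here can be reproduced with a small sketch. The 100-customer list below is synthetic and only assumed to mirror the shares quoted in this thread (54% with 1 order, 20% with 2 orders, the rest between 3 and 54); `positional_quintile` is a hypothetical helper, not a function from any tool.

```python
# Synthetic, already-sorted data (an assumption): 54 customers with 1 order,
# 20 with 2 orders, and the remaining 26 spread between 3 and 54 orders.
orders = [1] * 54 + [2] * 20 + [3] * 10 + [4] * 8 + [6] * 5 + [54] * 3

def positional_quintile(position, n, bins=5):
    """Score 1..bins based only on the 0-based position in the sorted list."""
    return position * bins // n + 1

scores = [positional_quintile(i, len(orders)) for i in range(len(orders))]

# Collect every score that was handed to a customer with exactly 1 order.
scores_for_one_order = sorted({s for o, s in zip(orders, scores) if o == 1})
print(scores_for_one_order)  # [1, 2, 3]
```

Because the split looks only at position in the list, identical frequencies straddle the 20% and 40% cut points and receive different scores.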

You might be able to understand it better in the image.

One solution to this which I came up with was assigning F=1 to all the customers who bought the product just once. That way, orders 2-6 can spread into F=2,3,4,5.
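A rough sketch of that idea, under stated assumptions: the data is synthetic (matching the 54% / 20% / rest split mentioned earlier), the function name is hypothetical, and the rule for spreading repeat buyers over F=2..5 (by the share of repeat buyers with strictly fewer orders) is just one possible choice.

```python
from collections import Counter

def frequency_scores(orders):
    """Map each distinct order count to an F score from 1 to 5.

    Single-order customers always get F=1. Repeat customers are spread
    over F=2..5 by the share of repeat customers with strictly fewer
    orders, so equal order counts always share one score.
    """
    repeat = Counter(o for o in orders if o > 1)
    n_repeat = sum(repeat.values())
    score_by_orders = {1: 1}
    below = 0  # repeat customers with strictly fewer orders than `value`
    for value in sorted(repeat):
        score_by_orders[value] = 2 + below * 4 // n_repeat
        below += repeat[value]
    return score_by_orders

# Synthetic data (an assumption), 100 customers in total.
orders = [1] * 54 + [2] * 20 + [3] * 10 + [4] * 8 + [6] * 5 + [54] * 3
print(frequency_scores(orders))  # {1: 1, 2: 2, 3: 3, 4: 4, 6: 5, 54: 5}
```

The score is looked up by order count rather than by row position, which is what removes the unfairness between customers with the same frequency.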

• ###### 5. Re: Percentile of a dataset having a lot of similar values: RFM Analysis

Hi Jay,

The method of assigning RFM segment numbers is to divide the population into equal-size chunks, hence equal percentiles. There could be 5 or 4 (or fewer) of them along each axis; you decide.

If 54% of your population bought only once, then all of them will get a Frequency quintile of 3. Others would go to 4 and 5. This is expected.
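One way to read this is that tied customers all take the quintile in which their tied block ends. A minimal sketch of that reading follows; the data is synthetic (assumed to match the 54% / 20% / rest split from the question) and `tie_aware_quintiles` is a hypothetical name.

```python
from collections import Counter

def tie_aware_quintiles(orders, bins=5):
    """Give every distinct order count the bin where its cumulative share ends."""
    counts = Counter(orders)
    n = len(orders)
    score_by_orders = {}
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        # Ceiling of (cumulative share * bins), done with integer arithmetic.
        score_by_orders[value] = (cumulative * bins + n - 1) // n
    return score_by_orders

# Synthetic data (an assumption), 100 customers in total.
orders = [1] * 54 + [2] * 20 + [3] * 10 + [4] * 8 + [6] * 5 + [54] * 3
print(tie_aware_quintiles(orders))  # {1: 3, 2: 4, 3: 5, 4: 5, 6: 5, 54: 5}
```

With this data the single-order customers end at the 54% mark, which falls in the third quintile, and everyone else lands in 4 or 5.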

If this result is of little use for your analysis, you may want to shift the F=1 boundary unequally (to 54% of the total). Then assign the rest their own scale and percentile numbers.

Yours,

Yuri