Did you consider metrics like:
sales per dealer, profit per dealer,
the profit-to-sales ratio for entire groups,
and similar metrics where you normalize before you compare?
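As a rough illustration (all numbers below are made up), normalizing can flip which group looks stronger:

```python
# Toy example: Group A has higher totals, but Group B looks better
# once you normalize per dealer and by margin. All figures invented.
groups = {
    "Group A": {"dealers": 120, "sales": 6_000_000, "profit": 540_000},
    "Group B": {"dealers": 45,  "sales": 2_700_000, "profit": 297_000},
}

for name, g in groups.items():
    sales_per_dealer = g["sales"] / g["dealers"]
    profit_per_dealer = g["profit"] / g["dealers"]
    profit_to_sales = g["profit"] / g["sales"]
    print(f"{name}: sales/dealer={sales_per_dealer:,.0f}, "
          f"profit/dealer={profit_per_dealer:,.0f}, "
          f"margin={profit_to_sales:.1%}")
```

Here Group A leads on raw totals, but Group B has higher sales per dealer (60,000 vs 50,000) and a higher margin (11.0% vs 9.0%).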
Thanks Luisa. That seems to help in normalizing the data.
Thanks Nitish. I did not think of that. That is a good suggestion though.
With a sample size that large you have more than enough statistical power to use a non-parametric test like the Mann-Whitney U test for stochastic dominance in a single ranking dimension, such as sales. Odds are good, though, that there will be a statistically significant difference simply because of the large sample size. The harder question is whether the difference is meaningful, which comes down to two considerations. First, why is there a difference? Answering that depends on whether you have included the correct explanatory latent variables in your data. Second, is the difference large enough to have an operational impact? With a sample that large, even a less powerful non-parametric test will be very sensitive to small differences between the distributions.
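A minimal sketch of that test with SciPy, on simulated data (the lognormal parameters and group sizes here are assumptions for illustration, not your actual data):

```python
# Mann-Whitney U test on two simulated "dealer sales" samples.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
group_a = rng.lognormal(mean=3.0, sigma=0.8, size=5000)  # hypothetical group A sales
group_b = rng.lognormal(mean=3.1, sigma=0.8, size=5000)  # slightly shifted group B

u_stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat:.0f}, p = {p_value:.2e}")

# With n = 5000 per group, even this small shift comes out "significant",
# so report an effect size as well, e.g. the common-language effect size
# U / (n_a * n_b), an estimate of P(a random A sale exceeds a random B sale):
cles = u_stat / (len(group_a) * len(group_b))
print(f"P(A > B) estimate: {cles:.3f}")
```

The tiny p-value next to an effect size barely off 0.5 is exactly the significant-but-not-meaningful pattern described above.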
The really powerful question to ask is: accounting for all the factors the dealers cannot control, what is the difference in sales? In the life sciences this is referred to as risk adjustment, although propensity scoring might work as well. Answering this question will tell you not only which dealers are adding the most value through their behaviour, but also which dealers have large unrealised potential for generating sales if placed in a better market.
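One simple way to sketch that adjustment is regression: fit sales on the factors dealers cannot control and rank dealers by the residual (actual minus expected sales). Everything below is synthetic; the factor names and coefficients are made up for illustration.

```python
# Regression-adjustment sketch: "value added" = sales beyond what the
# uncontrollable factors predict. Synthetic data throughout.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
market_size = rng.lognormal(10, 0.5, n)   # uncontrollable factor (assumed)
skill = rng.normal(0, 1, n)               # unobserved dealer effect we hope to recover
sales = 0.05 * market_size + 200 * skill + rng.normal(0, 50, n)

# OLS of sales on the uncontrollable factor(s)
X = np.column_stack([np.ones(n), market_size])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
expected = X @ beta
value_added = sales - expected  # residual: sales the market does not explain

# Largest residuals = dealers adding the most value given their market;
# dealers in strong markets with negative residuals have unrealised potential.
top_dealers = np.argsort(value_added)[::-1][:5]
print(top_dealers)
```

In this synthetic setup the residual tracks the hidden dealer effect closely, which is the point of the adjustment: the ranking reflects behaviour, not market luck.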
For any financial data I would advise against tests, like the t-test, that rely on assumptions of normality. Most financial processes originate from geometric stochastic processes, which result in variables whose distributions have power-law tails.
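To see how far such data strays from normality, compare the sample kurtosis of a lognormal sample (a simple stand-in for a geometric process; the parameters are arbitrary) with the normal value of 3:

```python
# Heavy tails in a nutshell: kurtosis of lognormal vs normal samples.
import numpy as np

rng = np.random.default_rng(42)
lognormal_sample = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)
normal_sample = rng.normal(size=100_000)

def kurtosis(x):
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4))  # equals ~3 for a normal distribution

print(kurtosis(normal_sample))     # close to 3
print(kurtosis(lognormal_sample))  # far above 3 (theoretical value ~111 for sigma=1)
```

A t-test's normality assumption is badly violated by tails like that, which is why rank-based tests such as Mann-Whitney are the safer default here.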