mahout-user mailing list archives

From Ted Dunning <>
Subject Re: Questions about PearsonCorrelation on a example
Date Tue, 23 Jun 2009 23:17:32 GMT
To beat a very tired horse, I think that all squared error correlation
measures (Pearson's chi-squared, Pearson's correlation, squared deviation
and so on) are completely suspect for small count data.  Furthermore, any
reasonable sample of truly long-tail phenomena includes great numbers of
small counts.  Furtherfurthermore, long-tail phenomena are the rule rather
than the exception.
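To make the small-count problem concrete, here is a toy Python sketch. The `pearson` function is a hypothetical illustration, not Mahout's PearsonCorrelationSimilarity code. With a single co-rated item the correlation is undefined, because the variance is zero. With two co-rated items it can only come out as +1 or -1, whatever the actual ratings are, because any two distinct points lie on a line.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length rating vectors.

    Returns None when either vector has zero variance, since the
    correlation is undefined in that case.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

# One co-rated item: zero variance, so the correlation is undefined.
print(pearson([5.0], [1.0]))            # None
# Two co-rated items: the result is always exactly +1 or -1.
print(pearson([5.0, 4.0], [2.0, 1.0]))  # 1.0
print(pearson([1.0, 2.0], [5.0, 3.0]))  # -1.0
```

Neither extreme says much about how similar two users really are; the measure simply has too little data to work with.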

Thus, I almost never like these measures and would have a hard time arguing
that there is anything good about this kind of measure.  The only exception
would be in a pub where I would take any side of any debate for the
amusement of the crowd.

Try mutual information or multinomial likelihood ratios instead.
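A minimal sketch of the likelihood-ratio alternative, assuming the usual setup of a 2x2 contingency table of co-occurrence counts (both A and B, A without B, B without A, neither). The function names `llr_2x2` and `mutual_information` are hypothetical and this is not Mahout's own implementation. It computes the G^2 statistic, G^2 = 2 * sum k_ij * ln(k_ij * N / (row_i * col_j)); dividing it by 2N gives the mutual information of the table in nats, so the two suggestions are closely related.

```python
import math

def llr_2x2(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 table of counts.

    k11: A and B occur together
    k12: A occurs without B
    k21: B occurs without A
    k22: neither occurs
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    # Each cell with its row and column index.
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    g2 = 0.0
    for k, i, j in cells:
        if k > 0:  # 0 * log(0) is taken as 0
            g2 += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * g2

def mutual_information(k11, k12, k21, k22):
    """Mutual information (in nats) of the same table: G^2 / (2N)."""
    n = k11 + k12 + k21 + k22
    return llr_2x2(k11, k12, k21, k22) / (2.0 * n)

print(llr_2x2(10, 10, 10, 10))  # 0.0: counts exactly as independence predicts
print(llr_2x2(10, 0, 0, 10))    # about 27.73 (= 40 ln 2): strong association
```

Unlike a squared-error measure, the statistic grows with the amount of evidence, so a handful of co-occurrences yields a small score rather than a spurious perfect correlation.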

On Tue, Jun 23, 2009 at 3:48 PM, Sean Owen <> wrote:

> One could argue that this behavior is actually a good thing -- basing
> an estimate of similarity on one data point could be very
> unreliable.
