mahout-user mailing list archives

From Amit Nithian <anith...@gmail.com>
Subject Re: Question about Pearson Correlation in non-Taste mode
Date Sat, 30 Nov 2013 18:18:20 GMT
Hi Ted,

Thanks, that is what I would have thought too, but I don't think that the
Pearson similarity (in Hadoop mode) does this:

in
org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.PearsonCorrelationSimilarity
around line 31

double average = vector.norm(1) / vector.getNumNonZeroElements();
This looks like it takes the sum (the L1 norm) and divides by the number of
defined elements, which would make the average of my [5 - 4] be 4.5.
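
To make the two candidate averages concrete, here is a minimal sketch in plain
Java. It does not use Mahout; the class and method names are hypothetical, and
a plain array with 0 in the unstored position stands in for the sparse vector.
It assumes non-negative ratings, so the L1 norm equals the plain sum.

```java
class SparseMeanSketch {
    // Hypothetical helper: mean over stored (non-zero) entries only.
    // Mimics the effect of norm(1) / getNumNonZeroElements().
    static double meanOfStored(double[] x) {
        double sum = 0.0;
        int stored = 0;
        for (double v : x) {
            if (v != 0.0) {
                sum += Math.abs(v); // L1 norm sums absolute values
                stored++;
            }
        }
        return stored == 0 ? 0.0 : sum / stored;
    }

    // Hypothetical helper: mean over every position, counting unstored entries as 0.
    static double meanOfAll(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return x.length == 0 ? 0.0 : sum / x.length;
    }

    public static void main(String[] args) {
        double[] x = {5, 0, 4}; // X = [5 - 4] with the missing entry written as 0
        System.out.println(meanOfStored(x)); // 4.5
        System.out.println(meanOfAll(x));    // 3.0
    }
}
```

The first helper gives the 4.5 I would expect from the line above; the second
gives the 3 from your reply.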

Thanks again
Amit

On Fri, Nov 29, 2013 at 10:34 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> On Fri, Nov 29, 2013 at 10:16 PM, Amit Nithian <anithian@gmail.com> wrote:
>
> > Hi Ted,
> >
> > Thanks for your response. I thought that the mean of a sparse vector is
> > simply the mean of the "defined" elements? Why would the vectors become
> > dense unless you're meaning that all the undefined elements (0?) now will
> > be (0-m_x)?
> >
>
> Yes.  Just so.  All those zero elements become non-zero and the vector is
> thus dense.
>
>
> >
> > Looking at the following example:
> > X = [5 - 4] and Y= [4 5 2].
> >
> > is m_x 4.5 or 3?
>
>
> 3.
>
> This is because the elements of X are really 5, 0, and 4.  The zero is just
> not stored, but it still is the value of that element.
>
>
> > Is m_y 11/3 or (6/2) because we ignore the "5" since its
> > counterpart in X is undefined?
> >
>
> 11/3
>
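
As a rough illustration of the definition described in the quoted reply
(unstored entries count as zeros, so m_x = 3 and m_y = 11/3), the following
plain-Java sketch centers both vectors over all positions and then computes
the correlation. The class and method names are hypothetical, nothing here is
Mahout code, and it assumes equal-length, non-constant vectors.

```java
class DensePearsonSketch {
    // Mean over all positions; zeros count as real values.
    static double mean(double[] v) {
        double sum = 0.0;
        for (double e : v) {
            sum += e;
        }
        return sum / v.length;
    }

    // Pearson correlation after centering every position, including the zeros.
    // Assumes x and y have the same length and neither is constant.
    static double pearson(double[] x, double[] y) {
        double mx = mean(x);
        double my = mean(y);
        double dot = 0.0;
        double normX = 0.0;
        double normY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx; // a stored zero becomes (0 - m_x), hence dense
            double dy = y[i] - my;
            dot += dx * dy;
            normX += dx * dx;
            normY += dy * dy;
        }
        return dot / Math.sqrt(normX * normY);
    }

    public static void main(String[] args) {
        double[] x = {5, 0, 4}; // X = [5 - 4]
        double[] y = {4, 5, 2}; // Y = [4 5 2]
        System.out.println("m_x = " + mean(x));
        System.out.println("m_y = " + mean(y));
        System.out.println("r   = " + pearson(x, y));
    }
}
```

Note that after centering, the middle element of X is 0 - 3 = -3, which is why
the centered vector no longer has any unstored entries.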
