mahout-user mailing list archives

From Stefano Bellasio <>
Subject Re: Understanding similaraties computation in RecommenderJob
Date Tue, 18 Jan 2011 11:07:34 GMT
Just to say that I found RowSimilarityJob, thanks :)
On 18 Jan 2011, at 11:32, Sebastian Schelter wrote:

> Hi Stefano,
> AFAIK the chapter about distributed recommenders in Mahout in Action has not yet been updated to the latest version of RecommenderJob; maybe that's the source of your confusion.
> I'll try to give a brief explanation of the similarity computation; feel free to ask more questions if things aren't clear.
> RecommenderJob starts ItemSimilarityJob, which creates an item x user matrix from the preference data and uses RowSimilarityJob to compute the pairwise similarities of the rows of this matrix (the items). So the best place to start is RowSimilarityJob.
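To make the "item x user matrix" concrete, here is a minimal single-machine sketch in plain Java. It is not Mahout code: the class `ItemUserMatrixSketch`, the `Preference` type, and the method `toItemRows` are hypothetical names chosen for illustration, and the real job does this work as distributed map/reduce steps. The sketch only shows the data layout: one row per item, with one entry per user who expressed a preference for that item.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ItemUserMatrixSketch {

    /** One line of preference data: a user, an item, and a preference value. */
    static final class Preference {
        final long userId;
        final long itemId;
        final double value;

        Preference(long userId, long itemId, double value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    /** Groups preferences by item: itemId -> (userId -> preference value). */
    static Map<Long, Map<Long, Double>> toItemRows(List<Preference> preferences) {
        Map<Long, Map<Long, Double>> rows = new TreeMap<>();
        for (Preference p : preferences) {
            // Each item gets a sparse row keyed by the users who rated it.
            rows.computeIfAbsent(p.itemId, k -> new TreeMap<>()).put(p.userId, p.value);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Preference> preferences = Arrays.asList(
            new Preference(1, 101, 5.0),
            new Preference(1, 102, 3.0),
            new Preference(2, 101, 4.0));
        // Prints the sparse item rows built from the three preferences.
        System.out.println(toItemRows(preferences));
    }
}
```

The rows of this structure are the item vectors whose pairwise similarities are then computed.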
> RowSimilarityJob uses an implementation of DistributedVectorSimilarity to compute the similarities in two phases. In the first phase, each item vector is shown to the similarity implementation, which can compute a "weight" for it. In the second phase, for all pairs of rows that have at least one cooccurrence, the method similarity(...) is called with the previously computed weights and a list of all cooccurring values. This generic approach allows us to use different implementations of DistributedVectorSimilarity, so we can support a wide range of similarity functions.
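The two phases described above can be sketched on a single machine as follows. This is a simplified illustration, not Mahout's implementation: the interface `TwoPhaseSimilarity` and the classes `CosineTwoPhase` and `TwoPhaseSimilaritySketch` are hypothetical names, the method signatures are assumptions and do not reproduce the real DistributedVectorSimilarity API, and the nested loop over all row pairs stands in for the distributed step that finds cooccurring rows. Cosine similarity is used as the example because it splits naturally into a per-row weight (the vector norm) and a per-pair computation (the dot product over cooccurring values).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical two-phase contract, loosely following the description above. */
interface TwoPhaseSimilarity {
    /** Phase 1: called once per row (item vector); returns a weight for it. */
    double weight(Map<Long, Double> row);

    /** Phase 2: called once per row pair with the cooccurring values and both weights. */
    double similarity(List<double[]> cooccurrences, double weightA, double weightB);
}

/** Cosine similarity in that contract: the weight is the Euclidean norm of the row. */
class CosineTwoPhase implements TwoPhaseSimilarity {
    @Override
    public double weight(Map<Long, Double> row) {
        double sumOfSquares = 0.0;
        for (double value : row.values()) {
            sumOfSquares += value * value;
        }
        return Math.sqrt(sumOfSquares);
    }

    @Override
    public double similarity(List<double[]> cooccurrences, double weightA, double weightB) {
        double dot = 0.0;
        for (double[] pair : cooccurrences) {
            dot += pair[0] * pair[1];
        }
        return dot / (weightA * weightB);
    }
}

class TwoPhaseSimilaritySketch {

    /** rows: itemId -> (userId -> preference). Returns "itemA,itemB" -> similarity. */
    static Map<String, Double> pairwise(Map<Long, Map<Long, Double>> rows, TwoPhaseSimilarity sim) {
        // Phase 1: compute one weight per row.
        Map<Long, Double> weights = new TreeMap<>();
        for (Map.Entry<Long, Map<Long, Double>> row : rows.entrySet()) {
            weights.put(row.getKey(), sim.weight(row.getValue()));
        }
        // Phase 2: score only the pairs that share at least one column (user).
        Map<String, Double> result = new TreeMap<>();
        List<Long> ids = new ArrayList<>(rows.keySet());
        for (int i = 0; i < ids.size(); i++) {
            for (int j = i + 1; j < ids.size(); j++) {
                Map<Long, Double> a = rows.get(ids.get(i));
                Map<Long, Double> b = rows.get(ids.get(j));
                List<double[]> cooccurrences = new ArrayList<>();
                for (Map.Entry<Long, Double> entry : a.entrySet()) {
                    Double other = b.get(entry.getKey());
                    if (other != null) {
                        cooccurrences.add(new double[] {entry.getValue(), other});
                    }
                }
                if (!cooccurrences.isEmpty()) {
                    result.put(ids.get(i) + "," + ids.get(j),
                        sim.similarity(cooccurrences, weights.get(ids.get(i)), weights.get(ids.get(j))));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Double> item101 = new TreeMap<>();
        item101.put(1L, 3.0);
        item101.put(2L, 4.0);
        Map<Long, Double> item102 = new TreeMap<>();
        item102.put(2L, 4.0);
        item102.put(3L, 3.0);
        Map<Long, Double> item103 = new TreeMap<>();
        item103.put(9L, 1.0); // shares no user with the other items, so it is never paired

        Map<Long, Map<Long, Double>> rows = new TreeMap<>();
        rows.put(101L, item101);
        rows.put(102L, item102);
        rows.put(103L, item103);
        System.out.println(pairwise(rows, new CosineTwoPhase()));
    }
}
```

Under this hypothetical contract, a plain cooccurrence count would be just another implementation: it could ignore the weights and return the number of cooccurring values, which is one way to see cooccurrence as one similarity function among several.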
> A simplified version of this algorithm is also explained in the slides of a talk I gave at the Hadoop Get Together, maybe that's helpful too:
> --sebastian
> On 18.01.2011 11:12, Stefano Bellasio wrote:
>> Hi guys, I'm trying to understand how RecommenderJob works. Until now I thought it was necessary to choose a particular similarity class, such as Euclidean distance, so that the algorithm could compute the similarities for each pair of items and produce recommendations. After reading "Distributing a Recommender" in Mahout in Action, I now have some doubts about the relationship between similarities like Euclidean, LogLike, Cosine and the co-occurrence matrix. As I see in RecommenderJob, I can also specify "Co-occurrence" as a similarity class, so what is the correct way to compute similarities, and how does this work with the other similarity classes and the co-occurrence matrix/similarity? Thank you very much for any further explanation :)
