mahout-user mailing list archives

From Saikat Kanjilal <>
Subject RE: ItemSimilarity algorithm
Date Thu, 05 Jul 2012 15:36:45 GMT

Thanks for the input, Sean. One other question: in the scenario where most of the preference data
is boolean-style (i.e. a CSV file that just says that a user has some sort of association with
an item), is it fair to say that the Tanimoto and log-likelihood coefficients perform better than
the other coefficients? I'd like to get a deeper understanding of this as well; thanks for your
insight.
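For boolean ("user X is associated with item Y") data, both coefficients can be computed from nothing more than the set of users associated with each item, plus the total user count for log-likelihood. The sketch below is a minimal, self-contained illustration under that assumption; the class and method names are hypothetical and are not Mahout's API. The log-likelihood part computes Dunning's G² statistic on the 2×2 co-occurrence table and maps it into [0, 1) with 1 - 1/(1 + G²); that mapping is an illustrative choice modeled loosely on Mahout's LogLikelihoodSimilarity, so check the Javadocs before relying on it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, not part of Mahout.
public class BooleanItemSimilarity {

    // Tanimoto (Jaccard) coefficient: |A intersect B| / |A union B| over the
    // sets of users associated with each item.
    public static double tanimoto(Set<Long> usersA, Set<Long> usersB) {
        Set<Long> intersection = new HashSet<>(usersA);
        intersection.retainAll(usersB);
        int union = usersA.size() + usersB.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }

    // x * ln(x), with the convention 0 * ln(0) = 0.
    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table:
    // k11 = users with both items, k12 = item A only, k21 = item B only, k22 = neither.
    // Equivalent to 2 * sum(k_ij * ln(k_ij * N / (rowSum_i * colSum_j))).
    public static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        long n = k11 + k12 + k21 + k22;
        double cellTerm = xLogX(k11) + xLogX(k12) + xLogX(k21) + xLogX(k22);
        double rowTerm = xLogX(k11 + k12) + xLogX(k21 + k22);
        double colTerm = xLogX(k11 + k21) + xLogX(k12 + k22);
        double llr = 2.0 * (cellTerm - rowTerm - colTerm + xLogX(n));
        return Math.max(0.0, llr); // guard against tiny negative rounding error
    }

    // Similarity in [0, 1): a larger G^2 (stronger evidence that co-occurrence
    // differs from what independence would predict) gives a value closer to 1.
    public static double logLikelihoodSimilarity(Set<Long> usersA, Set<Long> usersB, long totalUsers) {
        Set<Long> both = new HashSet<>(usersA);
        both.retainAll(usersB);
        long k11 = both.size();
        long k12 = usersA.size() - k11;
        long k21 = usersB.size() - k11;
        long k22 = totalUsers - k11 - k12 - k21;
        return 1.0 - 1.0 / (1.0 + logLikelihoodRatio(k11, k12, k21, k22));
    }

    public static void main(String[] args) {
        Set<Long> itemA = Set.of(1L, 2L, 3L, 4L);
        Set<Long> itemB = Set.of(2L, 3L, 4L, 5L);
        System.out.println("Tanimoto: " + tanimoto(itemA, itemB));
        System.out.println("LLR similarity: " + logLikelihoodSimilarity(itemA, itemB, 100));
    }
}
```

One design difference worth noting: Tanimoto ignores users who have neither item, whereas the log-likelihood version needs the total user count to fill in k22, so it weighs the overlap against how common each item is overall.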

> Date: Tue, 3 Jul 2012 19:19:07 +0300
> Subject: Re: ItemSimilarity algorithm
> From:
> To:
> Item-item similarity is a property of the information you have on two
> items and just those items. Whether there are just those 2 items over
> 500K users, or 2M items over 500K users, makes no difference. So no, I
> don't think that this skew by itself implies you should use any particular
> algorithm.
> I think other considerations tend to dominate. For example, very sparse
> data makes the Pearson / cosine measures work poorly. But with relatively
> few items... I imagine it is not that sparse.
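The sparsity point, and the boolean-data question above, can be illustrated with a small standalone sketch. It assumes (a common convention, not necessarily Mahout's exact implementation) that the Pearson correlation between two items is computed only over users who expressed a preference for both. With only two such users the coefficient is forced to +1 or -1 no matter how thin the evidence, and with boolean data every preference is 1.0, so the variance is zero and the coefficient is undefined. The class name SparsePearsonDemo is hypothetical.

```java
import java.util.Map;

// Hypothetical demo class, not part of Mahout.
public class SparsePearsonDemo {

    // Pearson correlation between two items, computed only over the users who
    // have a preference for BOTH items (the co-rated users). Returns NaN when the
    // coefficient is undefined: fewer than two co-rated users, or zero variance
    // in either item's co-rated values.
    public static double pearson(Map<Long, Double> prefsA, Map<Long, Double> prefsB) {
        int n = 0;
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        for (Map.Entry<Long, Double> e : prefsA.entrySet()) {
            Double b = prefsB.get(e.getKey());
            if (b == null) {
                continue; // this user has no preference for item B
            }
            double a = e.getValue();
            n++;
            sumA += a;
            sumB += b;
            sumAA += a * a;
            sumBB += b * b;
            sumAB += a * b;
        }
        if (n < 2) {
            return Double.NaN;
        }
        // n times the covariance and variances; the factor n cancels in the ratio.
        double cov = sumAB - sumA * sumB / n;
        double varA = sumAA - sumA * sumA / n;
        double varB = sumBB - sumB * sumB / n;
        if (varA <= 0 || varB <= 0) {
            return Double.NaN;
        }
        return cov / Math.sqrt(varA * varB);
    }

    public static void main(String[] args) {
        // Only users 1 and 2 rated both items, so the result is forced to +/-1.
        Map<Long, Double> ratedA = Map.of(1L, 5.0, 2L, 3.0, 3L, 4.0);
        Map<Long, Double> ratedB = Map.of(1L, 4.0, 2L, 1.0, 9L, 2.0);
        System.out.println(pearson(ratedA, ratedB)); // prints 1.0

        // Boolean data: every preference is 1.0, so the variance is zero.
        Map<Long, Double> boolA = Map.of(1L, 1.0, 2L, 1.0, 3L, 1.0);
        Map<Long, Double> boolB = Map.of(1L, 1.0, 2L, 1.0, 4L, 1.0);
        System.out.println(pearson(boolA, boolB)); // prints NaN
    }
}
```

Measures built on set overlap, such as Tanimoto or log-likelihood, do not depend on preference values at all, which is why they remain usable where this calculation breaks down.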
> On Tue, Jul 3, 2012 at 6:57 PM, Saikat Kanjilal <> wrote:
> >
> > Hello Everyone,
> > I was reading through the documentation on the different ItemSimilarity
> > algorithms in Mahout and had a question. If one has a scenario where the number of items is
> > significantly smaller than the number of users (say 500,000 users to 1,000 items), are there
> > particular item similarity coefficients (namely log-likelihood or the Tanimoto coefficient) that
> > lend themselves to producing better recommendations? I've read through Mahout in Action and the
> > Javadocs and can't seem to find any clues on this. Any insight based on your experience would be
> > much appreciated.
> > Regards