mahout-user mailing list archives

From Abhishek Roy <abhishekr...@gmail.com>
Subject Re: Custom Item Similarity :datamodel not sure
Date Fri, 28 Sep 2012 16:07:41 GMT
Sean Owen <srowen <at> gmail.com> writes:

> 
> But what was your input? Item item similarity? Then you already had item
> item similarity. And what you compute from that method is probably not
> meaningful.
> 
> You don't have a recommender problem so there is no question of what to
> feed to a recommender.  Don't use it at all. You already have all you need
> in your ItemSimilarity.
> On Sep 27, 2012 7:50 PM, "Abhishek Roy" <abhishekroy8 <at> gmail.com> wrote:
> 
> >
> > Thanks Sean. I get your point. Will try incorporating that.
> > Earlier, as I mentioned, for a small item count (<5000), the input
> > (data model) to the recommender was the nC2 item-item pairs (I tried to
> > feed a uniform preference from each item to every other item), without
> > the rating field, and I then called recommender.mostSimilarItems() to get
> > the list. nC2 works, but is not scalable.
> > It worked well, as the recommendations were the similar items (that works
> > for me now).
> > Although I am digging through the code to see what minimal input I can
> > give, any meaningful suggestion for data input would be awesome.
> >
> >
Thanks for your inputs, Sean. I implemented the top-N (most similar items)
lookup by looking at and reusing the existing mostSimilarItems code. It works
fine. Now, scale in action! Testing with a set of 200,000 items, computing the
most similar items for 1 item takes around 20 seconds.
My approach is to precompute the most similar items for all 200,000 items.
I am not looking at Hadoop for now (2000 item base currently). I know I can
reduce my data size for the similarity computation.
What are my options?
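
For reference, at roughly 20 seconds per item, precomputing all 200,000 items one after another would take about 200,000 x 20 s = 4,000,000 s, or around 46 days. Below is a minimal sketch of the precompute-everything approach in plain Java. It does not use Mahout's API: the TopNPrecompute class, the Similarity interface and the Scored pair are hypothetical names made up for illustration, and the similarity function is assumed to be cheap and symmetric enough to call for every pair.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopNPrecompute {

    /** Hypothetical stand-in for a custom item similarity; not Mahout's ItemSimilarity. */
    public interface Similarity {
        double between(int itemA, int itemB);
    }

    /** An (item, score) pair, ordered by score so the heap head is the weakest candidate. */
    public static final class Scored implements Comparable<Scored> {
        public final int item;
        public final double score;

        Scored(int item, double score) {
            this.item = item;
            this.score = score;
        }

        @Override
        public int compareTo(Scored other) {
            return Double.compare(score, other.score);
        }
    }

    /** Returns, for every item index, its n most similar other items, best first. */
    public static List<List<Scored>> precompute(int itemCount, int n, Similarity sim) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<Scored>> result = new ArrayList<>(itemCount);
        for (int i = 0; i < itemCount; i++) {
            // Min-heap holding at most n candidates for item i.
            PriorityQueue<Scored> heap = new PriorityQueue<>();
            for (int j = 0; j < itemCount; j++) {
                if (i == j) {
                    continue; // an item is not its own neighbour
                }
                double s = sim.between(i, j);
                if (Double.isNaN(s)) {
                    continue; // skip pairs with no defined similarity
                }
                if (heap.size() < n) {
                    heap.add(new Scored(j, s));
                } else if (s > heap.peek().score) {
                    heap.poll(); // evict the weakest of the current top n
                    heap.add(new Scored(j, s));
                }
            }
            List<Scored> top = new ArrayList<>(heap);
            Collections.sort(top, Collections.reverseOrder()); // highest score first
            result.add(top);
        }
        return result;
    }
}
```

The bounded heap keeps memory at n entries per item, but the sketch still evaluates every pair, so the total work stays quadratic in the item count. Each item's row is computed independently of the others, so the outer loop could be split across threads, and shrinking the candidate set per item (the data-size reduction mentioned above) is what would actually cut the number of similarity calls.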