mahout-user mailing list archives

From Abmar Barros <abma...@gmail.com>
Subject Re: ItemSimilarity pre-processing
Date Thu, 14 Jul 2011 15:14:16 GMT
Thanks for the reply, Sean.

One more question: does ReloadFromJDBCDataModel fit my case? Is it an
all-in-memory strategy?

Abmar
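
The question above is whether an "all in memory, periodically refreshed from the database" model fits. As a rough illustration of that pattern only (not Mahout's ReloadFromJDBCDataModel; the class name and the loader are hypothetical), a model can pull every preference into an in-memory map and swap in a fresh snapshot on refresh(), so reads never touch the database:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for an "all in memory, reloadable" model with boolean
// preferences. Not Mahout's ReloadFromJDBCDataModel; in practice the loader
// would run a SQL query against the preferences table.
final class ReloadingBooleanPrefModel {
    // Returns userID -> itemIDs the user has a (boolean) preference for.
    private final Supplier<Map<Long, Set<Long>>> loader;
    // Current snapshot; volatile so readers see the new map as soon as refresh() swaps it in.
    private volatile Map<Long, Set<Long>> snapshot;

    ReloadingBooleanPrefModel(Supplier<Map<Long, Set<Long>>> loader) {
        this.loader = loader;
        refresh();
    }

    // Re-reads every preference and replaces the whole snapshot at once.
    void refresh() {
        Map<Long, Set<Long>> fresh = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> e : loader.get().entrySet()) {
            fresh.put(e.getKey(), Set.copyOf(e.getValue())); // immutable copy per user
        }
        snapshot = fresh;
    }

    // Served from memory only; the source is not queried here.
    Set<Long> itemsOf(long userID) {
        return snapshot.getOrDefault(userID, Set.of());
    }

    int numUsers() {
        return snapshot.size();
    }
}
```

In this sketch memory use grows with the number of preferences, just as with a model built from a dump file; the database only changes where the snapshot comes from and how often it is renewed.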

On Tue, Jul 12, 2011 at 1:22 PM, Sean Owen <srowen@gmail.com> wrote:

> Instead of pre-processing, you can put a CachingItemSimilarity on top of
> your ItemSimilarity. At least it will remember what it has already computed,
> and you don't have to pre-compute everything, most of which is wasted.
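
The idea behind such a caching wrapper can be sketched in plain Java. The interface and class names below are hypothetical, not Mahout's ItemSimilarity or CachingItemSimilarity API, and the sketch assumes the similarity is symmetric, so (a, b) and (b, a) share one cache entry:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal similarity interface (not Mahout's ItemSimilarity).
interface PairSimilarity {
    double similarity(long itemA, long itemB);
}

// Memoizing wrapper in the spirit of a caching similarity: each pair is computed
// at most once, and only when a recommendation actually asks for it.
final class MemoizingSimilarity implements PairSimilarity {
    private final PairSimilarity delegate;
    private final Map<String, Double> cache = new HashMap<>(); // unbounded, for brevity
    int delegateCalls = 0; // counts how often the real computation runs

    MemoizingSimilarity(PairSimilarity delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized double similarity(long itemA, long itemB) {
        // Assumes symmetry: order the pair so (a, b) and (b, a) map to one key.
        long lo = Math.min(itemA, itemB);
        long hi = Math.max(itemA, itemB);
        String key = lo + ":" + hi;
        Double cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        delegateCalls++;
        double value = delegate.similarity(lo, hi);
        cache.put(key, value);
        return value;
    }
}
```

A production cache would bound its size or evict old entries; this map keeps every pair it has seen.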
>
> You can also look at the different CandidateItemsStrategy implementations.
> You can use them to have the recommender consider fewer item-item pairs.
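
One way to consider fewer pairs is to score only items that co-occur with something the user already prefers. A minimal sketch of that rule, using hypothetical names and plain maps rather than Mahout's CandidateItemsStrategy interface:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not Mahout's CandidateItemsStrategy interface.
final class NeighborhoodCandidates {
    // Candidates for a user = items preferred by anyone who also preferred one of
    // the user's items, minus the items the user already has. Only these items
    // need to be scored, instead of every item in the catalogue.
    static Set<Long> candidates(long userID,
                                Map<Long, Set<Long>> itemsByUser,
                                Map<Long, Set<Long>> usersByItem) {
        Set<Long> own = itemsByUser.getOrDefault(userID, Set.of());
        Set<Long> result = new HashSet<>();
        for (long item : own) {
            // Every user who shares this item contributes all of their items.
            for (long other : usersByItem.getOrDefault(item, Set.of())) {
                result.addAll(itemsByUser.getOrDefault(other, Set.of()));
            }
        }
        result.removeAll(own); // never recommend what the user already has
        return result;
    }
}
```

For example, if user 1 prefers item 10 and user 2 prefers items 10 and 20, the only candidate for user 1 is item 20.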
>
> But for MapReduce, you want to look at
> org.apache.mahout.cf.taste.hadoop.item. There's a job there that will
> compute all-pairs item-item similarity.
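
The in-process sketch below mimics the map/reduce shape of such a job: a per-user map step that emits item pairs, and a summing step keyed by pair (the grouping a shuffle would do). It is an illustration with hypothetical names, not the code of Mahout's Hadoop job; it only counts pairs that actually co-occur in some user's history, and stops at co-occurrence counts, which a similarity measure would then consume:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-process illustration of the map/reduce shape; not Mahout's Hadoop job.
final class CooccurrenceMapReduce {
    // Map step: for one user's (distinct) items, emit every unordered pair once.
    static List<String> map(List<Long> items) {
        List<String> emitted = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                long a = Math.min(items.get(i), items.get(j));
                long b = Math.max(items.get(i), items.get(j));
                emitted.add(a + ":" + b);
            }
        }
        return emitted;
    }

    // Reduce step: group emitted pairs by key and sum them into co-occurrence counts.
    static Map<String, Integer> run(List<List<Long>> itemsPerUser) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<Long> items : itemsPerUser) { // each user could be a separate mapper input
            for (String pair : map(items)) {
                counts.merge(pair, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```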
>
> Sean
>
> On Tue, Jul 12, 2011 at 4:32 PM, Abmar Barros <abmargb@gmail.com> wrote:
>
> > Hi all,
> >
> > I am new to Mahout and I am building a recommender for buddycloud
> > (http://buddycloud.com/) as part of my GSoC project
> > (https://github.com/buddycloud/channel-directory).
> > In the test snapshot, I have ~100k users, ~20k items and ~230k boolean
> > taste preferences.
> > At first I tried a UserBasedRecommender with an all-in-memory DataModel
> > (read from a dump file into a GenericDataModel). Recommendations were
> > fast, almost real time. However, I suspected this strategy wouldn't
> > scale: as the numbers of users and items grow, the service could run out
> > of memory.
> >
> > Then I tried a PostgreSQLBooleanPrefJDBCDataModel and, as expected,
> > performance dropped drastically. After reading the blog post at
> > http://ssc.io/deploying-a-massively-scalable-recommender-system-with-apache-mahout/
> > I decided to try an ItemBasedRecommender backed by a precomputed
> > ItemSimilarity table. I would rather not use MapReduce at first, so I
> > tried to compute the log-likelihood similarity (LogLikelihoodSimilarity)
> > for every pair of items. This took too long, and I gave up.
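
Comparing every pair of ~20k items means about 20,000 × 19,999 / 2 ≈ 200 million similarity computations, most of them for pairs that no user shares. A single-machine alternative is to walk each user's items once, count how often each co-occurring pair appears, and compute the log-likelihood ratio only for those pairs. The sketch below assumes boolean preferences and uses hypothetical names; the LLR is the standard entropy-based form for a 2x2 contingency table, and the final 1 - 1/(1 + LLR) mapping to a similarity is assumed here to mirror LogLikelihoodSimilarity, so it should be checked against the Mahout source:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Offline item-item LLR sketch (hypothetical names): only pairs that share at
// least one user are scored, using one pass over users instead of all item pairs.
final class OfflineLlr {
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy: N log N - sum(k log k), with N = sum(k).
    static double entropy(long... counts) {
        long sum = 0;
        double sumXLogX = 0.0;
        for (long k : counts) {
            sumXLogX += xLogX(k);
            sum += k;
        }
        return xLogX(sum) - sumXLogX;
    }

    // Dunning's log-likelihood ratio for the 2x2 table [[k11, k12], [k21, k22]].
    static double llr(long k11, long k12, long k21, long k22) {
        double row = entropy(k11 + k12, k21 + k22);
        double col = entropy(k11 + k21, k12 + k22);
        double mat = entropy(k11, k12, k21, k22);
        return Math.max(0.0, 2.0 * (row + col - mat)); // clamp tiny negative round-off
    }

    // Returns "a:b" (a < b) -> similarity for every pair that co-occurs at least once.
    static Map<String, Double> similarities(List<List<Long>> itemsPerUser) {
        long numUsers = itemsPerUser.size();
        Map<Long, Long> usersPerItem = new HashMap<>();
        Map<String, Long> both = new HashMap<>();
        for (List<Long> items : itemsPerUser) { // items are assumed distinct per user
            for (int i = 0; i < items.size(); i++) {
                usersPerItem.merge(items.get(i), 1L, Long::sum);
                for (int j = i + 1; j < items.size(); j++) {
                    long a = Math.min(items.get(i), items.get(j));
                    long b = Math.max(items.get(i), items.get(j));
                    both.merge(a + ":" + b, 1L, Long::sum);
                }
            }
        }
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Long> e : both.entrySet()) {
            String[] ids = e.getKey().split(":");
            long nA = usersPerItem.get(Long.parseLong(ids[0]));
            long nB = usersPerItem.get(Long.parseLong(ids[1]));
            long k11 = e.getValue();              // users with both items
            long k12 = nB - k11;                  // users with B only
            long k21 = nA - k11;                  // users with A only
            long k22 = numUsers - nA - nB + k11;  // users with neither
            double ratio = llr(k11, k12, k21, k22);
            result.put(e.getKey(), 1.0 - 1.0 / (1.0 + ratio));
        }
        return result;
    }
}
```

The counting pass grows with the sum over users of (items per user)², not with the square of the catalogue size; with ~230k preferences over ~100k users, the average history is only about 2.3 items. Pairs that never co-occur are left out, which this sketch treats as "no evidence of similarity".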
> >
> > Finally, my questions are: am I doing things right? What is the best way
> > to compute item similarity offline without MapReduce?
> >
> > Thanks in advance!
> > Abmar
> >
> > --
> > Abmar Barros
> > MSc candidate in Computer Science at the Federal University of Campina Grande - www.ufcg.edu.br
> > OurGrid Team Member - www.ourgrid.org
> > Paraíba - Brazil
> >
>



