mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Need for a distributed SVDRecommender
Date Sat, 20 Nov 2010 08:23:37 GMT
I see, yes, the latter (item.RecommenderJob) is actually distributed. They are
very different algorithms anyway.
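A minimal sketch of the pattern discussed in this thread. It assumes, as Sanjib guesses below, that each pseudo.RecommenderJob worker builds its own copy of a non-distributed recommender from the full data set and then serves only its share of users. PseudoJobSketch, NonDistributedModel and run are hypothetical names, not Mahout code. The workers run one after another here, whereas Hadoop would run them in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical illustration (not Mahout code): every worker builds its own copy of a
 * non-distributed model, then computes recommendations only for its shard of users.
 */
public class PseudoJobSketch {

    /** Counts how many times the expensive, non-distributed "training" step runs. */
    static final AtomicInteger TRAINING_RUNS = new AtomicInteger();

    /** Stand-in for an expensive model such as an SVD factorization. */
    static final class NonDistributedModel {
        NonDistributedModel(int[] allUsers) {
            // Stand-in for training on the FULL data set; here we only count the call.
            TRAINING_RUNS.incrementAndGet();
        }

        String recommend(int userId) {
            return "recs-for-" + userId;
        }
    }

    /** Splits users round-robin across workers; each worker builds its own model first. */
    static List<String> run(int[] allUsers, int numWorkers) {
        List<String> output = new ArrayList<>();
        for (int worker = 0; worker < numWorkers; worker++) {
            // Repeated work: the same model is rebuilt once per worker.
            NonDistributedModel model = new NonDistributedModel(allUsers);
            for (int i = worker; i < allUsers.length; i += numWorkers) {
                output.add(model.recommend(allUsers[i]));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        int[] users = new int[6040];
        for (int i = 0; i < users.length; i++) {
            users[i] = i + 1;
        }
        List<String> recs = run(users, 12);
        // Prints "6040 users served, model trained 12 times"
        System.out.println(recs.size() + " users served, model trained "
                + TRAINING_RUNS.get() + " times");
    }
}
```

With 12 workers the per-user recommendation step is split 12 ways, but the expensive model-building step runs 12 times, once per worker, so adding machines cannot shorten that part of the wall-clock time. item.RecommenderJob, by contrast, spreads the computation itself over MapReduce passes.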

On Fri, Nov 19, 2010 at 11:24 PM, Sanjib Kumar Das <sanjib.kgp@gmail.com> wrote:

> It takes 14 hours to run the *pseudo*.RecommenderJob with the
> SVDRecommender. I ran the following command:
>
> hadoop jar recommender.jar \
>   org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob \
>   -Dmapred.input.dir=testdata/ratings.csv -Dmapred.output.dir=outputBR \
>   --recommenderClassName \
>   org.apache.mahout.cf.taste.example.bucky.BuckyRecommender
>
> Here BuckyRecommender is an SVDRecommender(30, 50).
>
>
> It takes 38 minutes if I run the *item*.RecommenderJob with the following
> command:
>
> hadoop jar recommender.jar \
>   org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
>   -Dmapred.input.dir=testdata/ratings.csv -Dmapred.output.dir=output
>
> item.RecommenderJob is very different from pseudo.RecommenderJob in terms
> of its distributed implementation, hence the difference in timings, I
> guess.
>
>
> On Fri, Nov 19, 2010 at 4:04 PM, Sean Owen <srowen@gmail.com> wrote:
>
> > That result sounds confusing. It should take about the same number of
> > wall-clock hours either way. I don't see why it would take 14 hours; that
> > sounds wrong. If anything, it should take 38 / N minutes, where N is the
> > number of recommenders you ran.
> >
> > SVDRecommender is not distributed at all, no.
> >
> > On Fri, Nov 19, 2010 at 9:34 PM, Sanjib Kumar Das <sanjib.kgp@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > I wanted to run a distributed RecommenderJob with the SVDRecommender
> > > implementation, so I ran the pseudo.RecommenderJob with an
> > > SVDRecommender(numFeatures=30, trainingSteps=50) on the 1M MovieLens
> > > data (6040 users). This generated 10 recommendations for each of the
> > > 6040 users but took 14 hours to do so! My Hadoop cluster had 12
> > > machines. So I guess it just ran multiple instances of the
> > > non-distributed SVD implementation, and each of these instances did the
> > > same work again and again. So unless the implementation of the
> > > recommender is itself distributed, we don't get any special benefit
> > > from the pseudo.RecommenderJob.
> > >
> > > But the item.RecommenderJob produces the same 10 recommendations for
> > > each of the 6040 users in 38 minutes, because it has an underlying
> > > distributed implementation.
> > >
> > > So my question is: do we have a distributed SVDRecommender
> > > implementation? If not, how should I go about writing one? Can I use
> > > the new LanczosSolver to achieve this?
> > >
> > > Thanks,
> > > Sanjib
> > >
> >
>
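On the LanczosSolver question: what makes Lanczos-style solvers distributable is that they touch the ratings matrix A only through repeated products with A^T A, and that product splits cleanly over row shards, since A^T A v is the sum over shards s of A_s^T (A_s v). The sketch below uses plain power iteration, a simpler method than Lanczos that finds only the top singular value, but it relies on the same primitive. It is not Mahout's LanczosSolver; ShardedPowerIteration, shardProduct, timesSquared and topSingularValue are hypothetical names. Each shard's partial product is what a mapper could compute, and the summation is what a reducer could do.

```java
/**
 * Hypothetical sketch (not Mahout's LanczosSolver): power iteration for the largest
 * singular value of a users-by-items matrix A. The only step that reads A is
 * y = A^T (A v), computed as a sum of per-shard partial products.
 */
public class ShardedPowerIteration {

    /** Partial product A_s^T (A_s v) for rows [from, to) of A. */
    static double[] shardProduct(double[][] rows, int from, int to, double[] v) {
        double[] partial = new double[v.length];
        for (int r = from; r < to; r++) {
            double dot = 0.0;
            for (int j = 0; j < v.length; j++) {
                dot += rows[r][j] * v[j];           // (A v)_r for this row
            }
            for (int j = 0; j < v.length; j++) {
                partial[j] += rows[r][j] * dot;     // add this row's share of A^T (A v)
            }
        }
        return partial;
    }

    /** y = A^T A v, computed shard by shard ("map") and then summed ("reduce"). */
    static double[] timesSquared(double[][] a, double[] v, int numShards) {
        double[] y = new double[v.length];
        int shardSize = (a.length + numShards - 1) / numShards;
        for (int from = 0; from < a.length; from += shardSize) {
            int to = Math.min(a.length, from + shardSize);
            double[] partial = shardProduct(a, from, to, v);
            for (int j = 0; j < y.length; j++) {
                y[j] += partial[j];
            }
        }
        return y;
    }

    /** Estimates the largest singular value of A with a fixed number of iterations. */
    static double topSingularValue(double[][] a, int iterations, int numShards) {
        int n = a[0].length;
        double[] v = new double[n];
        java.util.Arrays.fill(v, 1.0 / Math.sqrt(n));   // unit-length starting vector
        double lambda = 0.0;
        for (int it = 0; it < iterations; it++) {
            double[] y = timesSquared(a, v, numShards);
            double norm = 0.0;
            for (double x : y) {
                norm += x * x;
            }
            norm = Math.sqrt(norm);
            lambda = norm;                  // tends to the largest eigenvalue of A^T A
            for (int j = 0; j < n; j++) {
                v[j] = y[j] / norm;         // renormalize for the next iteration
            }
        }
        return Math.sqrt(lambda);           // singular value = sqrt(eigenvalue of A^T A)
    }

    public static void main(String[] args) {
        double[][] ratings = {
            {5, 4, 0},
            {4, 5, 0},
            {0, 0, 3},
        };
        // Singular values of this small matrix are 9, 3 and 1; this prints a value close to 9.
        System.out.println(topSingularValue(ratings, 100, 2));
    }
}
```

One caution: an SVD of the sparse ratings matrix treats missing ratings as zeros, which is not the same model as a factorization fitted only to the observed ratings (as gradient-descent recommenders typically are), so building a recommender on Lanczos output involves more than swapping in a different solver.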
