mahout-user mailing list archives

From Sanjib Kumar Das <sanjib....@gmail.com>
Subject Re: Need for a distributed SVDRecommender
Date Fri, 19 Nov 2010 23:24:02 GMT
It takes 14 hours to run the *pseudo*.RecommenderJob with the SVDRecommender.
I ran the following command:
hadoop jar recommender.jar
org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob
-Dmapred.input.dir=testdata/ratings.csv -Dmapred.output.dir=outputBR
--recommenderClassName
org.apache.mahout.cf.taste.example.bucky.BuckyRecommender

Here BuckyRecommender is an SVDRecommender(30, 50), i.e. numFeatures=30 and
trainingSteps=50.
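
For readers unfamiliar with the two arguments, here is a toy sketch of what a
feature count and a number of training steps control in a factorization-based
recommender. It is plain Python, not Mahout's SVDRecommender code. The names
factorize, predict and rmse, the learning rate, the regularization value and
the tiny rating set are all assumptions made up for illustration.

```python
import random

# Hypothetical toy data: (user_id, item_id, rating). Not MovieLens.
RATINGS = [
    (1, 10, 5.0), (1, 11, 3.0), (1, 12, 4.0),
    (2, 10, 4.0), (2, 12, 5.0),
    (3, 11, 2.0), (3, 13, 5.0),
    (4, 10, 5.0), (4, 11, 3.0), (4, 13, 4.0),
]


def factorize(ratings, num_features=2, training_steps=50,
              learning_rate=0.01, regularization=0.02, seed=0):
    """Learn one feature vector per user and per item by stochastic
    gradient descent on the squared rating error."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # Small positive starting values for every feature.
    p = {u: [rng.uniform(0.1, 0.5) for _ in range(num_features)] for u in users}
    q = {i: [rng.uniform(0.1, 0.5) for _ in range(num_features)] for i in items}
    for _ in range(training_steps):          # one step = one pass over all ratings
        for u, i, r in ratings:
            err = r - predict(p, q, u, i)
            for f in range(num_features):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += learning_rate * (err * qi - regularization * pu)
                q[i][f] += learning_rate * (err * pu - regularization * qi)
    return p, q


def predict(p, q, user, item):
    """Estimated rating: dot product of the user and item feature vectors."""
    return sum(a * b for a, b in zip(p[user], q[item]))


def rmse(p, q, ratings):
    """Root mean squared error of the model on the given ratings."""
    total = sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5
```

The point of the sketch is that the whole training loop runs in one process
over all ratings, so the cost grows with ratings x features x steps and
nothing in it is split across machines.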


It takes 38 minutes if I run the *item*.RecommenderJob with the following
command:
hadoop jar recommender.jar
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
-Dmapred.input.dir=testdata/ratings.csv -Dmapred.output.dir=output

item.RecommenderJob is very different from pseudo.RecommenderJob in terms
of its distributed implementation, hence the difference in timings, I guess.
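
A rough cost model of that guess, as a sketch only: it assumes every worker
first trains its own full copy of the non-distributed model and then serves
its share of the users. The function names and the example minute values are
hypothetical, not measurements from this cluster.

```python
def pseudo_job_minutes(train_minutes, recommend_minutes, workers):
    """Estimated wall-clock time when each worker repeats the full training
    and only the per-user recommendation work is divided among workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return train_minutes + recommend_minutes / workers


def speedup(train_minutes, recommend_minutes, workers):
    """Single-machine time divided by the pseudo-distributed estimate."""
    single = train_minutes + recommend_minutes
    return single / pseudo_job_minutes(train_minutes, recommend_minutes, workers)
```

Under these assumptions, speedup(600, 120, 12) stays below 1.2 even with 12
workers, because the repeated training dominates. Only when training is
cheap relative to producing recommendations does adding machines help much.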


On Fri, Nov 19, 2010 at 4:04 PM, Sean Owen <srowen@gmail.com> wrote:

> That result sounds confusing. It should take about the same number of
> wall-clock hours either way. I don't see why it would take 14 hours -- that
> sounds wrong. If anything it should take 38 / N minutes where N is the
> number of recommenders you ran.
>
> SVDRecommender is not distributed at all, no.
>
> On Fri, Nov 19, 2010 at 9:34 PM, Sanjib Kumar Das <sanjib.kgp@gmail.com>
> wrote:
>
> > Hi All,
> >
> > I wanted to run a distributed RecommenderJob with the SVDRecommender
> > implementation, so I ran the pseudo.RecommenderJob with an
> > SVDRecommender(numFeatures=30, trainingSteps=50) on the 1M MovieLens
> > data (6040 users). This generated 10 recommendations for each of the
> > 6040 users but took 14 hours to do so! My Hadoop cluster had 12
> > machines, so I guess it just ran multiple instances of the
> > non-distributed SVD implementation, and each of these instances did the
> > same thing again and again. So unless the implementation of the
> > recommender is distributed, we don't get any special benefit from the
> > pseudo.RecommenderJob.
> >
> > But the item.RecommenderJob also produces 10 recommendations for each of
> > the 6040 users, and it does so in 38 minutes. This is because it has an
> > underlying distributed implementation.
> >
> > So my question is: do we have a distributed SVDRecommender
> > implementation? If not, how should I go about writing one? Can I use the
> > new LanczosSolver to achieve this?
> >
> > Thanks,
> > Sanjib
> >
>
