mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Need for a distributed SVDRecommender
Date Fri, 19 Nov 2010 22:04:01 GMT
That result is confusing. It should take about the same amount of
wall-clock time either way, so I don't see why it would take 14 hours; that
sounds wrong. If anything, it should take 38 / N minutes, where N is the
number of recommenders you ran.

SVDRecommender is not distributed at all, no.

On Fri, Nov 19, 2010 at 9:34 PM, Sanjib Kumar Das <sanjib.kgp@gmail.com> wrote:

> Hi All,
>
> I wanted to run a distributed RecommenderJob with the SVDRecommender
> implementation.
> So I ran the pseudo.RecommenderJob with an
> SVDRecommender(numFeatures=30, trainingSteps=50) on the 1M MovieLens
> data (6040 users). This generated 10 recommendations for each of the 6040
> users but took 14 hours to do so! My Hadoop cluster had 12 machines. So I
> guess it just ran multiple instances of the non-distributed SVD
> implementation, and each of these instances did the same thing again and
> again. So unless the implementation of the recommender is distributed, we
> don't get any special benefit from the pseudo.RecommenderJob.
>
> But the item.RecommenderJob does the same 10 recommendations each for the
> 6040 users in 38 minutes. This is because it has an underlying distributed
> implementation.
>
> So my question is: do we have a distributed SVDRecommender implementation?
> If not, how should I go about writing one? Can I use the new LanczosSolver
> to achieve this?
>
> Thanks,
> Sanjib
>
