hadoop-common-dev mailing list archives

From sangroya <sangroyaa...@gmail.com>
Subject RecommenderJob Mahout Long Response Time
Date Wed, 14 Sep 2011 11:30:34 GMT
Hi all,

I am trying to run the example from
https://cwiki.apache.org/confluence/display/MAHOUT/Itembased+Collaborative+Filtering
with the following command:

    bin/mahout org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
      -Dmapred.input.dir=input -Dmapred.output.dir=output \
      --itemsFile itemfile --tempDir tempDir

The algorithm estimates a user's preference for an item that he or she has
not yet seen. Once it can predict preferences, it can also be used for
top-N recommendation, where the task is to find the N items a given user is
likely to like best. The documentation states that, given a DataModel, it
can produce recommendations.
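For readers less familiar with the idea, here is a minimal in-memory sketch of item-based collaborative filtering in Python. It is not Mahout's distributed implementation: the toy ratings, the choice of cosine similarity, and the function names are assumptions made for illustration. An unseen item's rating is estimated as the similarity-weighted average of the user's ratings on other items, and the N highest estimates are returned.

```python
from math import sqrt

# Hypothetical toy data (not MovieLens): user -> {item: rating}.
ratings = {
    "u1": {"a": 5.0, "b": 3.0, "c": 4.0},
    "u2": {"a": 4.0, "b": 2.0, "d": 5.0},
    "u3": {"b": 4.0, "c": 5.0, "d": 2.0},
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating} vectors."""
    items = {}
    for user, prefs in ratings.items():
        for item, r in prefs.items():
            items.setdefault(item, {})[user] = r
    return items

def cosine(v1, v2):
    """Cosine similarity of two sparse vectors stored as dicts.
    Assumes non-zero ratings, so neither norm is zero."""
    common = set(v1) & set(v2)
    if not common:
        return 0.0
    dot = sum(v1[k] * v2[k] for k in common)
    n1 = sqrt(sum(x * x for x in v1.values()))
    n2 = sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2)

def estimate(ratings, items, user, target):
    """Similarity-weighted average of the user's ratings on other items."""
    num = den = 0.0
    for item, r in ratings[user].items():
        s = cosine(items[target], items[item])
        if s > 0:
            num += s * r
            den += s
    return num / den if den else None

def top_n(ratings, user, n):
    """Estimate every item the user has not rated and keep the n best."""
    items = item_vectors(ratings)
    unseen = [i for i in items if i not in ratings[user]]
    scored = [(i, estimate(ratings, items, user, i)) for i in unseen]
    scored = [(i, p) for i, p in scored if p is not None]
    scored.sort(key=lambda t: (-t[1], t[0]))  # highest estimate first, ties by name
    return scored[:n]
```

For example, `top_n(ratings, "u1", 5)` returns a list of `(item, estimated_rating)` pairs. Since `u1` has rated everything except `d`, only `d` is scored, and because the estimate is a weighted average with positive weights it falls between that user's lowest and highest ratings (3 and 5).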

The job takes approximately 5 minutes to generate the top 5 recommendations
for a single user on a 10-node Hadoop cluster, even though the input has
been reduced to just 200 users from the "1 Million MovieLens Dataset" from
Grouplens.org.

I have a few questions:

1) Is it possible to separate the data model building step from the step
that generates recommendations?

2) Once the model has been built from the training data, can it be reused
to generate recommendations for a range of users?

3) More specifically, if I want to provide an online service that generates
recommendations for users, can I avoid (or at least minimize) the cost of a
MapReduce run for every request?
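The separation asked about in questions 1 to 3 can be pictured with the following sketch. It assumes that an offline batch job has already written a table mapping each item to its most similar items and their similarity weights; the JSON layout, the `Recommender` class and its `recommend` method are hypothetical and not part of Mahout. The table is loaded once when the service starts, and each request is then scored in memory without launching a MapReduce job.

```python
import json

# Hypothetical output of an offline batch job: item -> {similar item: weight}.
# The layout and the numbers are illustrative assumptions only.
MODEL_JSON = '{"a": {"b": 0.6, "c": 0.8}, "b": {"a": 0.6, "d": 0.7}, "c": {"a": 0.8}, "d": {"b": 0.7}}'

class Recommender:
    """Loads the precomputed neighbor table once, then serves many requests."""

    def __init__(self, model_json):
        self.neighbors = json.loads(model_json)

    def recommend(self, user_prefs, n):
        """Score unseen items by summing similarity * rating over the
        neighbors of each item the user has rated; return the n best."""
        scores = {}
        for item, rating in user_prefs.items():
            for other, sim in self.neighbors.get(item, {}).items():
                if other not in user_prefs:
                    scores[other] = scores.get(other, 0.0) + sim * rating
        ranked = sorted(scores.items(), key=lambda t: (-t[1], t[0]))
        return ranked[:n]
```

For example, `Recommender(MODEL_JSON).recommend({"a": 5.0}, 2)` ranks `c` (0.8 × 5 = 4.0) ahead of `b` (0.6 × 5 = 3.0). The scores are unnormalized weighted sums, which is enough for ranking but is not an estimated rating.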

I am not a data mining expert, so please help me understand this better.


Thanks and Regards,
Amit

--
View this message in context: http://lucene.472066.n3.nabble.com/RecommenderJob-Mahout-Long-Response-Time-tp3335505p3335505.html
Sent from the Hadoop lucene-dev mailing list archive at Nabble.com.
