hadoop-mapreduce-dev mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: RecommenderJob Mahout Creating a data model
Date Wed, 14 Sep 2011 14:36:00 GMT
This should probably be directed to the Mahout list rather than the Hadoop MapReduce one:

mahout-user@apache.org

--Bobby Evans

On 9/14/11 6:28 AM, "Amit Sangroya" <sangroyaamit@gmail.com> wrote:

Hi all,

I am trying to run the example from
https://cwiki.apache.org/confluence/display/MAHOUT/Itembased+Collaborative+Filtering
with the following command:

  bin/mahout org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
    -Dmapred.input.dir=input -Dmapred.output.dir=output \
    --itemsFile itemfile --tempDir tempDir

The algorithm estimates the preference of a user for an item which he/she
has not yet seen. Once an algorithm can predict preferences, it can also be
used for Top-N recommendation, where the task is to find the N items a
given user might like best. The page mentions that, given a DataModel, it can
produce recommendations.
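
To make the idea above concrete, here is a minimal sketch in plain Python. It
is not Mahout's implementation or API: the toy ratings, the function names,
and the choice of cosine similarity are all assumptions made for illustration.
It estimates a preference as a similarity-weighted average of the user's
existing ratings and then picks the top N unseen items.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not taken from MovieLens.
PREFS = {
    "u1": {"A": 5.0, "B": 3.0, "C": 4.0},
    "u2": {"A": 4.0, "B": 2.0, "C": 5.0, "D": 4.0},
    "u3": {"A": 1.0, "B": 5.0, "D": 2.0},
}


def build_item_similarities(prefs):
    """Compute cosine similarity for every ordered pair of distinct items."""
    # Invert the data: item -> {user: rating}.
    by_item = {}
    for user, ratings in prefs.items():
        for item, rating in ratings.items():
            by_item.setdefault(item, {})[user] = rating

    sims = {}
    for a, ra in by_item.items():
        for b, rb in by_item.items():
            if a == b:
                continue
            # Dot product over users who rated both items.
            dot = sum(ra[u] * rb[u] for u in ra if u in rb)
            norm = sqrt(sum(v * v for v in ra.values())) * \
                sqrt(sum(v * v for v in rb.values()))
            sims[(a, b)] = dot / norm if norm else 0.0
    return sims


def estimate(prefs, sims, user, item):
    """Similarity-weighted average of the user's ratings for other items."""
    num = den = 0.0
    for other, rating in prefs[user].items():
        s = sims.get((item, other), 0.0)
        num += s * rating
        den += abs(s)
    return num / den if den else 0.0


def top_n(prefs, sims, user, n):
    """Return up to n unseen items with the highest estimated preference."""
    all_items = {item for ratings in prefs.values() for item in ratings}
    candidates = all_items - set(prefs[user])
    scored = [(item, estimate(prefs, sims, user, item)) for item in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


SIMS = build_item_similarities(PREFS)  # similarities computed once
for user in PREFS:                     # then reused for each user
    print(user, top_n(PREFS, SIMS, user, 5))
```

In this sketch the item similarities are computed once and then reused for
every user; whether RecommenderJob can be split the same way is exactly what
the questions below ask.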

The job takes approx. 5 minutes to generate the top 5 recommendations for
one user on a 10-node Hadoop cluster. The input was reduced to only
200 users from the "1 Million MovieLens Dataset" from Grouplens.org.

I have a few questions:

1) Is it possible to separate the data model building step from the
recommendation generation step?

2) Can a model, once built from the training data, be reused to generate
recommendations for a range of users?

3) To be specific, if I want to provide an on-line service that generates
recommendations for users, can I minimize the cost of the MapReduce
interactions incurred each time?

I am not a data mining expert. Please help me understand this better.


Thanks and Regards,
Amit

