mahout-user mailing list archives

From Andy Parsons <andypars...@gmail.com>
Subject Re: Evaluating Mahout's recommender support
Date Wed, 29 Dec 2010 17:49:05 GMT
Thanks Sean and Sebastian. I've responded to the questions inline:

On Dec 29, 2010, at 6:26 AM, Sean Owen wrote:

> Yeah, that review, IMHO, had issues. It's important to note the
> context: the author was selling their own services and was trying to
> run some non-distributed sample code in a sort of distributed
> fashion. The result was predictably not so good. That was a long time
> ago.
> 
> 2M users and 10M items isn't big even for a non-distributed
> recommender. This doesn't even sound hard for a non-distributed Mahout
> recommender. Sure, let's hear more and we can give some ideas.
> 
> On Wed, Dec 29, 2010 at 4:08 AM, Sebastian Schelter <ssc@apache.org> wrote:
>> Hi all,
>> 
>> once again, I'm moving a Twitter conversation to this mailing list.
>> 
>> Let me introduce Andy, who is currently evaluating recommendation
>> components for his NYC-based startup and looking into Mahout for that
>> reason:
>> 
>> "We are coding primarily in Scala and looking to build or license a
>> recommendation component. The base requirement is that it be capable of
>> hybrid recommendations on a body of ~2MM users and ~10MM items with rich
>> metadata. The paper I referenced seems to indicate Mahout is not a
>> great fit. Can you point me to recent improvements that make the
>> assertions in the paper obsolete? Any guidance is very much appreciated!"
>> 
>> The paper which he's quoting is an old review of Mahout's recommender
>> support available at
>> http://www.iletken-project.com/documents/mahout_review_by_iletken.pdf .
>> I think we should give Andy good advice and simultaneously give the
>> community an update on the criticisms in that review that no longer
>> hold.
>> 
>> I'll make a first try to address the state of that review:
>> 
>>  - Mahout currently offers parallel algorithms for Collaborative
>> Filtering, see
>> https://cwiki.apache.org/confluence/display/MAHOUT/Itembased+Collaborative+Filtering
>> which can also be used to precompute a model which can then be used for
>> online recommendations.
>> 
>>  - Mahout has some support for matrix-factorization-based recommenders (
>> https://hudson.apache.org/hudson/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/impl/recommender/svd/SVDRecommender.html
>> ). A superior algorithm (
>> https://issues.apache.org/jira/browse/MAHOUT-525 ) as well as a parallel
>> implementation ( https://issues.apache.org/jira/browse/MAHOUT-542 ) are
>> currently in the making.
>> 
>>  - The memory consumption of Taste has improved significantly. I have
>> never tried to load the Netflix dataset, but I'm pretty sure it fits
>> into a few hundred megabytes of memory.
>> 
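The precompute-then-serve pattern from the first bullet above can be sketched in plain Java. This is only an illustration under stated assumptions, not Mahout's API: the names `ItemBasedSketch`, `precomputeSimilarities`, and `recommend` are hypothetical, cosine similarity is an assumed choice of measure, and ratings are assumed to be positive numbers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of item-based collaborative filtering; not Mahout code.
final class ItemBasedSketch {

    // Offline step: cosine similarity for every pair of items that share a user.
    // Input is a sparse map: user ID -> (item ID -> rating).
    static Map<Long, Map<Long, Double>> precomputeSimilarities(
            Map<Long, Map<Long, Double>> ratingsByUser) {
        Map<Long, Double> squaredNorms = new HashMap<>();
        Map<Long, Map<Long, Double>> dots = new HashMap<>();
        for (Map<Long, Double> prefs : ratingsByUser.values()) {
            for (Map.Entry<Long, Double> a : prefs.entrySet()) {
                squaredNorms.merge(a.getKey(), a.getValue() * a.getValue(), Double::sum);
                for (Map.Entry<Long, Double> b : prefs.entrySet()) {
                    if (!a.getKey().equals(b.getKey())) {
                        dots.computeIfAbsent(a.getKey(), k -> new HashMap<>())
                            .merge(b.getKey(), a.getValue() * b.getValue(), Double::sum);
                    }
                }
            }
        }
        Map<Long, Map<Long, Double>> sims = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> row : dots.entrySet()) {
            Map<Long, Double> out = new HashMap<>();
            for (Map.Entry<Long, Double> cell : row.getValue().entrySet()) {
                double denom = Math.sqrt(
                        squaredNorms.get(row.getKey()) * squaredNorms.get(cell.getKey()));
                if (denom > 0.0) {
                    out.put(cell.getKey(), cell.getValue() / denom);
                }
            }
            sims.put(row.getKey(), out);
        }
        return sims;
    }

    // Online step: score each unseen item by the similarity-weighted average
    // of the user's own ratings, then return the best-scoring item IDs.
    static List<Long> recommend(Map<Long, Double> userPrefs,
                                Map<Long, Map<Long, Double>> sims, int howMany) {
        Map<Long, Double> weighted = new HashMap<>();
        Map<Long, Double> totalSim = new HashMap<>();
        for (Map.Entry<Long, Double> pref : userPrefs.entrySet()) {
            Map<Long, Double> neighbours =
                    sims.getOrDefault(pref.getKey(), Collections.emptyMap());
            for (Map.Entry<Long, Double> n : neighbours.entrySet()) {
                if (userPrefs.containsKey(n.getKey())) {
                    continue; // the user already rated this item
                }
                weighted.merge(n.getKey(), n.getValue() * pref.getValue(), Double::sum);
                totalSim.merge(n.getKey(), n.getValue(), Double::sum);
            }
        }
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Double> w : weighted.entrySet()) {
            double total = totalSim.get(w.getKey());
            if (total > 0.0) {
                scores.put(w.getKey(), w.getValue() / total);
            }
        }
        List<Long> candidates = new ArrayList<>(scores.keySet());
        candidates.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return new ArrayList<>(candidates.subList(0, Math.min(howMany, candidates.size())));
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> ratings = new HashMap<>();
        Map<Long, Double> u1 = new HashMap<>();
        u1.put(1L, 5.0); u1.put(2L, 3.0);
        Map<Long, Double> u2 = new HashMap<>();
        u2.put(1L, 4.0); u2.put(2L, 2.0); u2.put(3L, 5.0);
        ratings.put(1L, u1);
        ratings.put(2L, u2);
        Map<Long, Map<Long, Double>> sims = precomputeSimilarities(ratings);
        System.out.println(recommend(u1, sims, 5)); // item 3 is the only unseen item
    }
}
```

The similarity map is the "model": it can be rebuilt in a batch job (for example, daily) while `recommend` only does cheap lookups at request time. The pairwise loop here is quadratic in the number of items each user rated, which is fine for a toy example but is exactly the part a distributed job would take over at scale.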
>> Furthermore, I think we need to know more details about Andy's use case
>> to give him proper answers about whether Mahout fits his project:
>> 
>> - Do you have explicit ratings from the users or are you working with
>> implicit data?
[ASP] We will have both, in the form of ratings, views/purchases, and "recommend to a friend".
>> 
>> - What exactly do you mean by hybrid recommendations? Do you mean a
>> combination of content-based and collaborative filtering techniques?
[ASP] Yes, precisely.
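
One simple way to combine a content-based signal with a collaborative-filtering signal is a weighted blend of the two scores. The sketch below is only one possible reading of "hybrid", not something Mahout provides under these names: `HybridScoreSketch`, `jaccard`, `blend`, and the `cfWeight` parameter are all hypothetical, and both scores are assumed to be normalized to the same range (for example 0 to 1).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a weighted hybrid score; not Mahout code.
final class HybridScoreSketch {

    // Content-based signal: Jaccard overlap between two items' metadata tags.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // no metadata on either side, so no evidence of similarity
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    // Weighted blend: cfWeight = 1.0 is pure collaborative filtering,
    // cfWeight = 0.0 is pure content-based scoring.
    static double blend(double cfScore, double contentScore, double cfWeight) {
        if (cfWeight < 0.0 || cfWeight > 1.0) {
            throw new IllegalArgumentException("cfWeight must be in [0, 1]");
        }
        return cfWeight * cfScore + (1.0 - cfWeight) * contentScore;
    }
}
```

With very sparse rating data, a lower `cfWeight` leans on item metadata until enough ratings accumulate; how to pick or tune that weight is left open here.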
>> 
>> - How fast do you need the recommendations? Would it be OK to have them
>> precomputed, e.g. on a daily basis, or do you need them in real time?
[ASP] Either *could* work, with a preference for real time.
>> 
>> - How often do new users and new items enter your dataset? How sparse is
>> your rating data?
[ASP] New users are added in the hundreds on a daily basis. Rating data will be very sparse
in the initial months the application is live, so we are looking at options for priming the
system. Given the quantity of items, however, we'll have fairly sparse rating/item coverage
in general.
>> 
>> --sebastian
>> 

