mahout-dev mailing list archives

From: Grant Ingersoll <gsing...@apache.org>
Subject: Re: Re : FYI Cloud Computing Resources
Date: Wed, 03 Sep 2008 17:54:41 GMT

On Sep 3, 2008, at 4:34 AM, Sean Owen wrote:

> Yeah, it's almost over, unfortunately. :) I tried this a while ago with
> a slope-one recommender and was only able to roughly match Netflix's
> current performance. I published some support code for people who
> wanted to play with it, but removed it from Mahout's copy as legacy
> code.

Hmm, it's probably useful to keep the code around, even if it's only used
as a sample of how to do things with Taste. I imagine the Netflix data
will live on for quite some time.
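
Sean's removed support code isn't reproduced here. As a rough illustration of the slope-one technique only, here is a minimal standalone sketch of weighted slope one in plain Java (Java 9+ for Map.of). It does not use Taste's API, and the class and method names (SlopeOneSketch, train, predict) are hypothetical. For every pair of items it accumulates the total rating difference and the number of users who rated both. It then predicts a user's rating for a target item as the count-weighted average, over items i that the user has rated, of (the user's rating for i + the average difference between the target and i).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, standalone weighted slope-one sketch (not Taste's API).
public class SlopeOneSketch {
    // diffSum.get(j).get(i): sum over users who rated both j and i of (rating_j - rating_i)
    private final Map<String, Map<String, Double>> diffSum = new HashMap<>();
    // count.get(j).get(i): number of users who rated both j and i
    private final Map<String, Map<String, Integer>> count = new HashMap<>();

    /** ratingsByUser: userId -> (itemId -> rating). Cost is quadratic in items per user. */
    public void train(Map<String, Map<String, Double>> ratingsByUser) {
        for (Map<String, Double> ratings : ratingsByUser.values()) {
            for (Map.Entry<String, Double> j : ratings.entrySet()) {
                for (Map.Entry<String, Double> i : ratings.entrySet()) {
                    if (j.getKey().equals(i.getKey())) {
                        continue; // skip self-pairs
                    }
                    diffSum.computeIfAbsent(j.getKey(), k -> new HashMap<>())
                           .merge(i.getKey(), j.getValue() - i.getValue(), Double::sum);
                    count.computeIfAbsent(j.getKey(), k -> new HashMap<>())
                         .merge(i.getKey(), 1, Integer::sum);
                }
            }
        }
    }

    /** Predicts a rating for target from a user's known ratings; NaN if there is no overlap. */
    public double predict(Map<String, Double> userRatings, String target) {
        Map<String, Double> diffs = diffSum.get(target);
        if (diffs == null) {
            return Double.NaN; // nobody co-rated target with any other item
        }
        Map<String, Integer> counts = count.get(target);
        double weightedSum = 0;
        int totalWeight = 0;
        for (Map.Entry<String, Double> e : userRatings.entrySet()) {
            Integer c = counts.get(e.getKey());
            if (c == null) {
                continue; // item never co-rated with target (includes target itself)
            }
            double avgDiff = diffs.get(e.getKey()) / c;
            weightedSum += (e.getValue() + avgDiff) * c;
            totalWeight += c;
        }
        return totalWeight == 0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> data = new HashMap<>();
        data.put("alice", Map.of("A", 5.0, "B", 3.0, "C", 2.0));
        data.put("bob", Map.of("A", 3.0, "B", 4.0));
        data.put("carol", Map.of("B", 2.0, "C", 5.0));
        SlopeOneSketch model = new SlopeOneSketch();
        model.train(data);
        // dev(A,B) = 0.5 over 2 users, dev(A,C) = 3.0 over 1 user:
        // ((2 + 0.5) * 2 + (5 + 3.0) * 1) / 3 = 13/3, roughly 4.33
        System.out.println(model.predict(data.get("carol"), "A"));
    }
}
```

Training loops over every pair of items each user has rated, so the cost grows quadratically with a user's rating count. That is fine for a toy example, but at the 100M-rating scale of the Netflix data it would likely need a more compact representation.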

>
>
> I didn't really have time to investigate further. Some of the insights
> that have come out of the competition are pretty great. For example,
> one person took advantage of a sort of "memory effect" in how people
> rate: people tend to over-rate movies at some times and under-rate
> them at others. So if you correct for this (a sequence of 5-star
> ratings may not be as meaningful as a 5-star rating in the middle of
> several 2-star ratings), you get much better performance.
>
> This nugget of knowledge may be specific to Netflix; I'm not sure. But
> it was interesting.
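
The message doesn't say how the contestant actually modeled this effect. One simple way to express the intuition, assumed here purely for illustration, is to score each rating relative to the mean of the same user's few preceding ratings. Under that scheme, a 5 after a run of 5s counts for less than a 5 after a run of 2s. The class name (SequenceAdjustedRatings), the window size, and the use of a global mean for a user's first rating are hypothetical choices, not the contestant's method.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical illustration of a "memory effect" correction; not the contestant's method.
public class SequenceAdjustedRatings {
    /**
     * Re-expresses each rating relative to the mean of the same user's previous
     * `window` ratings (input must be in chronological order). The first rating
     * has no history, so it is compared against globalMean instead (an assumption).
     */
    public static double[] adjust(double[] chronological, int window, double globalMean) {
        if (window < 1) {
            throw new IllegalArgumentException("window must be >= 1");
        }
        double[] adjusted = new double[chronological.length];
        Deque<Double> recent = new ArrayDeque<>();
        double recentSum = 0;
        for (int t = 0; t < chronological.length; t++) {
            double baseline = recent.isEmpty() ? globalMean : recentSum / recent.size();
            adjusted[t] = chronological[t] - baseline;
            // slide the window forward to include this rating
            recent.addLast(chronological[t]);
            recentSum += chronological[t];
            if (recent.size() > window) {
                recentSum -= recent.removeFirst();
            }
        }
        return adjusted;
    }

    public static void main(String[] args) {
        // A 5 after a run of 5s vs. a 5 after a run of 2s (window = 2, global mean = 3)
        System.out.println(Arrays.toString(adjust(new double[] {5, 5, 5}, 2, 3.0))); // [2.0, 0.0, 0.0]
        System.out.println(Arrays.toString(adjust(new double[] {2, 2, 5}, 2, 3.0))); // [-1.0, 0.0, 3.0]
    }
}
```

In this toy run, the final 5 in the second sequence lands 3 points above its recent baseline, while the last 5 in the first sequence lands at 0. How adjusted values like these would then feed into a recommender is left open here.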
>
> On Wed, Sep 3, 2008 at 9:28 AM, deneche abdelhakim  
> <a_deneche@yahoo.fr> wrote:
>> I came across the following competition:
>>
>> http://www.netflixprize.com/index
>>
>>
>> It's about recommender systems, so I think it's a job for Taste. The
>> training dataset consists of more than 100M ratings.
>>
>>
>> ----- Original Message ----
>> From: Josh Myer <josh@joshisanerd.com>
>> To: mahout-dev@lucene.apache.org
>> Sent: Wednesday, July 30, 2008, 6:19:25 PM
>> Subject: Re: FYI Cloud Computing Resources
>>
>> On Wed, Jul 30, 2008 at 11:26:29AM -0400, Grant Ingersoll wrote:
>>> http://research.yahoo.com/node/2328
>>>
>>> It _MAY_ (stressed, emphasized, etc.) be possible for Mahouters (or
>>> are we just Mahouts?) to get some access to these resources. One big
>>> question is where we can get some fairly large data sets (large, but
>>> not super large, I think, but am not sure).
>>>
>>> If you have ideas, etc. please let us know.
>>>
>>
>> It's worth plugging (theinfo), http://theinfo.org/. It's a project to
>> collect references to datasets, and may help here. Unfortunately, it
>> seems to be laggy at the moment. I'll poke Aaron about that =)
>>
>> HtH,
>> --
>> Josh Myer
>> josh@joshisanerd.com

--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
