mahout-user mailing list archives

From Grant Ingersoll <>
Subject Re: To all the recommendation people..
Date Tue, 17 May 2011 20:43:39 GMT
More contests at:

On May 15, 2011, at 10:25 PM, Alex Kozlov wrote:

> On Sat, May 14, 2011 at 9:11 PM, Jake Mannix <> wrote:
>> Due to the whole Netflix data lawsuit, the training data is synthetic,
>> which puts the contestants at a disadvantage. Another interesting fact:
>> runtime performance is at issue. Your code will be run *live*, with your
>> model being used to produce recommendations with a hard timeout of 50ms;
>> if you miss this more than 20% of the time, you fail to progress to the
>> end of the semi-final round.
> If the dataset is synthetic (and, I assume, not random), is the goal just
> to guess the model that generated the dataset? Assuming it performs well,
> how far is the 'synthetic' model from actual customer behavior, so that
> there are no 'surprises' when it runs 'live'?
> Potentially, there are more avenues for a lawsuit than in the Netflix case,
> since money is involved (just a thought).
> Alex K

Grant Ingersoll
Lucene & Solr User Conference
May 25-26, San Francisco
