More contests at: http://challenge.gov/NIH/132-nlm-show-off-your-apps-innovative-uses-of-nlm-information
On May 15, 2011, at 10:25 PM, Alex Kozlov wrote:
> On Sat, May 14, 2011 at 9:11 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
>
>> Due to the whole Netflix data lawsuit, the training data is synthetic, which
>> puts the contestants at a disadvantage, and another interesting fact: runtime
>> performance is at issue: your code will be run *live*, with your model being
>> used to produce recommendations with a hard timeout of 50ms - if you
>> miss this more than 20% of the time, you fail to progress to the end of
>> the semi-final round.
>>
>
> If the dataset is synthetic (and I assume not random) is the goal to just
> guess the model that generated the dataset? Assuming it performs well, how
> far is the 'synthetic' model from the actual customer behavior so that there
> are no 'surprises' when it runs 'live'?
>
> Potentially, there are more avenues for a lawsuit than in the Netflix case
> since money is involved (just a thought).
>
> Alex K
--------------------------------------------
Grant Ingersoll
Join the LUCENE REVOLUTION
Lucene & Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org