mahout-user mailing list archives

From Steven Bourke <steven.bou...@ucd.ie>
Subject Re: Netflix dataset unavailable
Date Fri, 12 Mar 2010 15:24:27 GMT
Making an assumption here - 

As Netflix are being sued for releasing the dataset, I bet we will find it very difficult to
get hold of it in the future.


On 12 Mar 2010, at 15:13, Tamas Jambor wrote:

> Does anyone know if it is possible to get the Netflix quiz set? They said they would
> release it after the competition ends.
> 
> T
> 
> On 10/03/2010 04:18, Jake Mannix wrote:
>> On Tue, Mar 9, 2010 at 7:49 PM, Robin Anil <robin.anil@gmail.com> wrote:
>> 
>>   
>>> http://warsteiner.db.cs.cmu.edu/db-site/Datasets/graphData/
>>> Seems like there are plenty of interesting datasets here to try Mahout on.
>>> There is even a p2p network graph. 790MB compressed. Sounds like a good test
>>> matrix for the decomposer stuff.
>>> 
>>>     
>> Three words: Twitter social graph:
>>     http://an.kaist.ac.kr/traces/WWW2010.html
>> 6GB compressed, 60M x 60M sparse matrix.
>> 
>> I've pulled the torrent and will put sequence files of vectors in some S3
>> buckets once I get them processed.  This is a matrix with a good 1.47B nonzero
>> entries, and is publicly available.  Not record breaking, but pretty darn huge.
>> 
>>   -jake
>> 
>>   

