hadoop-common-user mailing list archives

From "Bruce Williams" <williams.br...@gmail.com>
Subject RE: Test Data for Hadoop Student
Date Sat, 17 Nov 2007 04:27:25 GMT
When I mentioned Creative Commons materials, I had the University of
Washington materials in mind. 

Thank you for your response

Bruce Williams

-----Original Message-----
From: Aaron Kimball [mailto:ak@cs.washington.edu] 
Sent: Friday, November 16, 2007 6:21 PM
To: hadoop-user@lucene.apache.org
Subject: Re: Test Data for Hadoop Student


I helped design and teach an undergrad course based on Hadoop last year. 
Along with some folks at Google, we then packaged the resources 
for distribution to other universities and the public at large 
(under a Creative Commons license, actually).

All the materials are available online here:
(lecture notes, labs, and even video lectures.)

They include suggested lab activities. Good free data sets you can 
download include the Netflix Prize data and a copy of the Wikipedia corpus. 
Of course, you can set up Nutch and do your own web crawl too.

We also highly endorse the Amazon EC2 idea for doing your own labs :)

Best of luck,
- Aaron

Edward Bruce Williams wrote:
> Hello
> I am a student doing an independent study project investigating the
> possibility of teaching large scale computing on a small scale budget.
> My thought is to use available Open Source (Hadoop) and Creative Commons
> and other materials as the text.  A student could then do significant
> computing on Amazon for the cost of what they would usually pay for a
> textbook.  I have convinced an agency of the state of California that
> for computer time for a CS student is "like buying a textbook or
> for a math student", so "so far so good."
> I am asking anyone who has some largish data sets, preferably on Amazon,
> that we could use for class projects to contact me off list.
> Thanks,
> Bruce Williams 
