hadoop-common-user mailing list archives

From Aaron Kimball ...@cs.washington.edu>
Subject Re: Test Data for Hadoop Student
Date Sat, 17 Nov 2007 02:21:11 GMT
Bruce,

I helped design and teach an undergrad course based on Hadoop last year. 
Together with some folks at Google, we then packaged the resources for 
distribution to other universities and the public at large (under a 
Creative Commons license, actually).

All the materials are available online here:
http://code.google.com/edu/content/parallel.html
(lecture notes, labs, and even video lectures.)

The materials include suggested lab activities. Good free data sets you 
can download include the Netflix Prize data and a copy of the Wikipedia 
corpus. Of course, you can set up Nutch and do your own web crawl too.

We also highly endorse the Amazon EC2 idea for doing your own labs :)

Best of luck,
- Aaron



Edward Bruce Williams wrote:
> Hello
> 
> I am a student doing an independent study project investigating the
> possibility of teaching large scale computing on a small scale budget.  Th
> 
> My thought is to use available Open Source (Hadoop), Creative Commons,
> and other materials as the text.  A student could then do significant
> computing on Amazon for the cost of what they would usually pay for a
> textbook.  I have convinced an agency of the state of California that paying
> for computer time for a CS student is "like buying a textbook or calculator
> for a math student", so "so far so good."
> 
> I am asking anyone who has some largish data sets, preferably on Amazon,
> that we could use for class projects to contact me off list.
> 
> Thanks,
> 
> Bruce Williams
