hadoop-common-user mailing list archives

From "Bruce Williams" <williams.br...@gmail.com>
Subject Re: Test Data for Hadoop Student
Date Wed, 02 Apr 2008 22:13:05 GMT
I met Christophe at the Hadoop Conference at Yahoo last week. I really
liked him. He asked me to maintain the Google Ubuntu Hadoop image, and
I sent him a description of my project. Would you read it and offer
any comments?

I sent him the following:

"Can I tell you more about my Hadoop in education project?

My project started when I found out that Amazon.com (who would have
thought) will let you rent their computers by the hour. I realized
this would be an ideal way for small schools (really ALL schools -
even MIT/Berkeley has a hard time coming up with a hundred idle
computers for a student to use) to have access to the resources to
expose students (not just Computer Science, but Physics, Astronomy,
Biology, etc.) to working in this environment. That is what my
independent study project is about. I am producing a student client
workstation image and a department server image with everything
needed to teach a course and hook up to Amazon. I am getting the
courseware from the University of Washington and documentation from
all over. I first emailed the hadoop list and Aaron Kimball responded
and offered his courseware and to "highly endorse your Amazon EC2 idea
for doing your labs". I would love any resources you can point me to.
The economic model is that students don't need to buy a textbook and
instead use the money to buy computer time from Amazon. I have already
gotten an agency of the state of California to fund my computer time
as a student, so that is a good precedent.

I am getting weather data and an application to process it with snazzy
graphics as a student project. I hope to add more as time goes on. I
know I can be accused of the buzzword of the moment, but I hope to put
on the server image software to provide a linkup to other people using
the server to form a community. The more communication, the faster
things will happen. So the model is more "seeding" than "doing".

This is an example of what I am putting into this. This needs to work
"on its own", with common problems pre-solved, without a lot of case
by case work at each institution. I talked to Jinesh Varia from Amazon
and I may be able to get them to design a custom product for
education, accounting and billing that works with my server to make it
simple and secure for the students and teacher to use. If a teacher
has to deal with money and billing, it will fail. If you require a
school's bureaucracy to handle something new, it will be a harder
sell. Schools supply student course needs through the school
bookstore. Who is the approved vendor to the school bookstore, Amazon?
More case by case work. And then they want to mark it up 50%-100%.
No no no. So a student logs into my server and buys time like you
would anything else on the Internet. But bookkeeping and usage records
are kept for the teacher. Simple. No extra work in this area to offer
the course.

When I met you, I felt I had met someone who thought exactly like me
on how important it is to help people, not just from CS but from other
areas, move along and take advantage of the potential of this
technology. I want the guy off in a corner somewhere who has a crazy
idea that deserves a Nobel Prize to have what it takes to succeed.
Elite should be elite based on worth, not restricted by access to
elite-level resources, as much as that can be made possible."

I would appreciate your comments. If you like the ideas, I would also
appreciate any support you could give yourself and any encouragement
you could give Christophe to support this.

BTW - my personal email is bruce@electricranch.com. electricranch -
"herds of CPUs". I use the Gmail address for lists that may expose me
to spam.


On Fri, Nov 16, 2007 at 7:21 PM, Aaron Kimball <ak@cs.washington.edu> wrote:
> Bruce,
> I helped design and teach an undergrad course based on Hadoop last year.
> Along with some folks at Google, we then made the resources available
> together to distribute to other universities and the public at large (via
> Creative Commons license, actually).
> All the materials are available online here:
> http://code.google.com/edu/content/parallel.html
> (lecture notes, labs, and even video lectures.)
> It includes suggested lab activities. Good free data sets you can download
> include Netflix prize data and a copy of the wikipedia corpus. Of course,
> you can set up Nutch and do your own web crawl too.
> We also highly endorse the Amazon EC2 idea for doing your own labs :)
> Best of luck,
> - Aaron
> Edward Bruce Williams wrote:
> > Hello
> >
> >
> > I am a student doing an independent study project investigating the
> > possibility of teaching large scale computing on a small scale budget.  Th
> >
> >
> > My thought is to use available Open Source ( Hadoop) and Creative Commons
> > and other materials as the text.  A student could then do significant
> > computing on Amazon for the cost of what they would usually pay for a
> > textbook.  I have convinced an agency of the state of California that paying
> > for computer time for a CS student is "like buying a textbook or calculator
> > for a math student", so "so far so good."
> >
> >
> > I am asking anyone who has some largish data sets, preferably on Amazon, that
> > we could use for class projects to contact me off list.
> >
> >
> > Thanks,
> >
> >
> > Bruce Williams
> >
> >
> >
