hadoop-hdfs-user mailing list archives

From Michael Forage <Michael.For...@livenation.co.uk>
Subject RE: Getting started recommendations
Date Fri, 11 Jan 2013 11:22:33 GMT
I am still new, but I had similar questions and went through a lot of pain getting started.

If you want to get programming rather than spend time learning how to install, configure and
administer the Hadoop tools, I recommend using Amazon Elastic MapReduce (EMR).
It will very quickly get you to the stage where you can submit and run MapReduce jobs
(and Pig, Hive, etc.).

It's a very cheap option for learning the platform, especially if you use the Ruby command-line
tool, which lets you reuse your Hadoop instances for multiple jobs rather than following the more
expensive default of starting and stopping a new cluster each time. It has some pretty decent
tutorials, although (as with everything Hadoop, it seems) the area is so large that you'll
inevitably be googling some things or asking questions here.

Also, I found the book "Hadoop in Action" very readable and informative, even as someone who
has only sporadically used Java throughout my career. It takes you through different use cases
based on test data you can download from the web. The only issue is that it's written against
the older Hadoop 0.20 API (though that API is still fully supported), and since it assumes a
local Hadoop cluster, it takes a small effort to translate things to the Amazon EMR way of doing
them. Still very useful, though.


From: John Lilley [mailto:john.lilley@redpoint.net]
Sent: 11 January 2013 10:29
To: user@hadoop.apache.org
Subject: Getting started recommendations

We are somewhat new to Hadoop and are looking to run some experiments with HDFS, Pig, and
With that in mind, I have a few questions:
What is the easiest (preferably free) Hadoop distro to get started with?  Cloudera?
What host OS distro/release is recommended?
What is the easiest environment to get started with?  Amazon EC2?  Is there anyone offering
virtual/hosted prebuilt Hadoop instances?
Where would we find some "big data" files that people have used for testing purposes?
Feel free to RTFM me to the right place ;-)
Thanks, john
