hadoop-mapreduce-user mailing list archives

From Jonathan Aquilina <jaquil...@eagleeyet.net>
Subject Re: Use of hadoop in AWS - Build it from scratch on a EC2 instance / MapR hadoop distribution / Amazon hadoop distribution
Date Mon, 19 Oct 2015 10:26:12 GMT
Hey Jose

Have you looked at Amazon EMR (Elastic MapReduce)? We have used it where I work, and when
you provision the EMR cluster you can use custom JARs like the one you mentioned.

For storage, you can use HDFS if you are going to keep a persistent cluster.
If not, you can store your data in an Amazon S3 bucket.

The documentation for EMR is really good. When we did this, at the beginning of this year,
they supported Hadoop 2.6.

In my honest opinion you are giving yourself a lot of extra work for nothing just to use
Hadoop. Try out EMR with a temporary cluster and go from there. I managed to tool up and learn
how to work with EMR in a week.
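
A minimal sketch of that suggestion, assuming Python with the boto3 library: it builds the
request for a temporary cluster that runs one custom JAR step against S3 and then shuts
itself down. The function names, bucket paths, instance type and node count are hypothetical
placeholders, the release label is left as a parameter, and the two role names are the EMR
default roles, which are assumed to already exist in the account.

```python
import json


def build_transient_cluster_request(jar_s3_uri, input_s3_uri, output_s3_uri,
                                    release_label, node_count=3,
                                    instance_type="m4.large"):
    """Build keyword arguments for the boto3 EMR client's run_job_flow call.

    All argument values are illustrative assumptions, not settings from the thread.
    """
    return {
        "Name": "transient-custom-jar-cluster",
        "ReleaseLabel": release_label,
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": node_count,
            # False makes the cluster temporary: it terminates once the steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "run-custom-jar",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": jar_s3_uri,  # the custom JAR, uploaded to S3 beforehand
                "Args": [input_s3_uri, output_s3_uri],  # data stays in S3, not HDFS
            },
        }],
        # Default EMR role names; assumed to have been created already.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request, region_name):
    """Submit the request to EMR. Needs boto3 and AWS credentials; starts billing."""
    import boto3  # imported here so the request can be built without boto3 installed
    emr = boto3.client("emr", region_name=region_name)
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    request = build_transient_cluster_request(
        "s3://example-bucket/jars/job.jar",
        "s3://example-bucket/input/",
        "s3://example-bucket/output/",
        release_label="emr-x.y.z",  # placeholder; pick a real label from the EMR docs
    )
    print(json.dumps(request, indent=2))
```

Running the file only prints the request; nothing is created until launch() is called. For a
persistent cluster with data in HDFS, KeepJobFlowAliveWhenNoSteps would be True instead.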

Sent from my iPhone

> On 19 Oct 2015, at 02:10, José Luis Larroque <larroquester@gmail.com> wrote:
> Thanks for your answer Anders.
> - The amount of data that I'm going to manipulate is about the size of Wikipedia (I will
> use a dump).
> - I already have the basics of Hadoop (I hope): I have a local multi-node cluster set up
> and I have already executed some algorithms.
> - Because the amount of data is significant, I believe that I should use several nodes.
> Maybe another thing to consider is that I'm running Giraph on top of the selected
> Hadoop distribution/EC2.
> Bye!
> Jose
> 2015-10-18 18:53 GMT-03:00 Anders Nielsen <anders.shinde.nielsen@gmail.com>:
>> Dear Jose, 
>> It will help people answer your question if you specify your goals:
>> - If you are doing it to learn how to USE a running Hadoop, then go for one of the
>> prebuilt distributions (Amazon or MapR).
>> - If you are doing it to learn more about setting up and administering Hadoop, then you
>> are better off setting everything up from scratch on EC2.
>> - Do you need to run on many nodes, or just one node to test some MapReduce scripts
>> on a small data set?
>> Regards, 
>> Anders
>>> On Sun, Oct 18, 2015 at 10:03 PM, José Luis Larroque <larroquester@gmail.com> wrote:
>>> Hi all!
>>> I started to use Hadoop with AWS, and a big question has come up.
>>> I'm using a MapR distribution for Hadoop 2.4.0 in AWS. I have already tried some
>>> trivial examples, and before moving forward I have one question.
>>> What is the best option for using Hadoop on AWS?
>>> - Build it from scratch on an EC2 instance
>>> - Use MapR distribution of Hadoop
>>> - Use Amazon distribution of Hadoop
>>> Sorry if my question is too broad.
>>> Bye!
>>> Jose
