hadoop-user mailing list archives

From jay vyas <jayunit100.apa...@gmail.com>
Subject Re: Use of hadoop in AWS - Build it from scratch on a EC2 instance / MapR hadoop distribution / Amazon hadoop distribution
Date Mon, 19 Oct 2015 15:37:09 GMT
Also, ASF BigTop packages hadoop for you.

You can always grab our releases
http://www.apache.org/dist/bigtop/bigtop-1.0.0/repos/

We package pig, spark, hive, hbase, ....

It's not hard to set up a Bigtop build server, as we have dockerized the
packaging of both RPM and Deb packages, and you can experiment locally with
this stuff using the Vagrant recipes.



On Mon, Oct 19, 2015 at 6:26 AM, Jonathan Aquilina <jaquilina@eagleeyet.net>
wrote:

> Hey Jose
>
> Have you looked at Amazon EMR (Elastic MapReduce)? Where I work we have
> used it, and when you provision the EMR cluster you can use custom jars
> like the one you mentioned.
>
> In terms of storage, you can use HDFS if you are going to keep a
> persistent cluster. If not, you can store your data in an Amazon S3 bucket.
>
> Documentation for EMR is really good. When we did this, at the
> beginning of this year, they supported Hadoop 2.6.
>
> In my honest opinion you are giving yourself a lot of extra work for
> nothing just to start using Hadoop. Try out EMR with a temporary cluster
> and go from there. I managed to tool up and learn how to work with EMR in
> a week.
>
> Sent from my iPhone
>
> On 19 Oct 2015, at 02:10, José Luis Larroque <larroquester@gmail.com>
> wrote:
>
> Thanks for your answer Anders.
>
> - The amount of data that I'm going to manipulate is about the size of
> Wikipedia (I will use a dump).
> - I already have the basics of Hadoop (I hope): I have a local multinode
> cluster set up and I have already run some algorithms.
> - Because the amount of data is significant, I believe that I should use
> several nodes.
>
> Maybe another thing to consider is that I'm running Giraph on
> top of the selected Hadoop distribution/EC2.
>
> Bye!
> Jose
>
> 2015-10-18 18:53 GMT-03:00 Anders Nielsen <anders.shinde.nielsen@gmail.com
> >:
>
>> Dear Jose,
>>
>> It will help people answer your question if you specify your goals :
>>
>> - If you are doing it to learn how to USE a running Hadoop, then go for
>> one of the prebuilt distributions (Amazon or MapR).
>> - If you are doing it to learn more about setting up and administering
>> Hadoop, then you are better off setting everything up from scratch on EC2.
>> - Do you need to run on many nodes, or just one node to test some
>> MapReduce scripts on a small data set?
>>
>> Regards,
>>
>> Anders
>>
>>
>>
>>
>> On Sun, Oct 18, 2015 at 10:03 PM, José Luis Larroque <
>> larroquester@gmail.com> wrote:
>>
>>> Hi all !
>>>
>>> I started to use Hadoop on AWS, and a big question has come up!
>>>
>>> I'm using a MapR distribution of Hadoop 2.4.0 on AWS. I have already
>>> tried some trivial examples, and before moving forward I have one
>>> question.
>>>
>>> What is the best option for using Hadoop on AWS?
>>> - Build it from scratch on an EC2 instance
>>> - Use MapR distribution of Hadoop
>>> - Use Amazon distribution of Hadoop
>>>
>>> Sorry if my question is too broad.
>>>
>>> Bye!
>>> Jose
>>>
>>>
>>>
>>>
>>>
>>
>

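For reference, the EMR route quoted above (a temporary cluster that runs a
custom jar and then shuts down, with data kept in S3) could look roughly like
the sketch below. It only builds the request that boto3's EMR client accepts
in run_job_flow; it does not contact AWS. The bucket names, jar path, cluster
name, instance type, node count and release label are all placeholders I made
up for illustration, and the role names assume the default EMR roles already
exist in the account.

```python
# Sketch only: builds the request for a transient (auto-terminating) EMR
# cluster that runs a single custom-jar step. Every concrete value here
# (names, S3 URIs, instance type, release label) is a placeholder.

def build_transient_cluster_request(jar_s3_uri, jar_args, log_s3_uri,
                                    node_count=3,
                                    release_label="emr-4.1.0"):
    """Return keyword arguments for boto3's EMR run_job_flow call."""
    return {
        "Name": "giraph-wikipedia-test",          # hypothetical cluster name
        "LogUri": log_s3_uri,
        "ReleaseLabel": release_label,            # assumed release label
        "Instances": {
            "MasterInstanceType": "m3.xlarge",    # assumed instance type
            "SlaveInstanceType": "m3.xlarge",
            "InstanceCount": node_count,
            # False makes the cluster temporary: it terminates once the
            # steps have finished instead of staying up.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "custom-jar-step",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {"Jar": jar_s3_uri, "Args": list(jar_args)},
        }],
        # Default role names; assumed to exist in the AWS account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_transient_cluster_request(
    jar_s3_uri="s3://my-bucket/jars/my-job.jar",        # hypothetical path
    jar_args=["s3://my-bucket/input/", "s3://my-bucket/output/"],
    log_s3_uri="s3://my-bucket/emr-logs/",
)

# To actually launch it (needs boto3 installed and AWS credentials set up):
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Reading input from S3 and writing output back to S3 is what lets the cluster
be thrown away after the job; with a persistent cluster the same step could
point at HDFS paths instead.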

-- 
jay vyas
