hadoop-common-user mailing list archives

From Amr Awadallah <...@cloudera.com>
Subject Re: Time to build my own cluster - advice?
Date Fri, 06 Nov 2009 20:59:16 GMT
Yes, you can customize the AMI; just update the cluster config file to
point to your new AMI.

There is also an option to point to additional packages to be
auto-installed after the AMI is instantiated, see:

http://archive.cloudera.com/docs/_customization.html

-- amr

Mark Kerzner wrote:
> Amr,
>
> can I take a Cloudera-based AMI, customize it by adding my Linux packages,
> save it as a private AMI, and then use your script? Or does your script only
> work with your AMIs?
>
> Thank you,
> Mark
>
> On Fri, Nov 6, 2009 at 2:44 PM, Amr Awadallah <aaa@cloudera.com> wrote:
>
>   
>> yep,
>>
>> hadoop-ec2 launch-cluster amr-cluster 5
>>
>> will launch a cluster of 5 nodes, after you set up environment variables for
>> AWS credentials and a small config file describing which
>> AMI/instance-type/zone to use, see:
>>
>> http://archive.cloudera.com/docs/_getting_started.html
>>
>> then
>>
>> hadoop-ec2 terminate-cluster amr-cluster
>>
>> will take the cluster down immediately (otherwise you will keep paying
>> even if the nodes are idle), but that means all the data you had in HDFS
>> will be gone with the nodes, so you should save that data to S3/EBS or
>> launch an EBS-based cluster as described here:
>>
>>
>> http://archive.cloudera.com/docs/_getting_started_and_basic_example_instructions.html
>>
>> -- amr
>>
>>
>> Edmund Kohlwey wrote:
>>
>>     
>>> First of all, let me say I don't use EC2 - there's some people at my
>>> company who do, but I've been fortunate enough to use our internal dev
>>> cluster for all the work I've done, so this is total hearsay.
>>>
>>> That having been said, the people that I know who are using EC2 aren't
>>> leaving the cluster running when not in use - there's scripts from (I
>>> believe) Cloudera that can allocate and configure the right number of nodes
>>> on EC2 with whatever AMI you specify, and then tear them down when you're
>>> done.
>>>
>>> On 11/5/09 1:14 PM, Mark Kerzner wrote:
>>>
>>>       
>>>> Edmund,
>>>>
>>>> I wanted to install OpenOffice and connect to it from my Java code. I
>>>> tried to replicate the complete install by copying it, but there must be
>>>> something else there, because I can't connect on Amazon MapReduce, but I
>>>> can on my own cluster.
>>>>
>>>> When you say cheaper, do you mean that keeping your own EC2 machines up
>>>> and using them as a Hadoop cluster is in the end cheaper than starting a
>>>> Hadoop cluster every time you want to run a job?
>>>>
>>>> Thank you,
>>>> Mark
>>>>
>>>> On Thu, Nov 5, 2009 at 12:04 PM, Edmund Kohlwey <ekohlwey@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>>         
>>>>> If all your dependencies are Java-based (like OpenOffice), you might try
>>>>> using a dependency manager/build tool like Maven or Ant/Ivy to package
>>>>> the dependencies in your jar. I'm not sure if any parts of OpenOffice
>>>>> are available in a public repo as Maven artifacts or not, or if you want
>>>>> to get into packaging artifacts for your build system, but it's
>>>>> something you might try.
>>>>>
>>>>> I think it's cheaper to just use EC2 anyway, so that might be a
>>>>> motivating factor for you as well.
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> so far I've been using Amazon MapReduce. However, my app uses a growing
>>>>>> number of Linux packages. I have been installing them on the fly, in
>>>>>> the Mapper.configure(), but with OpenOffice this is hard, and I don't
>>>>>> get a service connection even after a local install.
>>>>>>
>>>>>> Therefore, my question is about creating my own Hadoop cluster out of
>>>>>> EC2 machines. Are there instructions? How hard is it? What are best
>>>>>> practices?
>>>>>>
>>>>>> Thank you,
>>>>>> Mark
>
>   
