From: Amr Awadallah
Organization: Cloudera, Inc.
Date: Fri, 06 Nov 2009 12:44:24 -0800
To: common-user@hadoop.apache.org
Subject: Re: Time to build my own cluster - advice?

yep,

  hadoop-ec2 launch-cluster amr-cluster 5

will launch a cluster of 5 nodes, after you set up environment variables
for your AWS credentials and create a small config file describing which
AMI/instance type/zone to use, see:

http://archive.cloudera.com/docs/_getting_started.html

then

  hadoop-ec2 terminate-cluster amr-cluster

will take the cluster down immediately (otherwise you keep paying even
while the nodes sit idle). That also means all the data you had in HDFS
is gone with the nodes, so you should save that data to S3/EBS first, or
launch an EBS-based cluster as described here:

http://archive.cloudera.com/docs/_getting_started_and_basic_example_instructions.html

(a minimal sketch of the full launch/save/terminate flow is at the end of
this message)

-- amr

Edmund Kohlwey wrote:
> First of all, let me say I don't use EC2 - there are some people at my
> company who do, but I've been fortunate enough to use our internal dev
> cluster for all the work I've done, so this is total hearsay.
>
> That having been said, the people I know who are using EC2 aren't
> leaving the cluster running when not in use - there are scripts from (I
> believe) Cloudera that can allocate and configure the right number of
> nodes on EC2 with whatever AMI you specify, and then tear them down
> when you're done.
>
> On 11/5/09 1:14 PM, Mark Kerzner wrote:
>> Edmund,
>>
>> I wanted to install OpenOffice and connect to it from my java code. I
>> tried to replicate the complete install by copying it, but there must
>> be something else there, because I can't connect on Amazon MapReduce,
>> but I can on my own cluster.
>>
>> When you say cheaper, do you mean that keeping your own EC2 machines
>> up and using them as a hadoop cluster is in the end cheaper than
>> starting a Hadoop cluster every time you want to run a job?
>>
>> Thank you,
>> Mark
>>
>> On Thu, Nov 5, 2009 at 12:04 PM, Edmund Kohlwey wrote:
>>
>>> If all your dependencies are java based (like Open Office) you might
>>> try using a dependency manager/build tool like maven or ant/ivy to
>>> package the dependencies in your jar. I'm not sure if any parts of
>>> Open Office are available in a public repo as maven artifacts or not,
>>> or if you want to get into packaging artifacts for your build system,
>>> but it's something you might try.
>>>
>>> I think it's cheaper to just use EC2 anyway, so that might be a
>>> motivating factor for you as well.
>>>
>>> Hi,
>>>
>>>>> so far I've been using Amazon MapReduce. However, my app uses a
>>>>> growing number of Linux packages. I have been installing them on
>>>>> the fly, in the Mapper.configure(), but with OpenOffice this is
>>>>> hard, and I don't get a service connection even after local install.
>>>>>
>>>>> Therefore, my question is on the advice in creating my own Hadoop
>>>>> cluster out of EC2 machines. Are there instructions? How hard is it?
>>>>> What are best practices?
>>>>>
>>>>> Thank you,
>>>>> Mark
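
For reference, a minimal sketch of the launch/save/terminate flow Amr
describes above. The credential variable names, bucket, and paths are
illustrative assumptions (not taken from the linked docs), and the distcp
step assumes the s3n filesystem has its own credentials configured:

  # AWS credentials -- exact variable names are an assumption; check the
  # getting-started doc linked above for what the scripts actually expect
  export AWS_ACCESS_KEY_ID=...
  export AWS_SECRET_ACCESS_KEY=...

  # bring up a 5-node cluster named amr-cluster
  hadoop-ec2 launch-cluster amr-cluster 5

  # ... run your jobs ...

  # before tearing down, copy anything worth keeping out of HDFS, e.g. to
  # S3 (run from the cluster; bucket and paths below are made up)
  hadoop distcp hdfs:///user/amr/output s3n://my-bucket/output

  # tear the cluster down so idle nodes stop costing money
  hadoop-ec2 terminate-cluster amr-cluster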
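
And a rough sketch of the dependency-packaging route Edmund suggests in
the quoted thread: if the job is built with Maven and the pom.xml uses
something like the maven-shade-plugin to bundle the Java dependencies
into a single jar, nothing needs to be installed on the nodes at job
time. The jar name, main class and paths below are made up:

  # build one self-contained ("fat") job jar; assumes pom.xml is set up
  # to bundle dependencies, e.g. via the maven-shade-plugin
  mvn clean package

  # submit it as usual
  hadoop jar target/myjob-1.0.jar com.example.MyJob input/ output/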