hadoop-common-user mailing list archives

From Mark Kerzner <markkerz...@gmail.com>
Subject Re: Time to build my own cluster - advice?
Date Fri, 06 Nov 2009 20:54:10 GMT
Amr,

can I take a Cloudera-based AMI, customize it by adding my Linux packages,
save it as a private AMI, and then use your script? Or does your script only
work with your AMIs?

Thank you,
Mark

On Fri, Nov 6, 2009 at 2:44 PM, Amr Awadallah <aaa@cloudera.com> wrote:

> yep,
>
> hadoop-ec2 launch-cluster amr-cluster 5
>
> will launch a cluster of 5 nodes, after you set up environment variables for
> your AWS credentials and a small config file describing which
> AMI/instance type/zone to use; see:
>
> http://archive.cloudera.com/docs/_getting_started.html
>
> then
>
> hadoop-ec2 terminate-cluster amr-cluster
>
> will take the cluster down instantaneously (otherwise you will keep paying
> even while the nodes are idle), but that means all the data you had in HDFS
> will be gone with the nodes, so you should first save that data to S3/EBS or
> launch an EBS-based cluster as described here:
>
>
> http://archive.cloudera.com/docs/_getting_started_and_basic_example_instructions.html
>
> -- amr
>
>
> Edmund Kohlwey wrote:
>
>> First of all, let me say I don't use EC2. There are some people at my
>> company who do, but I've been fortunate enough to use our internal dev
>> cluster for all the work I've done, so this is total hearsay.
>>
>> That having been said, the people I know who use EC2 don't leave the
>> cluster running when it's not in use. There are scripts from (I believe)
>> Cloudera that can allocate and configure the right number of nodes on EC2
>> with whatever AMI you specify, and then tear them down when you're done.
>>
>> On 11/5/09 1:14 PM, Mark Kerzner wrote:
>>
>>> Edmund,
>>>
>>> I wanted to install OpenOffice and connect to it from my Java code. I tried
>>> to replicate the complete install by copying it, but there must be something
>>> else there, because I can't connect on Amazon MapReduce, but I can on my own
>>> cluster.
>>>
>>> When you say cheaper, do you mean that keeping your own EC2 machines up and
>>> using them as a Hadoop cluster is, in the end, cheaper than starting a
>>> Hadoop cluster every time you want to run a job?
>>>
>>> Thank you,
>>> Mark
>>>
>>> On Thu, Nov 5, 2009 at 12:04 PM, Edmund Kohlwey <ekohlwey@gmail.com> wrote:
>>>
>>>> If all your dependencies are Java-based (like OpenOffice), you might try
>>>> using a dependency manager/build tool like Maven or Ant/Ivy to package the
>>>> dependencies in your jar. I'm not sure whether any parts of OpenOffice are
>>>> available in a public repo as Maven artifacts, or whether you want to get
>>>> into packaging artifacts for your build system, but it's something you
>>>> might try.
>>>>
>>>> I think it's cheaper to just use EC2 anyway, so that might be a motivating
>>>> factor for you as well.
>>>>
>>>>> Hi,
>>>>>
>>>>> so far I've been using Amazon MapReduce. However, my app uses a growing
>>>>> number of Linux packages. I have been installing them on the fly, in
>>>>> Mapper.configure(), but with OpenOffice this is hard, and I don't get a
>>>>> service connection even after a local install.
>>>>>
>>>>> Therefore, my question is asking for advice on creating my own Hadoop
>>>>> cluster out of EC2 machines. Are there instructions? How hard is it? What
>>>>> are the best practices?
>>>>>
>>>>> Thank you,
>>>>> Mark

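Mark's question about whether an always-on EC2 cluster is cheaper than launching one per job comes down to simple arithmetic. The following is a minimal Java sketch, not part of Cloudera's scripts: the class and method names are hypothetical, the $0.10 per node-hour rate and the 20 runs of 90 minutes are placeholder numbers, and it assumes every started instance-hour is billed as a full hour.

```java
import java.util.Arrays;
import java.util.Locale;

/**
 * Hypothetical back-of-the-envelope comparison: keep a cluster up for the
 * whole period vs. launch it for each job and terminate it afterwards.
 * ASSUMPTION: every started instance-hour is billed as a full hour.
 */
public class ClusterCostSketch {

    /** Cost of keeping {@code nodes} instances running for {@code hoursInPeriod} hours. */
    static double alwaysOnCost(int nodes, double ratePerNodeHour, int hoursInPeriod) {
        return nodes * ratePerNodeHour * hoursInPeriod;
    }

    /**
     * Cost of launching the cluster for each run and terminating it afterwards.
     * Each run's wall-clock minutes (cluster startup included) are rounded up
     * to whole billed hours.
     */
    static double perJobCost(int nodes, double ratePerNodeHour, int[] runMinutes) {
        long billedHours = 0;
        for (int minutes : runMinutes) {
            billedHours += (minutes + 59) / 60; // round any partial hour up
        }
        return nodes * ratePerNodeHour * billedHours;
    }

    public static void main(String[] args) {
        int nodes = 5;                // same size as the launch-cluster example
        double rate = 0.10;           // HYPOTHETICAL $/node-hour; substitute the real price
        int hoursInMonth = 30 * 24;   // a 30-day month
        int[] runs = new int[20];     // HYPOTHETICAL: 20 runs per month...
        Arrays.fill(runs, 90);        // ...of 90 minutes each, startup included

        System.out.println(String.format(Locale.ROOT, "always on: $%.2f",
                alwaysOnCost(nodes, rate, hoursInMonth)));
        System.out.println(String.format(Locale.ROOT, "per job:   $%.2f",
                perJobCost(nodes, rate, runs)));
    }
}
```

With these placeholder numbers, the always-on cluster bills 720 hours per node for the month, while the per-job approach bills 40 (20 runs of 2 billed hours each). Launching per job only stops paying off once the billed job hours approach the hours in the period. The sketch ignores storage and data-transfer charges, such as saving HDFS data to S3 before terminating.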