From: Amr Awadallah
Organization: Cloudera, Inc.
Date: Fri, 06 Nov 2009 12:44:24 -0800
To: common-user@hadoop.apache.org
Subject: Re: Time to build my own cluster - advice?

yep,

  hadoop-ec2 launch-cluster amr-cluster 5

will launch a cluster of 5 nodes, after you set up environment variables
for your AWS credentials and create a small config file describing which
AMI/instance type/zone to use, see:

http://archive.cloudera.com/docs/_getting_started.html

then

  hadoop-ec2 terminate-cluster amr-cluster

will take the cluster down immediately (otherwise you keep paying even
while the nodes sit idle). That also means all the data you had in HDFS
is gone with the nodes, so you should save that data to S3/EBS first, or
launch an EBS-based cluster as described here:

http://archive.cloudera.com/docs/_getting_started_and_basic_example_instructions.html

(a minimal sketch of the full launch/save/terminate flow is at the end of
this message)

-- amr

Edmund Kohlwey wrote:
> First of all, let me say I don't use EC2 - there are some people at my
> company who do, but I've been fortunate enough to use our internal dev
> cluster for all the work I've done, so this is total hearsay.
>
> That having been said, the people I know who are using EC2 aren't
> leaving the cluster running when not in use - there are scripts from (I
> believe) Cloudera that can allocate and configure the right number of
> nodes on EC2 with whatever AMI you specify, and then tear them down
> when you're done.
>
> On 11/5/09 1:14 PM, Mark Kerzner wrote:
>> Edmund,
>>
>> I wanted to install OpenOffice and connect to it from my java code. I
>> tried to replicate the complete install by copying it, but there must
>> be something else there, because I can't connect on Amazon MapReduce,
>> but I can on my own cluster.
>>
>> When you say cheaper, do you mean that keeping your own EC2 machines
>> up and using them as a hadoop cluster is in the end cheaper than
>> starting a Hadoop cluster every time you want to run a job?
>>
>> Thank you,
>> Mark
>>
>> On Thu, Nov 5, 2009 at 12:04 PM, Edmund Kohlwey wrote:
>>
>>> If all your dependencies are java based (like Open Office) you might
>>> try using a dependency manager/build tool like maven or ant/ivy to
>>> package the dependencies in your jar. I'm not sure if any parts of
>>> Open Office are available in a public repo as maven artifacts or not,
>>> or if you want to get into packaging artifacts for your build system,
>>> but it's something you might try.
>>>
>>> I think it's cheaper to just use EC2 anyway, so that might be a
>>> motivating factor for you as well.
>>>
>>> Hi,
>>>
>>>>> so far I've been using Amazon MapReduce. However, my app uses a
>>>>> growing number of Linux packages. I have been installing them on
>>>>> the fly, in the Mapper.configure(), but with OpenOffice this is
>>>>> hard, and I don't get a service connection even after local install.
>>>>>
>>>>> Therefore, my question is on the advice in creating my own Hadoop
>>>>> cluster out of EC2 machines. Are there instructions? How hard is it?
>>>>> What are best practices?
>>>>>
>>>>> Thank you,
>>>>> Mark
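
For reference, a minimal sketch of the launch/save/terminate flow Amr
describes above. The credential variable names, bucket, and paths are
illustrative assumptions (not taken from the linked docs), and the distcp
step assumes the s3n filesystem has its own credentials configured:

  # AWS credentials -- exact variable names are an assumption; check the
  # getting-started doc linked above for what the scripts actually expect
  export AWS_ACCESS_KEY_ID=...
  export AWS_SECRET_ACCESS_KEY=...

  # bring up a 5-node cluster named amr-cluster
  hadoop-ec2 launch-cluster amr-cluster 5

  # ... run your jobs ...

  # before tearing down, copy anything worth keeping out of HDFS, e.g. to
  # S3 (run from the cluster; bucket and paths below are made up)
  hadoop distcp hdfs:///user/amr/output s3n://my-bucket/output

  # tear the cluster down so idle nodes stop costing money
  hadoop-ec2 terminate-cluster amr-cluster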
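
And a rough sketch of the dependency-packaging route Edmund suggests in
the quoted thread: if the job is built with Maven and the pom.xml uses
something like the maven-shade-plugin to bundle the Java dependencies
into a single jar, nothing needs to be installed on the nodes at job
time. The jar name, main class and paths below are made up:

  # build one self-contained ("fat") job jar; assumes pom.xml is set up
  # to bundle dependencies, e.g. via the maven-shade-plugin
  mvn clean package

  # submit it as usual
  hadoop jar target/myjob-1.0.jar com.example.MyJob input/ output/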