From: James Seigel
Date: Mon, 27 Dec 2010 12:04:01 -0700
Subject: Re: Hadoop/Elastic MR on AWS
To: common-user@hadoop.apache.org

Thank you for sharing.

Sent from my mobile. Please excuse the typos.

On 2010-12-27, at 11:18 AM, Sudhir Vallamkondu wrote:

> We recently crossed this bridge, and here are some insights. We did an
> extensive study comparing costs and benchmarking local vs. EMR for our
> current needs and future trends.
>
> - The scalability you get with EMR is unmatched, although you need to look
>   at your requirements and decide whether this is something you need.
>
> - When using EMR, it's cheaper to use reserved instances than nodes on the
>   fly. You can always add more nodes when required. I suggest looking at
>   your current computing needs, reserving instances for a year or two, using
>   these to run EMR, and adding nodes at peak times. In your cost estimation
>   you will need to factor in data transfer time/costs, unless you are
>   dealing with public datasets on S3.
>
> - EMR fared similarly to a local cluster on CPU benchmarks (we used MRBench
>   to benchmark map/reduce); however, IO benchmarks were slow on EMR (we used
>   the DFSIO benchmark). For IO-intensive jobs you will need to add more
>   nodes to compensate for this.
>
> - Compared to a local cluster, you will need to factor in the time it takes
>   for the EMR cluster to set up when starting a job. This includes data
>   transfer time, cluster replication time, etc.
>
> - The EMR API is very flexible; however, you will need to build a custom
>   interface on top of it to suit your job management and monitoring needs.
>
> - EMR bootstrap actions can satisfy most of your native library needs, so
>   there are no drawbacks there.
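The reserved-versus-on-demand advice in the second point can be made concrete with a small cost model. What follows is a minimal Python sketch under stated assumptions. It treats a reservation as a one-year term with an upfront fee plus a discounted hourly rate, and it adds a per-GB data transfer charge. The names `Pricing`, `all_on_demand_cost`, and `reserved_plus_peak_cost` are hypothetical, and every rate in the example is a made-up placeholder, not an actual AWS price.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365  # assumed one-year reservation term


@dataclass
class Pricing:
    """Per-node rates. All values are hypothetical; plug in current AWS prices."""
    on_demand_hourly: float   # hourly rate for nodes started on the fly
    reserved_upfront: float   # one-time fee per reserved node for the term
    reserved_hourly: float    # discounted hourly rate for a reserved node
    transfer_per_gb: float    # cost of moving data into/out of AWS


def all_on_demand_cost(p: Pricing, baseline_nodes: int,
                       peak_node_hours: float, gb_transferred: float) -> float:
    """Baseline and peak capacity are both paid at on-demand rates."""
    node_hours = baseline_nodes * HOURS_PER_YEAR + peak_node_hours
    return node_hours * p.on_demand_hourly + gb_transferred * p.transfer_per_gb


def reserved_plus_peak_cost(p: Pricing, baseline_nodes: int,
                            peak_node_hours: float, gb_transferred: float) -> float:
    """Baseline capacity is reserved for the term; only peaks run on demand."""
    per_reserved_node = p.reserved_upfront + HOURS_PER_YEAR * p.reserved_hourly
    reserved = baseline_nodes * per_reserved_node
    peak = peak_node_hours * p.on_demand_hourly
    return reserved + peak + gb_transferred * p.transfer_per_gb


if __name__ == "__main__":
    # Made-up example rates, for illustration only.
    rates = Pricing(on_demand_hourly=0.10, reserved_upfront=200.0,
                    reserved_hourly=0.03, transfer_per_gb=0.10)
    # 10 steady nodes, 5,000 extra node-hours at peaks, 1,000 GB moved.
    args = (rates, 10, 5000, 1000)
    print(f"all on demand:    {all_on_demand_cost(*args):,.2f}")
    print(f"reserved + peaks: {reserved_plus_peak_cost(*args):,.2f}")
```

The model assumes the baseline nodes are busy for the whole term. If they would otherwise sit idle, the on-demand side should count only the hours you actually use, and that can change which option comes out cheaper.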
>
> -- Sudhir
>
> On 12/26/10 5:26 AM, "common-user-digest-help@hadoop.apache.org" wrote:
>
>> From: Otis Gospodnetic
>> Date: Fri, 24 Dec 2010 04:41:46 -0800 (PST)
>> Subject: Re: Hadoop/Elastic MR on AWS
>>
>> Hello Amandeep,
>>
>> ----- Original Message ----
>>> From: Amandeep Khurana
>>> To: common-user@hadoop.apache.org
>>> Sent: Fri, December 10, 2010 1:14:45 AM
>>> Subject: Re: Hadoop/Elastic MR on AWS
>>>
>>> Mark,
>>>
>>> Using EMR makes it very easy to start a cluster and add/reduce capacity as
>>> and when required. There are certain optimizations that make EMR an
>>> attractive choice as compared to building your own cluster out. Using EMR
>>
>> Could you please point out what optimizations you are referring to?
>>
>> Thanks,
>> Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> Hadoop ecosystem search :: http://search-hadoop.com/
>>
>>> also ensures you are using a production-quality, stable system backed by
>>> the EMR engineers. You can always use bootstrap actions to put your own
>>> tweaked version of Hadoop in there if you want to do that.
>>>
>>> Also, you don't have to tear down your cluster after every job. You can set
>>> the alive option when you start your cluster, and it will stay there even
>>> after your Hadoop job completes.
>>>
>>> If you face any issues with EMR, send me a mail offline and I'll be happy
>>> to help.
>>>
>>> -Amandeep
>>>
>>> On Thu, Dec 9, 2010 at 9:47 PM, Mark wrote:
>>>
>>>> Does anyone have any thoughts/experiences on running Hadoop in AWS? What
>>>> are some pros/cons?
>>>>
>>>> Are there any good AMIs out there for this?
>>>>
>>>> Thanks for any advice.
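The thread mentions two EMR features: keeping a cluster running after its job completes (Amandeep's "alive option") and bootstrap actions for native libraries or a tweaked Hadoop. Both can be sketched with the boto3 EMR API, which is newer than this 2010 discussion. In that API, `KeepJobFlowAliveWhenNoSteps` in the `Instances` block keeps the cluster up once its steps finish, and `BootstrapActions` lists scripts to run on each node at startup. The sketch below only builds the keyword arguments for `run_job_flow`, so it runs without AWS credentials. The helper name `build_cluster_request`, the cluster name, the instance type, the release label, and the S3 script path are all hypothetical placeholders.

```python
def build_cluster_request(name: str, bootstrap_script_uri: str,
                          bootstrap_args: tuple = (), instance_count: int = 3,
                          instance_type: str = "m5.xlarge",
                          keep_alive: bool = True) -> dict:
    """Return keyword arguments for boto3's EMR client.run_job_flow().

    Nothing is sent to AWS here; the dict is only assembled.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.10.0",  # placeholder; choose a current release
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # The "alive" behaviour: keep the cluster up after its steps finish.
            "KeepJobFlowAliveWhenNoSteps": keep_alive,
        },
        "BootstrapActions": [
            {
                # Runs on every node at startup, e.g. to install native libraries.
                "Name": "install-native-libs",
                "ScriptBootstrapAction": {
                    "Path": bootstrap_script_uri,
                    "Args": list(bootstrap_args),
                },
            }
        ],
        # Default role names created by EMR; replace with your own if needed.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Launching it (requires boto3 and AWS credentials):
#   import boto3
#   emr = boto3.client("emr")
#   resp = emr.run_job_flow(**build_cluster_request(
#       "adhoc-hadoop", "s3://my-bucket/bootstrap/install-libs.sh"))
#   print(resp["JobFlowId"])
```

A cluster kept alive this way continues to accrue charges until you terminate it. This is the flip side of not having to tear it down after every job.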