From: Patrick Hunt
Date: Thu, 28 Jan 2010 00:29:38 -0800
To: general@hadoop.apache.org
CC: hbase-user@hadoop.apache.org
Subject: Re: Performance of EC2
FYI, just noticed this one:

Rackspace Cloud Servers versus Amazon EC2: Performance Analysis
http://bit.ly/bkG1AB

Patrick

Something Something wrote:
> Wow... how naive I am to think that I could trust Amazon. Thanks for
> forwarding the links, Patrick. It seems Amazon's reliability has gone
> down considerably over the past few months. (Occasionally my instances
> fail on startup or die mid-run for no apparent reason, and I used to
> think I was doing something wrong!)
>
> But what I don't understand is this: if I *reserve* an instance, then I
> wouldn't be sharing its CPU with anyone, right? The blog seems to
> indicate otherwise.
>
> I guess I will have to look for alternatives to Amazon EC2. Does anyone
> have any recommendations? Thanks again.
>
>
> On Tue, Jan 26, 2010 at 11:44 AM, Patrick Hunt wrote:
>
>> Re "Amazon predictability", did you guys see this recent paper:
>> http://people.csail.mit.edu/tromer/cloudsec/
>>
>> Also some additional background on "noisy neighbor" effects:
>> http://bit.ly/4O7dHx
>> http://bit.ly/8zPvQd
>>
>> Some interesting bits of information in there.
>>
>> Patrick
>>
>>
>> Something Something wrote:
>>
>>> Here are some of the answers:
>>>
>>>> How many concurrent reducers run on each node? Default two?
>>> I was assuming 2 on each node would be the default. If not, this could
>>> be a problem. Please let me know.
>>>
>>>> I'd suggest you spend a bit of time figuring out where your MR jobs
>>>> are spending their time.
>>> I agree. Will do some more research :)
>>>
>>>> How much of this overall time is spent in the reduce phase?
>>> Most of the time is spent in the Reduce phases, because that's where
>>> most of the critical code is.
>>>
>>>> Are inserts to a new table?
>>> Yes, all inserts will always be to a new table. In fact, I disable/drop
>>> HTables during this process.
>>> Not using any special indexes; should I be?
>>>
>>>> I'm a little surprised that it all worked on the small instances, that
>>>> your jobs completed.
>>> But, really, shouldn't Amazon guarantee predictability? :) After all,
>>> I am paying for these instances... albeit a small amount!
>>>
>>>> Are you opening a new table inside each task or once up in the config?
>>> I open the HTable in the 'setup' method for each mapper/reducer, and
>>> close the table in the 'cleanup' method.
>>>
>>>> You have to temper the above general rule with the fact that...
>>> I will try a few combinations.
>>>
>>>> How big is your dataset?
>>> This one in particular is not big, but the real production ones will
>>> be. Here's approximately how many rows get processed:
>>> Phase 1: 300 rows
>>> Phases 2 thru 8: 100 rows
>>> (Note: each phase does complex calculations on each row.)
>>>
>>> Thanks for the help.
>>>
>>>
>>> On Tue, Jan 26, 2010 at 10:36 AM, Jean-Daniel Cryans wrote:
>>>
>>>> How big is your dataset?
>>>>
>>>> J-D
>>>>
>>>> On Tue, Jan 26, 2010 at 8:47 AM, Something Something wrote:
>>>>
>>>>> I have noticed some strange performance numbers on EC2. If someone
>>>>> can give me some hints to improve performance, that would be greatly
>>>>> appreciated. Here are the details:
>>>>>
>>>>> I have a process that runs a series of Jobs under Hadoop 0.20.1 &
>>>>> HBase 0.20.2. I ran the *exact* same process with the following
>>>>> configurations:
>>>>>
>>>>> 1) 1 Master & 4 Workers (*c1.xlarge* instances) & 1 Zookeeper
>>>>> (*c1.medium*), with *8 Reducers* for every Reduce task. The process
>>>>> completed in *849* seconds.
>>>>>
>>>>> 2) 1 Master, 4 Workers & 1 Zookeeper, *ALL m1.small* instances, with
>>>>> *8 Reducers* for every Reduce task. The process completed in *906*
>>>>> seconds.
>>>>>
>>>>> 3) 1 Master, *11* Workers & *3* Zookeepers, *ALL m1.small* instances,
>>>>> with *20 Reducers* for every Reduce task. The process completed in
>>>>> *984* seconds!
>>>>>
>>>>> Two main questions:
>>>>>
>>>>> 1) It's totally surprising that with 11 workers and 20 Reducers it
>>>>> runs slower than with fewer machines of exactly the same type and
>>>>> fewer reducers.
>>>>> 2) As expected, it runs faster on c1.xlarge, but the performance
>>>>> improvement doesn't justify the high cost difference. I must not be
>>>>> utilizing the machines' full power, but I don't know how to do that.
>>>>>
>>>>> Here are some of the performance-improvement tricks I have learnt
>>>>> from this mailing list in the past that I am using:
>>>>>
>>>>> 1) conf.set("hbase.client.scanner.caching", "30"); -- I have this
>>>>> for all jobs.
>>>>>
>>>>> 2) Using the following code every time I open an HTable:
>>>>>     this.table = new HTable(new HBaseConfiguration(), "tablenameXYZ");
>>>>>     table.setAutoFlush(false);
>>>>>     table.setWriteBufferSize(1024 * 1024 * 12);
>>>>>
>>>>> 3) For every Put I do this:
>>>>>     Put put = new Put(Bytes.toBytes(out));
>>>>>     put.setWriteToWAL(false);
>>>>>
>>>>> 4) Change the No. of Reducers as per the No. of Workers. I believe
>>>>> the formula is: # of workers * 1.75.
>>>>>
>>>>> Any other hints? As always, greatly appreciate the help. Thanks.
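A note on item 4 above: the rule of thumb in the Hadoop MapReduce tutorial applies the multiplier to the total number of reduce *slots*, not just the number of worker nodes -- roughly 0.95 or 1.75 * (nodes * mapred.tasktracker.reduce.tasks.maximum). A minimal plain-Java sketch of that arithmetic, using the poster's cluster sizes; the 2 reduce slots per node is an assumption (the old default for mapred.tasktracker.reduce.tasks.maximum):

```java
// Sketch of the reducer-count rule of thumb from the Hadoop MapReduce
// tutorial: reduces ~= factor * (nodes * max reduce slots per node).
// factor 0.95 launches all reduces in a single wave; 1.75 gives two
// waves, which balances load better across uneven tasks.
public class ReducerCount {
    static int suggestedReducers(int workerNodes, int reduceSlotsPerNode,
                                 double factor) {
        return (int) Math.floor(factor * workerNodes * reduceSlotsPerNode);
    }

    public static void main(String[] args) {
        // 4 workers, assumed 2 reduce slots each, two-wave factor:
        System.out.println(suggestedReducers(4, 2, 1.75));   // 14
        // 11 workers, same assumptions:
        System.out.println(suggestedReducers(11, 2, 1.75));  // 38
        // single-wave factor on 4 workers:
        System.out.println(suggestedReducers(4, 2, 0.95));   // 7
    }
}
```

By this reading, 8 reducers on 4 small workers is in the right neighborhood, while 20 reducers on 11 workers is well under the two-wave figure; that said, with only 100-300 rows per phase, per-task startup overhead likely dominates the runtime regardless of reducer count.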