From: Patrick Hunt
Date: Thu, 28 Jan 2010 00:29:38 -0800
To: general@hadoop.apache.org
CC: hbase-user@hadoop.apache.org
Subject: Re: Performance of EC2
FYI, just noticed this one:

Rackspace Cloud Servers versus Amazon EC2: Performance Analysis
http://bit.ly/bkG1AB

Patrick

Something Something wrote:
> Wow... how naive I am to think that I could trust Amazon. Thanks for
> forwarding the links, Patrick. It seems Amazon's reliability has gone
> down considerably over the past few months. (Occasionally my instances
> fail on startup or die mid-run for no apparent reason, and I used to
> think I was doing something wrong!)
>
> But what I don't understand is this: if I *reserve* an instance, then I
> wouldn't be sharing its CPU with anyone, right? The blog seems to
> indicate otherwise.
>
> I guess I will have to look for alternatives to Amazon EC2. Does anyone
> have any recommendations? Thanks again.
>
>
> On Tue, Jan 26, 2010 at 11:44 AM, Patrick Hunt wrote:
>
>> Re "Amazon predictability", did you guys see this recent paper:
>> http://people.csail.mit.edu/tromer/cloudsec/
>>
>> Also some additional background on "noisy neighbor" effects:
>> http://bit.ly/4O7dHx
>> http://bit.ly/8zPvQd
>>
>> Some interesting bits of information in there.
>>
>> Patrick
>>
>>
>> Something Something wrote:
>>
>>> Here are some of the answers:
>>>
>>>> How many concurrent reducers run on each node? Default two?
>>> I was assuming 2 on each node would be the default. If not, this could
>>> be a problem. Please let me know.
>>>
>>>> I'd suggest you spend a bit of time figuring out where your MR jobs
>>>> are spending their time.
>>> I agree. Will do some more research :)
>>>
>>>> How much of this overall time is spent in the reduce phase?
>>> Most of the time is spent in the Reduce phases, because that's where
>>> most of the critical code is.
>>>
>>>> Are inserts to a new table?
>>> Yes, all inserts will always be to a new table. In fact, I disable/drop
>>> HTables during this process.
>>> Not using any special indexes; should I be?
>>>
>>>> I'm a little surprised that it all worked on the small instances, that
>>>> your jobs completed.
>>> But, really, shouldn't Amazon guarantee predictability? :) After all,
>>> I am paying for these instances... albeit a small amount!
>>>
>>>> Are you opening a new table inside each task or once up in the config?
>>> I open the HTable in the 'setup' method for each mapper/reducer, and
>>> close the table in the 'cleanup' method.
>>>
>>>> You have to temper the above general rule with the fact that...
>>> I will try a few combinations.
>>>
>>>> How big is your dataset?
>>> This one in particular is not big, but the real production ones will
>>> be. Here's approximately how many rows get processed:
>>> Phase 1: 300 rows
>>> Phases 2 thru 8: 100 rows
>>> (Note: each phase does complex calculations on each row.)
>>>
>>> Thanks for the help.
>>>
>>>
>>> On Tue, Jan 26, 2010 at 10:36 AM, Jean-Daniel Cryans wrote:
>>>
>>>> How big is your dataset?
>>>>
>>>> J-D
>>>>
>>>> On Tue, Jan 26, 2010 at 8:47 AM, Something Something wrote:
>>>>
>>>>> I have noticed some strange performance numbers on EC2. If someone
>>>>> can give me some hints to improve performance, that would be greatly
>>>>> appreciated. Here are the details:
>>>>>
>>>>> I have a process that runs a series of Jobs under Hadoop 0.20.1 &
>>>>> HBase 0.20.2. I ran the *exact* same process with the following
>>>>> configurations:
>>>>>
>>>>> 1) 1 Master & 4 Workers (*c1.xlarge* instances) & 1 Zookeeper
>>>>> (*c1.medium*), with *8 Reducers* for every Reduce task. The process
>>>>> completed in *849* seconds.
>>>>>
>>>>> 2) 1 Master, 4 Workers & 1 Zookeeper, *ALL m1.small* instances, with
>>>>> *8 Reducers* for every Reduce task. The process completed in *906*
>>>>> seconds.
>>>>>
>>>>> 3) 1 Master, *11* Workers & *3* Zookeepers, *ALL m1.small* instances,
>>>>> with *20 Reducers* for every Reduce task. The process completed in
>>>>> *984* seconds!
>>>>>
>>>>> Two main questions:
>>>>>
>>>>> 1) It's totally surprising that with 11 workers and 20 Reducers it
>>>>> runs slower than with fewer machines of exactly the same type and
>>>>> fewer reducers.
>>>>> 2) As expected, it runs faster on c1.xlarge, but the performance
>>>>> improvement doesn't justify the high cost difference. I must not be
>>>>> utilizing the machines' full power, but I don't know how to do that.
>>>>>
>>>>> Here are some of the performance-improvement tricks I have learnt
>>>>> from this mailing list in the past that I am using:
>>>>>
>>>>> 1) conf.set("hbase.client.scanner.caching", "30"); -- I have this
>>>>> for all jobs.
>>>>>
>>>>> 2) Using the following code every time I open an HTable:
>>>>>     this.table = new HTable(new HBaseConfiguration(), "tablenameXYZ");
>>>>>     table.setAutoFlush(false);
>>>>>     table.setWriteBufferSize(1024 * 1024 * 12);
>>>>>
>>>>> 3) For every Put I do this:
>>>>>     Put put = new Put(Bytes.toBytes(out));
>>>>>     put.setWriteToWAL(false);
>>>>>
>>>>> 4) Change the No. of Reducers as per the No. of Workers. I believe
>>>>> the formula is: # of workers * 1.75.
>>>>>
>>>>> Any other hints? As always, greatly appreciate the help. Thanks.
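A note on item 4 above: the rule of thumb in the Hadoop MapReduce tutorial applies the multiplier to the total number of reduce *slots*, not just the number of worker nodes -- roughly 0.95 or 1.75 * (nodes * mapred.tasktracker.reduce.tasks.maximum). A minimal plain-Java sketch of that arithmetic, using the poster's cluster sizes; the 2 reduce slots per node is an assumption (the old default for mapred.tasktracker.reduce.tasks.maximum):

```java
// Sketch of the reducer-count rule of thumb from the Hadoop MapReduce
// tutorial: reduces ~= factor * (nodes * max reduce slots per node).
// factor 0.95 launches all reduces in a single wave; 1.75 gives two
// waves, which balances load better across uneven tasks.
public class ReducerCount {
    static int suggestedReducers(int workerNodes, int reduceSlotsPerNode,
                                 double factor) {
        return (int) Math.floor(factor * workerNodes * reduceSlotsPerNode);
    }

    public static void main(String[] args) {
        // 4 workers, assumed 2 reduce slots each, two-wave factor:
        System.out.println(suggestedReducers(4, 2, 1.75));   // 14
        // 11 workers, same assumptions:
        System.out.println(suggestedReducers(11, 2, 1.75));  // 38
        // single-wave factor on 4 workers:
        System.out.println(suggestedReducers(4, 2, 0.95));   // 7
    }
}
```

By this reading, 8 reducers on 4 small workers is in the right neighborhood, while 20 reducers on 11 workers is well under the two-wave figure; that said, with only 100-300 rows per phase, per-task startup overhead likely dominates the runtime regardless of reducer count.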