I think reserved EC2 instances just give you a better price in exchange for an
advance payment and, essentially, a contract. I didn't see any mention that
reserved instances mean no sharing. If AWS did that, they'd be nothing more than
a regular hosting service.
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/
----- Original Message ----
> From: Something Something <mailinglists19@gmail.com>
> To: general@hadoop.apache.org
> Cc: hbase-user@hadoop.apache.org
> Sent: Tue, January 26, 2010 3:49:31 PM
> Subject: Re: Performance of EC2
>
> Wow.. how naive I am to think that I could trust Amazon. Thanks for
> forwarding the links, Patrick. Seems like Amazon's reliability has gone
> down considerably over the past few months. (Occasionally my instances fail
> on startup or die in the middle for no apparent reason, and I used to think
> I was doing something dumb!)
>
> But what I don't understand is this... if I *reserve* an instance then I
> wouldn't be sharing its CPU with anyone, right? The blog seems to indicate
> otherwise.
>
> I guess I will have to look for alternatives to Amazon EC2. Does anyone have
> any recommendations? Thanks again.
>
>
> On Tue, Jan 26, 2010 at 11:44 AM, Patrick Hunt wrote:
>
> > Re "Amazon predictability", did you guys see this recent paper:
> > http://people.csail.mit.edu/tromer/cloudsec/
> >
> > Also some addl background on "noisy neighbor effects":
> > http://bit.ly/4O7dHx
> > http://bit.ly/8zPvQd
> >
> > Some interesting bits of information in there.
> >
> > Patrick
> >
> >
> > Something Something wrote:
> >
> >> Here are some of the answers:
> >>
> >>> How many concurrent reducers run on each node? Default two?
> >>
> >> I was assuming 2 on each node would be the default. If not, this could be a
> >> problem. Please let me know.
> >>
> >>> I'd suggest you spend a bit of time figuring where your MR jobs are
> >>> spending their time?
> >>
> >> I agree. Will do some more research :)
> >>
> >>> How much of this overall time is spent in reduce phase?
> >>
> >> Most of the time is spent in the Reduce phases, because that's where most of
> >> the critical code is.
> >>
> >>> Are inserts to a new table?
> >>
> >> Yes, all inserts will always be in a new table. In fact, I disable/drop
> >> HTables during this process. Not using any special indexes, should I be?
> >>
> >>> I'm a little surprised that all worked on the small instances, that your
> >>> jobs completed.
> >>
> >> But, really, shouldn't Amazon guarantee predictability :) After all I am
> >> paying for these instances.. albeit a small amount!
> >>
> >>> Are you opening a new table inside each task or once up in the config?
> >>
> >> I open HTable in the 'setup' method for each mapper/reducer, and close the
> >> table in the 'cleanup' method.
> >>
> >>> You have to temper the above general rule with the fact that...
> >>
> >> I will try a few combinations.
> >>
> >>> How big is your dataset?
> >>
> >> This one in particular is not big, but the real production ones will be.
> >> Here's approximately how many rows get processed:
> >> Phase 1: 300 rows
> >> Phase 2 thru 8: 100 rows.
> >> (Note: Each phase does complex calculations on the row.)
> >>
> >> Thanks for the help.
> >>
> >>
> >> On Tue, Jan 26, 2010 at 10:36 AM, Jean-Daniel Cryans wrote:
> >>
> >>> How big is your dataset?
> >>>
> >>> J-D
> >>>
> >>> On Tue, Jan 26, 2010 at 8:47 AM, Something Something wrote:
> >>>
> >>>> I have noticed some strange performance numbers on EC2. If someone can
> >>>> give me some hints to improve performance that would be greatly
> >>>> appreciated. Here are the details:
> >>>>
> >>>> I have a process that runs a series of Jobs under Hadoop 0.20.1 & HBase
> >>>> 0.20.2. I ran the *exact* same process with the following configurations:
> >>>>
> >>>> 1) 1 Master & 4 Workers (*c1.xlarge* instances) & 1 Zookeeper
> >>>> (*c1.medium*) with *8 Reducers* for every Reduce task. The process
> >>>> completed in *849* seconds.
> >>>> 2) 1 Master, 4 Workers & 1 Zookeeper, *ALL m1.small* instances, with *8
> >>>> Reducers* for every Reduce task. The process completed in *906* seconds.
> >>>>
> >>>> 3) 1 Master, *11* Workers & *3* Zookeepers, *ALL m1.small* instances,
> >>>> with *20 Reducers* for every Reduce task. The process completed in *984*
> >>>> seconds!
> >>>>
> >>>>
> >>>> Two main questions:
> >>>>
> >>>> 1) It's totally surprising that when I have 11 workers with 20 Reducers
> >>>> it runs slower than when I have fewer machines of exactly the same type
> >>>> with fewer reducers.
> >>>> 2) As expected it runs faster on c1.xlarge, but the performance
> >>>> improvement doesn't justify the high cost difference. I must not be
> >>>> utilizing the machine power, but I don't know how to do that.
> >>>>
> >>>> Here are some of the performance improvement tricks that I have learnt
> >>>> from this mailing list in the past that I am using:
> >>>>
> >>>> 1) conf.set("hbase.client.scanner.caching", "30"); I have this for all
> >>>> jobs.
> >>>>
> >>>> 2) Using the following code every time I open a HTable:
> >>>> this.table = new HTable(new HBaseConfiguration(), "tablenameXYZ");
> >>>> table.setAutoFlush(false);
> >>>> table.setWriteBufferSize(1024 * 1024 * 12);
> >>>>
> >>>> 3) For every Put I do this:
> >>>> Put put = new Put(Bytes.toBytes(out));
> >>>> put.setWriteToWAL(false);
> >>>>
> >>>> 4) Change the No. of Reducers as per the No. of Workers. I believe the
> >>>> formula is: # of workers * 1.75.
> >>>>
> >>>> Any other hints? As always, greatly appreciate the help. Thanks.
> >>>>
> >>>>
> >>
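
For reference, the arithmetic in the original post can be sketched in a few
lines of Java. This is only an illustration: the class and method names are
hypothetical, the 1.75 factor is the one quoted in the thread, and rounding
down is an assumption, since the thread does not say how fractional results
should be handled. It also compares the three reported run times.

```java
// Sketch only: names are hypothetical and not taken from the thread.
final class ReducerEstimate {

    // Factor quoted in the original post ("# of workers * 1.75").
    static final double FACTOR = 1.75;

    // Assumption: round down to a whole number of reducers.
    static int reducersFor(int workers) {
        if (workers < 0) {
            throw new IllegalArgumentException("workers must be >= 0");
        }
        return (int) Math.floor(workers * FACTOR);
    }

    // Ratio of two wall-clock times; a value above 1.0 means the second
    // run was slower than the baseline.
    static double relativeRuntime(double baselineSeconds, double otherSeconds) {
        if (baselineSeconds <= 0) {
            throw new IllegalArgumentException("baseline must be > 0");
        }
        return otherSeconds / baselineSeconds;
    }

    public static void main(String[] args) {
        // Worker counts from the three configurations in the post.
        System.out.println("4 workers  -> " + reducersFor(4) + " reducers");
        System.out.println("11 workers -> " + reducersFor(11) + " reducers");

        // Run times reported in the post: 849 s, 906 s, 984 s.
        System.out.printf("m1.small x4 vs c1.xlarge x4: %.3f%n",
                relativeRuntime(849, 906));
        System.out.printf("m1.small x11 vs m1.small x4: %.3f%n",
                relativeRuntime(906, 984));
    }
}
```

With the formula as stated, 4 and 11 workers come out just under the 8 and 20
reducers actually used in the runs. The Hadoop documentation applies the 1.75
factor to nodes multiplied by reduce slots per node, so the per-node slot
count matters as well. The three run times are within about 16% of each other.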