hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: decomission a node
Date Wed, 07 Jul 2010 13:55:43 GMT

This is pretty predictable.
Determine the average time it takes to process a single m/r task.
If you can process 100 m/r tasks simultaneously and then cut that to 50 simultaneous
tasks, your job will take roughly twice as long to run.

Granted, this will only give you a rough estimate of how long it will take your job to run.
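
As a rough sketch of that back-of-the-envelope calculation: the function below treats a job as a series of "waves" of equally long tasks. The function name, the 2000-task count (one map task per small file, as in the quoted message), and the 30-second average are assumptions for illustration only; the model ignores shuffle/sort time, task startup overhead, data locality, and stragglers.

```python
import math

def estimate_job_seconds(num_tasks, avg_task_seconds, concurrent_slots):
    """Rough wave-based estimate: tasks run in batches of `concurrent_slots`."""
    # Number of full or partial waves needed to run every task once.
    waves = math.ceil(num_tasks / concurrent_slots)
    return waves * avg_task_seconds

# Hypothetical numbers: 2000 tasks averaging 30 s each.
full = estimate_job_seconds(2000, 30.0, 100)  # 100 tasks at a time
half = estimate_job_seconds(2000, 30.0, 50)   # capacity cut to 50 at a time
print(full, half, half / full)
```

Under these assumptions, halving the number of simultaneous tasks doubles the number of waves and therefore the estimate. Measured times will drift from this as the overheads ignored here start to matter.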



> Date: Wed, 7 Jul 2010 09:00:47 +0200
> From: somebody@squareplanet.de
> To: common-user@hadoop.apache.org
> Subject: Re: decomission a node
> Yes the effect of "scaling down" was the first thing I wanted to look at.
> To process X GB it currently takes Y seconds with Z nodes.
> If I process X GB with Z/2 nodes, does it take 2Y seconds?
> How about Z-1,Z-2,Z-3,.... nodes?
> Right now my MR job processes a lot of small files (2000 files at 2.5 MB each)
> individually, so the next test would involve changing my MR job to combine
> the small files into bigger pieces (closer to the HDFS block size) and see
> if that is more effective.
> Each line of my small files has a timestamp column and 55 columns with
> numerical data, and my reducer needs to calculate the column averages for
> certain time periods (last day, last hour, etc.) based on the timestamp.
> Alan
> On 07/06/2010 08:06 PM, Allen Wittenauer wrote:
> > On Jul 6, 2010, at 8:35 AM, Michael Segel wrote:
> >    
> >> I'm also not sure how dropping a node will test the scalability. You would be
> >> testing resilience.
> >>
> > He's testing scale down, not scale up (which is the way we normally think of things...
> > I was confused by the wording too).
> >
> > In other words, "if I drop a node, how much of a performance hit is my job going
> > to take?"
> >
> > Also, for this type of timing/testing, I'd probably make sure speculative execution
> > is off. It will likely throw some curve balls into the time.