hadoop-common-user mailing list archives

From Alan Miller <someb...@squareplanet.de>
Subject Re: decommission a node
Date Wed, 07 Jul 2010 07:00:47 GMT
Yes the effect of "scaling down" was the first thing I wanted to look at.
To process X GB it currently takes Y seconds with Z nodes.
If I process X GB with Z/2 nodes, does it take 2Y seconds (i.e., does it scale linearly)?
How about Z-1,Z-2,Z-3,.... nodes?
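The expectation behind those questions can be sketched quickly: under ideal linear scaling the total work (Z nodes x Y seconds) stays constant, so shrinking the cluster stretches the runtime proportionally. A minimal sketch with made-up baseline numbers (10 nodes, 120 s; both are hypothetical, not from my cluster):

```python
def ideal_runtime(z_nodes: int, y_seconds: float, nodes: int) -> float:
    """Expected runtime with `nodes` workers, if z_nodes workers take y_seconds.
    Assumes perfectly linear scaling and no fixed overhead."""
    return z_nodes * y_seconds / nodes

Z, Y = 10, 120.0  # hypothetical baseline: 10 nodes finish in 120 s
for n in (Z, Z - 1, Z - 2, Z // 2):
    print(n, "nodes ->", round(ideal_runtime(Z, Y, n), 1), "s")
```

Real measurements will deviate from this line; the gap is exactly what the scale-down test measures.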

Right now my MR job processes a lot of small files (2000 files, ~2.5 MB each)
individually, so the next test will involve changing my MR job to combine
the small files into bigger pieces (closer to the HDFS block size) and seeing
whether that is more effective.
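In Hadoop the usual route is CombineFileInputFormat, but the grouping idea itself is simple and can be sketched as a greedy packing step in plain Python (the 2.5 MB file size is from above; the 64 MB block size is an assumed default):

```python
def pack_files(sizes, target):
    """Greedily group file sizes into bins no larger than `target` MB.
    A single oversized file still gets its own bin."""
    bins, current, total = [], [], 0.0
    for s in sizes:
        if current and total + s > target:
            bins.append(current)       # close the current bin
            current, total = [], 0.0
        current.append(s)
        total += s
    if current:
        bins.append(current)
    return bins

# 2000 files of ~2.5 MB packed toward a 64 MB block:
groups = pack_files([2.5] * 2000, 64.0)
print(len(groups), "splits of", len(groups[0]), "files each")
```

With these numbers each split holds 25 files (62.5 MB), so 2000 map inputs collapse to 80 splits, which is the whole point of the combining test.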

Each line of my small files has a timestamp column and 55 columns of
numerical data, and my reducer needs to calculate the column averages for
certain time periods (last day, last hour, etc.) based on the timestamp.
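The real averaging runs inside a Hadoop reducer, but the core logic can be shown as a single-process sketch (the hourly bucket and the timestamp format are assumptions; the real job has 55 value columns):

```python
from collections import defaultdict
from datetime import datetime

def bucket_averages(lines, fmt="%Y-%m-%d %H:%M:%S"):
    """Average each numeric column per hour bucket.
    Each input line is 'timestamp,v1,v2,...' (comma-separated)."""
    sums = {}
    counts = defaultdict(int)
    for line in lines:
        ts, *vals = line.strip().split(",")
        hour = datetime.strptime(ts, fmt).strftime("%Y-%m-%d %H")
        vals = [float(v) for v in vals]
        if hour not in sums:
            sums[hour] = [0.0] * len(vals)
        sums[hour] = [a + b for a, b in zip(sums[hour], vals)]
        counts[hour] += 1
    return {h: [s / counts[h] for s in sums[h]] for h in sums}

# Two rows, two value columns, same hour:
rows = ["2010-07-06 08:00:00,1,2", "2010-07-06 08:30:00,3,4"]
print(bucket_averages(rows))
```

Longer windows (last day, etc.) just mean a coarser bucket key in place of the hour string.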

Alan

On 07/06/2010 08:06 PM, Allen Wittenauer wrote:
> On Jul 6, 2010, at 8:35 AM, Michael Segel wrote:
>
>> I'm also not sure how dropping a node will test the scalability. You would be testing resilience.
>
> He's testing scale down, not scale up (which is the way we normally think of things... I was confused by the wording too).
>
> In other words, "if I drop a node, how much of a performance hit is my job going to take?"
>
> Also, for this type of timing/testing, I'd probably make sure speculative execution is off. It will likely throw some curve balls into the time.

