hadoop-common-user mailing list archives

From André Martin <m...@andremartin.de>
Subject Re: Performance / cluster scaling question
Date Mon, 24 Mar 2008 13:31:51 GMT
Thanks for the clarification, dhruba :-)
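If I run the numbers from your reply (the 100-blocks-per-heartbeat cap 
and the 3-second heartbeat interval you mention, plus the roughly two 
million blocks from my first post - so treat this as a rough sketch, not 
gospel), working through the backlog alone would take about two hours:

/**
 * Back-of-the-envelope estimate of the namenode-throttled block deletion
 * rate. The constants mirror the figures quoted below; adjust them for
 * your own cluster.
 */
public class DeletionEstimate {
    public static void main(String[] args) {
        int datanodes = 8;                // nodes in my cluster
        int blocksPerHeartbeat = 100;     // deletion cap per heartbeat
        double heartbeatSeconds = 3.0;    // typical heartbeat interval
        long blocksToDelete = 2000000L;   // rough block count from my first post

        double blocksPerSecond = datanodes * blocksPerHeartbeat / heartbeatSeconds;
        double seconds = blocksToDelete / blocksPerSecond;
        // prints: ~267 blocks/s -> ~125.0 minutes (~2.1 hours)
        System.out.printf("~%.0f blocks/s -> ~%.1f minutes (~%.1f hours)%n",
                blocksPerSecond, seconds / 60, seconds / 3600);
    }
}

So even in the best case the cluster needs on the order of two hours just 
to clear the pending deletions.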
Anyway, what can cause those other exceptions, such as "Could not get 
block locations" and "DataXceiver: java.io.EOFException"? Can anyone 
give me a little more insight into those exceptions?
And does anyone have a similar workload (frequent writes and deletions 
of small files)? What could cause the performance degradation (see the 
first post)? I think HDFS should be able to handle two million or more 
files/blocks...
Also, I observed that some of my datanodes do not "heartbeat" to the 
namenode for several seconds (up to 400 :-() from time to time. When I 
check those specific datanodes and run "top", I see a "du" command 
running that seems to have gotten stuck?!?
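If I'm not mistaken, the datanode periodically shells out to "du" to 
track how much space its block files take up, and with a few million tiny 
files that scan alone can take ages on these disks. Here is a quick, 
purely illustrative Java snippet I would use to time a du-style walk over 
a data directory (the path is just a placeholder - point it at your 
dfs.data.dir):

import java.io.File;

/** Times a recursive disk-usage walk, roughly what "du -s" does. */
public class DuTimer {

    // Recursively sum file sizes underneath dir.
    static long usage(File dir) {
        long total = 0;
        File[] entries = dir.listFiles();
        if (entries == null) {
            return 0;            // unreadable directory - skip it
        }
        for (File f : entries) {
            total += f.isDirectory() ? usage(f) : f.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Placeholder path - pass your real data directory as the first argument.
        File dataDir = new File(args.length > 0 ? args[0] : "/data/hadoop/dfs/data");
        long start = System.currentTimeMillis();
        long bytes = usage(dataDir);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(bytes + " bytes under " + dataDir + ", scanned in " + elapsed + " ms");
    }
}

If that walk already takes several minutes, the stuck "du" (and the missed 
heartbeats) would not be surprising.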
Thanks and Happy Easter :-)

Cu on the 'net,
                        Bye - bye,

                                   <<<<< André <<<< >>>> èrbnA >>>>>

dhruba Borthakur wrote:

> The namenode lazily instructs a Datanode to delete blocks. As a response to every heartbeat
> from a Datanode, the Namenode instructs it to delete a maximum of 100 blocks. Typically, the
> heartbeat periodicity is 3 seconds. The heartbeat thread in the Datanode deletes the block
> files synchronously before it can send the next heartbeat. That's the reason a small number
> (like 100) was chosen.
>
> If you have 8 datanodes, your system will probably delete about 800 blocks every 3 seconds.
>
> Thanks,
> dhruba
>
> -----Original Message-----
> From: André Martin [mailto:mail@andremartin.de] 
> Sent: Friday, March 21, 2008 3:06 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Performance / cluster scaling question
>
> After waiting a few hours (without any load), the block count 
> and "DFS Used" space seem to go down...
> My question is: is the hardware simply too weak/slow to send the block 
> deletion requests to the datanodes in a timely manner, or do those 
> "crappy" HDDs simply cause the delay? I noticed that it can take up to 
> 40 minutes to delete ~400,000 files at once manually using "rm -r"...
> Actually - my main concern is why the performance, i.e. the throughput, 
> goes down - any ideas?

