hadoop-common-user mailing list archives

From Jean-Adrien <a...@jeanjean.ch>
Subject xceiverCount limit reason
Date Thu, 08 Jan 2009 11:03:46 GMT

Hello all,

I'm running HBase on top of Hadoop and I'm having some difficulty tuning the
Hadoop configuration to work well with HBase.
My setup is 4 desktop-class machines with 1 GB of RAM each: 2 run a
datanode/region server, 1 runs only a region server, and 1 runs the
namenode/HBase master.

When I start HBase, about 300 regions must be loaded across the 3 region
servers, so a lot of concurrent accesses hit Hadoop. My first problem, using
the default configuration, was seeing too many of these:

DataXceiver: java.net.SocketTimeoutException: 480000 millis timeout while
waiting for channel to be ready for write.

I was wondering what the reason for such a timeout is. Where is the
bottleneck? At first I believed it was a network problem (I have 100 Mbit/s
interfaces), but after monitoring the network, the load seems low when it
happens.
Anyway, I found the parameter
dfs.datanode.socket.write.timeout and set it to 0 to disable the timeout.
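
For reference, the override I added (in hadoop-site.xml, where I keep my site
overrides) looks like this; as I understand it, a value of 0 disables the
write timeout entirely:

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value>
  </property>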

Then I saw this in the datanode logs:

xceiverCount 256 exceeds the limit of concurrent xcievers 255

What exactly is the role of the xceivers? Do they receive the replicated
blocks and/or the files from clients?
When are their threads created, and when do they end?

Anyway, I found the parameter
dfs.datanode.max.xcievers
I upped it to 511, then to 1023, and today to 2047; but my cluster is not
that big (300 HBase regions, 200 GB including a replication factor of 2), and
I'm not sure I can keep raising this limit much longer. Moreover, it
considerably increases the amount of virtual memory needed for the datanode
JVM (about 2 GB now, versus only 500 MB for the heap). That leads to
excessive swapping, and a new problem arises: leases expire, and my entire
cluster eventually fails.
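
For the record, the corresponding entry in my hadoop-site.xml is now:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2047</value>
  </property>

My rough guess about the memory cost: each xceiver seems to be a thread, and
if a thread reserves the JVM's default stack size (something like 1 MB on my
platform, I assume), then 2047 of them alone could account for about 2 GB of
address space, which would match the figures above. Maybe shrinking the stack
(e.g. passing -Xss256k to the datanode JVM via HADOOP_DATANODE_OPTS in
hadoop-env.sh) would relieve the pressure, but I haven't tried it.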

Can I tune other parameters to avoid creating so many concurrent xceivers?
Would upping dfs.replication.interval, for example, help?

Could the fact that I run the region server on the same machine as the
datanode increase the number of xceivers? In that case I'll try a different
layout, and use the network bottleneck to avoid stressing the datanodes.

Any clues about Hadoop's xceiver internals would be appreciated.
Thanks.

-- Jean-Adrien
-- 
View this message in context: http://www.nabble.com/xceiverCount-limit-reason-tp21349807p21349807.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

