hadoop-common-user mailing list archives

From Jean-Adrien <a...@jeanjean.ch>
Subject Re: xceiverCount limit reason
Date Thu, 08 Jan 2009 14:29:23 GMT

Some more information about the case.

I read HADOOP-3633 / 3859 / 3831 in Jira.
I run version 18.1 of Hadoop, so I have no fix for HADOOP-3831.
Nevertheless my problem seems different.
The threads are created as soon as the client (HBase) requests data. The data
arrives at HBase without problem, but the threads never end. Looking at the
graph of the number of threads:

http://www.nabble.com/file/p21352818/launch_tests.png 
(you might need to go to Nabble to see the image:
http://www.nabble.com/xceiverCount-limit-reason-tp21349807p21349807.html )

The graph shows three runs of Hadoop / HBase (A/B/C):
A:
I configure Hadoop with dfs.datanode.max.xcievers=2023 and
dfs.datanode.socket.write.timeout=0.
As soon as I start HBase, the regions load their data from DFS and the number
of threads climbs to about 1100 in 2-3 minutes; then it stays in that range.
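
For reference, this is roughly what those two entries look like in my
hadoop-site.xml (property names as spelled in 0.18; a value of 0 should
disable the write timeout entirely):

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2023</value>
  </property>
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value>  <!-- 0 = no write timeout -->
  </property>
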
All DataXceiver threads are in one of these two states:

"org.apache.hadoop.dfs.DataNode$DataXceiver@6a2f81" daemon prio=10
tid=0x08289c00 nid=0x6bb6 runnable [0x8f980000..0x8f981140]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x95838858> (a sun.nio.ch.Util$1)
        - locked <0x95838868> (a java.util.Collections$UnmodifiableSet)
        - locked <0x95838818> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:260)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        - locked <0x95838b90> (a java.io.BufferedInputStream)
        at java.io.DataInputStream.readShort(DataInputStream.java:295)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1115)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1037)
        at java.lang.Thread.run(Thread.java:619)

"org.apache.hadoop.dfs.DataNode$DataXceiver@1abf87e" daemon prio=10
tid=0x90bbd400 nid=0x61ae runnable [0x7b68a000..0x7b68afc0]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        - locked <0x9671a8e0> (a java.io.BufferedInputStream)
        at java.io.DataInputStream.readShort(DataInputStream.java:295)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1115)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1037)
        at java.lang.Thread.run(Thread.java:619)


B:
I changed the Hadoop configuration, going back to the default 8-minute write
timeout (see the note after the stack trace below).
Once again, as soon as HBase reads data from DFS, the number of threads grows
to 1100. After 8 minutes the timeout fires, and the threads fail one after
another with this exception:

2009-01-08 14:21:09,305 WARN org.apache.hadoop.dfs.DataNode: DatanodeRegistration(192.168.1.13:50010, storageID=DS-1681396969-127.0.1.1-50010-1227536709605, infoPort=50075, ipcPort=50020):Got exception while serving blk_-1718199459793984230_722338 to /192.168.1.13:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.13:50010 remote=/192.168.1.13:37462]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1873)
        at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1967)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1109)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1037)
        at java.lang.Thread.run(Thread.java:619)
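
For what it is worth, the 480000 millis in the exception is exactly the default
write timeout: 8 min x 60 s x 1000 ms = 480000 ms. Going back to the default
simply means removing my override, which should be equivalent to setting:

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>480000</value>  <!-- default: 8 minutes -->
  </property>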

C:
During this third session I made the same run, but I stopped HBase before the
timeout fired. In that case, the threads end correctly.

Is it the responsibility of the Hadoop client to manage its connection pool
with the server? In that case the problem would be an HBase problem.
Anyway, I have found my problem; it is not a matter of performance.

Thanks for your answers.
Have a nice day.

-- Jean-Adrien 

