hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2341) Datanode active connections never returns to 0
Date Thu, 13 Dec 2007 05:45:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551222
] 

Raghu Angadi commented on HADOOP-2341:
--------------------------------------


Sure. Please keep the Jira open if you think there is a bug.

pread I mentioned is FSInputStrea.read(long, buf, offset, len).

> Is there an issue to move the datanode to use select that you know of?

What does that fix? What is the problem it is causing now? It surely won't fix 900 (unaccounted)
connections. IMHO reducing threads on Datanode just fixes just one of the symptoms and does
not address the real issue of using streaming api to do sparse random reads.
 



> Datanode active connections never returns to 0
> ----------------------------------------------
>
>                 Key: HADOOP-2341
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2341
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Paul Saab
>         Attachments: dfsclient.patch, hregionserver-stack.txt, stacks-XX.XX.XX.XXX.txt,
stacks-YY.YY.YY.YY.txt
>
>
> On trunk i continue to see the following in my data node logs:
> 2007-12-03 15:46:47,696 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections
is: 42
> 2007-12-03 15:46:48,135 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections
is: 41
> 2007-12-03 15:46:48,439 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections
is: 40
> 2007-12-03 15:46:48,479 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections
is: 39
> 2007-12-03 15:46:48,611 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections
is: 38
> 2007-12-03 15:46:48,898 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections
is: 37
> 2007-12-03 15:46:48,989 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections
is: 36
> 2007-12-03 15:46:51,010 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections
is: 35
> 2007-12-03 15:46:51,758 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections
is: 34
> 2007-12-03 15:46:52,148 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections
is: 33
> This number never returns to 0, even after many hours of no new data being manipulated
or added into the DFS.
> Looking at netstat -tn i see significant amount of data in the send-q that never goes
away:
> tcp        0  34240 ::ffff:XX.XX.XX.XXX:50010   ::ffff:YY.YY.YY.YY:55792   ESTABLISHED

> tcp        0  38968 ::ffff:XX.XX.XX.XXX:50010   ::ffff:YY.YY.YY.YY:38169   ESTABLISHED

> tcp        0  38456 ::ffff:XX.XX.XX.XXX:50010   ::ffff:YY.YY.YY.YY:35456   ESTABLISHED

> tcp        0  29640 ::ffff:XX.XX.XX.XXX:50010   ::ffff:YY.YY.YY.YY:59845   ESTABLISHED

> tcp        0  50168 ::ffff:XX.XX.XX.XXX:50010   ::ffff:YY.YY.YY.YY:44584   ESTABLISHED

> When sniffing the network I see that the remote side (YY.YY.YY.YY) is returning a window
size of 0
> 16:11:41.760474 IP XX.XX.XX.XXX.50010 > YY.YY.YY.YY.44584: . ack 3339984123 win 46
<nop,nop,timestamp 1786247180 885681789>
> 16:11:41.761597 IP YY.YY.YY.YY.44584 > XX.XX.XX.XXX.50010: . ack 1 win 0 <nop,nop,timestamp
885801786 1775711351>
> Then we look at the stack traces on each datanode, I will have tons of threads that *never*
go away in the following trace:
> {code}
> Thread 6516 (org.apache.hadoop.dfs.DataNode$DataXceiver@166068b6):
>   State: RUNNABLE
>   Blocked count: 0
>   Waited count: 0
>   Stack:
>     java.net.SocketOutputStream.socketWrite0(Native Method)
>     java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>     java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>     java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>     java.io.DataOutputStream.write(DataOutputStream.java:90)
>     org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1400)
>     org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1433)
>     org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:904)
>     org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
>     java.lang.Thread.run(Thread.java:619)
> {code}
> Unfortunately there's very little in the logs with exceptions that could point to this.
 I have some exceptions the following, but nothing that points to problems between XX and
YY:
> {code}
> 2007-12-02 11:19:47,889 WARN  dfs.DataNode - Unexpected error trying to delete block
blk_4515246476002110310. Block not found in blockMap. 
> 2007-12-02 11:19:47,922 WARN  dfs.DataNode - java.io.IOException: Error in deleting blocks.
>         at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:750)
>         at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:675)
>         at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:569)
>         at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1720)
>         at java.lang.Thread.run(Thread.java:619)
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message