hadoop-hdfs-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-720) NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923)
Date Wed, 21 Oct 2009 21:36:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768472#action_12768472 ]

stack commented on HDFS-720:
----------------------------

This might be a cleaner example.  It's the first exception in this DN's log after loading started.
  It's like we skip/lose a packet because of the NPE?

{code}
...
2009-10-21 18:23:11,323 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
1 for block blk_-7356834145770439479_1586 responded my status  for seqno 895
2009-10-21 18:23:11,323 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving one
packet for block blk_-7356834145770439479_1586 of length 65024 seqno 896 offsetInBlock 58025472
lastPacketInBlock false
2009-10-21 18:23:11,323 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_-7356834145770439479_1586 1 responded other status  for seqno 895
2009-10-21 18:23:11,323 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
1 adding seqno 896 to ack queue.
2009-10-21 18:23:11,324 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
1 got seqno = 896
2009-10-21 18:23:11,325 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving one
packet for block blk_-7356834145770439479_1586 of length 65024 seqno 897 offsetInBlock 58090496
lastPacketInBlock false
2009-10-21 18:23:11,325 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
1 adding seqno 897 to ack queue.
2009-10-21 18:23:11,325 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_-7356834145770439479_1586 1 Exception java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:923)
    at java.lang.Thread.run(Thread.java:619)

2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
1 got seqno = 4546
2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
1 seqno = 4546
2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
1 for block blk_-830254393316092139_1588 responded my status  for seqno 4546
2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_-830254393316092139_1588 1 responded other status  for seqno 4546
2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
1 got seqno = 4547
2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
1 seqno = 4547
2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving one
packet for block blk_-7356834145770439479_1586 of length 65024 seqno 898 offsetInBlock 58155520
lastPacketInBlock false
2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
1 for block blk_-830254393316092139_1588 responded my status  for seqno 4547
2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_-830254393316092139_1588 1 responded other status  for seqno 4547
2009-10-21 18:23:11,327 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
1 for block blk_-7356834145770439479_1586 responded my status  for seqno -2
2009-10-21 18:23:11,327 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_-7356834145770439479_1586 1 responded other status  for seqno -2
2009-10-21 18:23:11,327 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
1 for block blk_-7356834145770439479_1586 terminating
...
{code}

We keep going with the download... then this at the finish:

{code}
2009-10-21 18:23:11,338 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
1 for block blk_-830254393316092139_1588 responded my status  for seqno 4555
2009-10-21 18:23:11,338 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Number of active
connections is: 184
2009-10-21 18:23:11,338 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_-830254393316092139_1588 1 responded other status  for seqno 4555
2009-10-21 18:23:11,339 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
receive buf size 131071 tcp no delay true
2009-10-21 18:23:11,339 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-7356834145770439479_1586 src: /XX,XX,XX.142:49468 dest: /XX,XX,XX.139:51010
2009-10-21 18:23:11,339 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Recover the
RBW replica blk_-7356834145770439479_1586
2009-10-21 18:23:11,339 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Recovering replica
ReplicaBeingWritten, blk_-7356834145770439479_1586, RBW
  getNumBytes()     = 58415616
  getBytesOnDisk()  = 58415616
  getVisibleLength()= 58025472
  getVolume()       = /d3/stack/dfs.data.dir/current/finalized
  getBlockFile()    = /d3/stack/dfs.data.dir/current/rbw/blk_-7356834145770439479
  bytesAcked=58025472
  bytesOnDisk=58415616
{code}
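
A quick check of the numbers in the replica dump, as a sketch only. It assumes every packet after seqno 895 carried 65024 bytes, which holds for the three packets visible in the first log excerpt but is not confirmed for the rest. The class and method names are made up for illustration.

```java
public class AckGap {
    // Bytes that reached disk but were never acknowledged.
    static long unackedBytes(long bytesOnDisk, long bytesAcked) {
        return bytesOnDisk - bytesAcked;
    }

    public static void main(String[] args) {
        long bytesOnDisk = 58415616L;  // getBytesOnDisk() from the dump above
        long bytesAcked  = 58025472L;  // bytesAcked, also the offsetInBlock of seqno 896
        long packetLen   = 65024L;     // assumption: constant packet length

        long gap = unackedBytes(bytesOnDisk, bytesAcked);
        System.out.println("unacked bytes  = " + gap);
        System.out.println("whole packets  = " + gap / packetLen);
        System.out.println("leftover bytes = " + gap % packetLen);
    }
}
```

Under that assumption the gap is 390144 bytes, exactly six packets with nothing left over. That would correspond to seqnos 896 through 901 being on disk but unacked, which fits acks stopping at seqno 895 when the PacketResponder hit the NPE.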



> NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923)
> ----------------------------------------------------------------
>
>                 Key: HDFS-720
>                 URL: https://issues.apache.org/jira/browse/HDFS-720
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.21.0
>         Environment: Current branch-0.21 of hdfs, mapreduce, and common.  Here is svn
info:
> URL: https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0.21
> Repository Root: https://svn.apache.org/repos/asf
> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
> Revision: 827883
> Node Kind: directory
> Schedule: normal
> Last Changed Author: szetszwo
> Last Changed Rev: 826906
> Last Changed Date: 2009-10-20 00:16:25 +0000 (Tue, 20 Oct 2009)
>            Reporter: stack
>         Attachments: dn.log
>
>
> Running some loadings on hdfs I had one of these on the DN XX.XX.XX.139:51010:
> {code}
> 2009-10-21 04:57:02,755 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving
block blk_6345892463926159834_1029 src: /XX,XX,XX.140:37890 dest: /XX.XX.XX.139:51010
> 2009-10-21 04:57:02,829 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_6345892463926159834_1029 1 Exception java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:923)
>         at java.lang.Thread.run(Thread.java:619)
> {code}
> On XX,XX,XX.140 side, it looks like this:
> {code}
> 10-21 04:57:01,866 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_6345892463926159834_1029 src: /XX.XX.XX.140:37385 dest: /XX.XX.XX140:51010
> 2009-10-21 04:57:02,836 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
2 for block blk_6345892463926159834_1029 terminating
> 2009-10-21 04:57:02,885 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(XX.XX.XX.140:51010,
storageID=DS-1292310101-208.76.44.140-51010-1256100924816, infoPort=51075, ipcPort=51020):Exception
writing block blk_6345892463926159834_1029 to mirror XX.XX.XX.139:51010
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcher.write0(Native Method)
>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
>     at sun.nio.ch.IOUtil.write(IOUtil.java:75)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
>     at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>     at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>     at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>     at java.io.DataOutputStream.write(DataOutputStream.java:90)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:466)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:434)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:573)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:352)
>     at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:382)
>     at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:323)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111)
>     at java.lang.Thread.run(Thread.java:619)
> {code}
> Here is the bit of code inside the run method:
> {code}
>  922                   pkt = ackQueue.getFirst();
>  923                   expected = pkt.seqno;
> {code}
> So 'pkt' is null?  But the LinkedList API says getFirst() throws NoSuchElementException if the list is empty, so you'd think we wouldn't get an NPE here.  What am I missing?
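
For reference, a minimal standalone sketch of the two behaviors in question. `java.util.LinkedList.getFirst()` does throw `NoSuchElementException` on an empty list, but `LinkedList` also permits null elements, so a non-empty list whose head is null returns null and the dereference on the following line throws the NPE. `Packet` and `expectedSeqno` are hypothetical stand-ins, not the BlockReceiver code, and nothing here establishes that a null was actually enqueued. `LinkedList` is also documented as unsynchronized, so unguarded access from two threads is another possibility that this sketch does not cover.

```java
import java.util.LinkedList;
import java.util.NoSuchElementException;

public class GetFirstDemo {
    // Hypothetical stand-in for the packet type held in the ack queue.
    static class Packet {
        final long seqno;
        Packet(long seqno) { this.seqno = seqno; }
    }

    // Mirrors the two quoted lines and reports which outcome occurred.
    static String expectedSeqno(LinkedList<Packet> ackQueue) {
        try {
            Packet pkt = ackQueue.getFirst();  // throws if the list is empty
            long expected = pkt.seqno;         // throws NPE if the head element is null
            return "seqno " + expected;
        } catch (NoSuchElementException e) {
            return "NoSuchElementException";
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        LinkedList<Packet> empty = new LinkedList<Packet>();

        LinkedList<Packet> nullHead = new LinkedList<Packet>();
        nullHead.add(null);                    // LinkedList accepts null elements

        LinkedList<Packet> normal = new LinkedList<Packet>();
        normal.add(new Packet(896));

        System.out.println(expectedSeqno(empty));
        System.out.println(expectedSeqno(nullHead));
        System.out.println(expectedSeqno(normal));
    }
}
```

Run single-threaded, this prints `NoSuchElementException`, `NullPointerException`, and `seqno 896` in that order.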

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

