hbase-issues mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3234) hdfs-724 "breaks" TestHBaseTestingUtility multiClusters
Date Sun, 14 Nov 2010 23:58:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931926#action_12931926 ]

Jonathan Gray commented on HBASE-3234:
--------------------------------------

From a new run, here is all of the activity for the block that causes the abort...

{noformat}

2010-11-14 15:44:00,042 DEBUG [IPC Server handler 4 on 40196] namenode.FSDirectory(273): DIR*
FSDirectory.addFile: /user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005
with blk_4643386764144918409_1009 block is added to the in-memory file system
2010-11-14 15:44:00,042 INFO  [IPC Server handler 4 on 40196] namenode.FSNamesystem(1482):
BLOCK* NameSystem.allocateBlock: /user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005.
blk_4643386764144918409_1009
2010-11-14 15:44:00,044 INFO  [org.apache.hadoop.hdfs.server.datanode.DataXceiver@6a25b72a]
datanode.DataXceiver(228): Receiving block blk_4643386764144918409_1009 src: /127.0.0.1:48990
dest: /127.0.0.1:48738
2010-11-14 15:44:00,044 DEBUG [org.apache.hadoop.hdfs.server.datanode.DataXceiver@6a25b72a]
datanode.FSDataset(1517): b=blk_4643386764144918409_1009, f=null
2010-11-14 15:44:00,044 DEBUG [org.apache.hadoop.hdfs.server.datanode.DataXceiver@6a25b72a]
datanode.FSDataset(1313): writeTo metafile is /var/users/jgray/hbase/trunk/target/test-data/5173adaa-a370-457d-8293-8f059f70bd31/dfs/data/data1/blocksBeingWritten/blk_4643386764144918409_1009.meta
of size 0
2010-11-14 15:44:00,045 DEBUG [PacketResponder 0 for Block blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(789):
PacketResponder 0 seqno = -2 for block blk_4643386764144918409_1009 waiting for local datanode
to finish write.
2010-11-14 15:44:00,045 DEBUG [DataStreamer for file /user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005
block blk_4643386764144918409_1009] hdfs.DFSClient$DFSOutputStream$DataStreamer(2429): DataStreamer
block blk_4643386764144918409_1009 wrote packet seqno:0 size:339 offsetInBlock:0 lastPacketInBlock:false
2010-11-14 15:44:00,045 DEBUG [org.apache.hadoop.hdfs.server.datanode.DataXceiver@6a25b72a]
datanode.BlockReceiver(393): Receiving one packet for block blk_4643386764144918409_1009 of
length 318 seqno 0 offsetInBlock 0 lastPacketInBlock false
2010-11-14 15:44:00,046 DEBUG [PacketResponder 0 for Block blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(891):
PacketResponder 0 for block blk_4643386764144918409_1009 responded an ack: Replies for seqno
0 are SUCCESS
2010-11-14 15:44:00,046 DEBUG [ResponseProcessor for block blk_4643386764144918409_1009] hdfs.DFSClient$DFSOutputStream$ResponseProcessor(2534):
DFSClient Replies for seqno 0 are SUCCESS
2010-11-14 15:44:00,046 DEBUG [PacketResponder 0 for Block blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(789):
PacketResponder 0 seqno = -2 for block blk_4643386764144918409_1009 waiting for local datanode
to finish write.
2010-11-14 15:44:31,551 DEBUG [DataStreamer for file /user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005
block blk_4643386764144918409_1009] hdfs.DFSClient$DFSOutputStream$DataStreamer(2429): DataStreamer
block blk_4643386764144918409_1009 wrote packet seqno:-1 size:25 offsetInBlock:0 lastPacketInBlock:false
2010-11-14 15:44:31,551 DEBUG [org.apache.hadoop.hdfs.server.datanode.DataXceiver@6a25b72a]
datanode.BlockReceiver(393): Receiving one packet for block blk_4643386764144918409_1009 of
length 4 seqno -1 offsetInBlock 0 lastPacketInBlock false
2010-11-14 15:44:31,551 DEBUG [org.apache.hadoop.hdfs.server.datanode.DataXceiver@6a25b72a]
datanode.BlockReceiver(636): Changing block file offset of block blk_4643386764144918409_1009
from 310 to 0 meta file offset to 7
2010-11-14 15:44:31,552 DEBUG [org.apache.hadoop.hdfs.server.datanode.DataXceiver@6a25b72a]
datanode.BlockReceiver(422): Receiving empty packet for block blk_4643386764144918409_1009
2010-11-14 15:44:31,552 DEBUG [PacketResponder 0 for Block blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(891):
PacketResponder 0 for block blk_4643386764144918409_1009 responded an ack: Replies for seqno
-1 are SUCCESS
2010-11-14 15:44:31,552 DEBUG [ResponseProcessor for block blk_4643386764144918409_1009] hdfs.DFSClient$DFSOutputStream$ResponseProcessor(2534):
DFSClient Replies for seqno -1 are
2010-11-14 15:44:31,552 DEBUG [PacketResponder 0 for Block blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(789):
PacketResponder 0 seqno = -2 for block blk_4643386764144918409_1009 waiting for local datanode
to finish write.
2010-11-14 15:44:31,867 DEBUG [DataStreamer for file /user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005
block blk_4643386764144918409_1009] hdfs.DFSClient$DFSOutputStream$DataStreamer(2429): DataStreamer
block blk_4643386764144918409_1009 wrote packet seqno:1 size:804 offsetInBlock:0 lastPacketInBlock:false
2010-11-14 15:44:31,868 DEBUG [org.apache.hadoop.hdfs.server.datanode.DataXceiver@6a25b72a]
datanode.BlockReceiver(393): Receiving one packet for block blk_4643386764144918409_1009 of
length 783 seqno 1 offsetInBlock 0 lastPacketInBlock false
2010-11-14 15:44:31,868 DEBUG [PacketResponder 0 for Block blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(891):
PacketResponder 0 for block blk_4643386764144918409_1009 responded an ack: Replies for seqno
1 are SUCCESS
2010-11-14 15:44:31,868 DEBUG [ResponseProcessor for block blk_4643386764144918409_1009] hdfs.DFSClient$DFSOutputStream$ResponseProcessor(2534):
DFSClient Replies for seqno 0 are FAILED
2010-11-14 15:44:31,868 DEBUG [PacketResponder 0 for Block blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(789):
PacketResponder 0 seqno = -2 for block blk_4643386764144918409_1009 waiting for local datanode
to finish write.
2010-11-14 15:44:31,868 WARN  [ResponseProcessor for block blk_4643386764144918409_1009] hdfs.DFSClient$DFSOutputStream$ResponseProcessor(2580):
DFSOutputStream ResponseProcessor exception  for block blk_4643386764144918409_1009java.io.IOException:
Bad response 1 for block blk_4643386764144918409_1009 from datanode 127.0.0.1:48738
2010-11-14 15:44:31,868 WARN  [DataStreamer for file /user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005
block blk_4643386764144918409_1009] hdfs.DFSClient$DFSOutputStream(2616): Error Recovery for
block blk_4643386764144918409_1009 bad datanode[0] 127.0.0.1:48738
2010-11-14 15:44:31,869 INFO  [org.apache.hadoop.hdfs.server.datanode.DataXceiver@6a25b72a]
datanode.BlockReceiver(565): Exception in receiveBlock for block blk_4643386764144918409_1009
java.io.IOException: Connection reset by peer
2010-11-14 15:44:31,869 INFO  [PacketResponder 0 for Block blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(844):
PacketResponder blk_4643386764144918409_1009 0 : Thread is interrupted.
2010-11-14 15:44:31,869 INFO  [PacketResponder 0 for Block blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(907):
PacketResponder 0 for block blk_4643386764144918409_1009 terminating
2010-11-14 15:44:31,869 INFO  [org.apache.hadoop.hdfs.server.datanode.DataXceiver@6a25b72a]
datanode.DataXceiver(377): writeBlock blk_4643386764144918409_1009 received exception java.io.IOException:
Connection reset by peer
2010-11-14 15:45:32,574 INFO  [IPC Server handler 8 on 40196] namenode.INodeFileUnderConstruction(212):
BLOCK* blk_4643386764144918409_1009 recovery started, primary=127.0.0.1:48738
2010-11-14 15:45:34,183 INFO  [org.apache.hadoop.hdfs.server.datanode.DataNode$1@26c94114]
datanode.DataNode(1632): NameNode calls recoverBlock(block=blk_4643386764144918409_1009, targets=[127.0.0.1:48738])
2010-11-14 15:45:34,183 DEBUG [org.apache.hadoop.hdfs.server.datanode.DataNode$1@26c94114]
datanode.DataNode(1409): block=blk_4643386764144918409_1009
2010-11-14 15:45:34,184 DEBUG [org.apache.hadoop.hdfs.server.datanode.DataNode$1@26c94114]
datanode.DataNode(1419): getBlockMetaDataInfo successful block=blk_4643386764144918409_1009
length 771 genstamp 1009
2010-11-14 15:45:34,185 DEBUG [org.apache.hadoop.hdfs.server.datanode.DataNode$1@26c94114]
datanode.DataNode(1563): block=blk_4643386764144918409_1009, (length=771), syncList=[block:blk_4643386764144918409_1009
node:127.0.0.1:48738], closeFile=true
2010-11-14 15:45:34,186 INFO  [org.apache.hadoop.hdfs.server.datanode.DataNode$1@26c94114]
datanode.DataNode(1450): oldblock=blk_4643386764144918409_1009(length=771), newblock=blk_4643386764144918409_1014(length=771),
datanode=127.0.0.1:48738
2010-11-14 15:45:34,188 INFO  [IPC Server handler 1 on 40196] namenode.FSNamesystem(3149):
BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:48738 is added to blk_4643386764144918409_1009
size 771
2010-11-14 15:45:34,189 INFO  [IPC Server handler 2 on 40196] namenode.FSNamesystem(1970):
commitBlockSynchronization(lastblock=blk_4643386764144918409_1009, newgenerationstamp=1014,
newlength=771, newtargets=[127.0.0.1:48738], closeFile=true, deleteBlock=false)
{noformat}

All of the recovery activity happens after the abort.  Here is the snippet from this log around
the exception/abort:

{noformat}
2010-11-14 15:44:31,867 DEBUG [PRI IPC Server handler 3 on 58498] hdfs.DFSClient$DFSOutputStream(3154):
DFSClient writeChunk allocating new packet seqno=1, src=/user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005,
packetSize=65557, chunksPerPacket=127, bytesCurBlock=0
2010-11-14 15:44:31,867 DEBUG [RegionServer:0;dev692.sf2p.facebook.com,58498,1289778232413.logSyncer]
hdfs.DFSClient$DFSOutputStream(3238): DFSClient flush() : saveOffset 512 bytesCurBlock 771 lastFlushOffset 310
2010-11-14 15:44:31,867 DEBUG [DataStreamer for file /user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005
block blk_4643386764144918409_1009] hdfs.DFSClient$DFSOutputStream$DataStreamer(2429): DataStreamer
block blk_4643386764144918409_1009 wrote packet seqno:1 size:804 offsetInBlock:0 lastPacketInBlock:false
2010-11-14 15:44:31,868 DEBUG [org.apache.hadoop.hdfs.server.datanode.DataXceiver@6a25b72a]
datanode.BlockReceiver(393): Receiving one packet for block blk_4643386764144918409_1009 of
length 783 seqno 1 offsetInBlock 0 lastPacketInBlock false
2010-11-14 15:44:31,868 DEBUG [org.apache.hadoop.hdfs.server.datanode.DataXceiver@6a25b72a]
datanode.BlockReceiver$PacketResponder(737): PacketResponder 0 adding seqno 1 to ack queue.
2010-11-14 15:44:31,868 DEBUG [PacketResponder 0 for Block blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(891):
PacketResponder 0 for block blk_4643386764144918409_1009 responded an ack: Replies for seqno
1 are SUCCESS
2010-11-14 15:44:31,868 DEBUG [ResponseProcessor for block blk_4643386764144918409_1009] hdfs.DFSClient$DFSOutputStream$ResponseProcessor(2534):
DFSClient Replies for seqno 0 are FAILED
2010-11-14 15:44:31,868 DEBUG [PacketResponder 0 for Block blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(789):
PacketResponder 0 seqno = -2 for block blk_4643386764144918409_1009 waiting for local datanode
to finish write.
2010-11-14 15:44:31,868 WARN  [ResponseProcessor for block blk_4643386764144918409_1009] hdfs.DFSClient$DFSOutputStream$ResponseProcessor(2580):
DFSOutputStream ResponseProcessor exception  for block blk_4643386764144918409_1009java.io.IOException:
Bad response 1 for block blk_4643386764144918409_1009 from datanode 127.0.0.1:48738
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2542)

2010-11-14 15:44:31,868 WARN  [DataStreamer for file /user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005
block blk_4643386764144918409_1009] hdfs.DFSClient$DFSOutputStream(2616): Error Recovery for
block blk_4643386764144918409_1009 bad datanode[0] 127.0.0.1:48738
2010-11-14 15:44:31,869 INFO  [org.apache.hadoop.hdfs.server.datanode.DataXceiver@6a25b72a]
datanode.BlockReceiver(565): Exception in receiveBlock for block blk_4643386764144918409_1009
java.io.IOException: Connection reset by peer
2010-11-14 15:44:31,869 INFO  [PacketResponder 0 for Block blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(844):
PacketResponder blk_4643386764144918409_1009 0 : Thread is interrupted.
2010-11-14 15:44:31,869 INFO  [PacketResponder 0 for Block blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(907):
PacketResponder 0 for block blk_4643386764144918409_1009 terminating
2010-11-14 15:44:31,869 INFO  [org.apache.hadoop.hdfs.server.datanode.DataXceiver@6a25b72a]
datanode.DataXceiver(377): writeBlock blk_4643386764144918409_1009 received exception java.io.IOException:
Connection reset by peer
2010-11-14 15:44:31,869 WARN  [RegionServer:0;dev692.sf2p.facebook.com,58498,1289778232413.logSyncer]
hdfs.DFSClient$DFSOutputStream(3293): Error while syncing
java.io.IOException: All datanodes 127.0.0.1:48738 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2666)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:2157)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2356)
{noformat}
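
To make the seqno bookkeeping above easier to follow, here is a minimal, hypothetical Java sketch (the class and method names are mine, not the real DFSClient/BlockReceiver code, and it is not a diagnosis of hdfs-724): the client keeps a queue of outstanding packet seqnos, and each ack coming back from the pipeline is expected to match the seqno at the head of that queue.  An ack for a different seqno, or a failed ack, marks the replying datanode bad; with a single-datanode pipeline like the one in this test, that immediately becomes "All datanodes ... are bad. Aborting...".

{noformat}
// Toy model only -- not the actual HDFS client code.
import java.util.ArrayDeque;
import java.util.Deque;

public class AckQueueSketch {

  // Assumption for this sketch: heartbeat packets carry seqno -1, as seen in the log above.
  static final long HEARTBEAT_SEQNO = -1L;

  // Seqnos of data packets sent but not yet acknowledged, oldest first.
  private final Deque<Long> outstanding = new ArrayDeque<>();

  /** Client side: record a data packet that still needs an ack. */
  void sendDataPacket(long seqno) {
    outstanding.addLast(seqno);
  }

  /**
   * Client side: process one ack from the (single) datanode.  Returns true if the
   * pipeline is still healthy, false if the datanode must be treated as bad --
   * which, with only one datanode, means the write aborts.
   */
  boolean processAck(long ackedSeqno, boolean success) {
    if (ackedSeqno == HEARTBEAT_SEQNO) {
      // Heartbeats are never queued, so there is nothing to pop.
      return success;
    }
    Long expected = outstanding.peekFirst();
    if (expected == null || expected != ackedSeqno || !success) {
      // Seqno mismatch or failed ack: the "Bad response" path.
      return false;
    }
    outstanding.removeFirst();
    return true;
  }

  public static void main(String[] args) {
    AckQueueSketch client = new AckQueueSketch();
    client.sendDataPacket(0);  // seqno 0 still unacknowledged on the client side
    client.sendDataPacket(1);  // seqno 1 in flight as well
    // An ack arrives for seqno 1 while seqno 0 is still at the head of the queue,
    // which is roughly the shape of the mismatch in the log above.
    System.out.println("pipeline healthy? " + client.processAck(1, true));  // prints false
  }
}
{noformat}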

> hdfs-724 "breaks" TestHBaseTestingUtility multiClusters
> -------------------------------------------------------
>
>                 Key: HBASE-3234
>                 URL: https://issues.apache.org/jira/browse/HBASE-3234
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Critical
>             Fix For: 0.90.0
>
>         Attachments: org.apache.hadoop.hbase.TestHBaseTestingUtility-output.txt, org.apache.hadoop.hbase.TestHBaseTestingUtility.txt
>
>
> We upgraded our hadoop jar in TRUNK to the latest on the 0.20-append branch.  TestHBaseTestingUtility
> started failing reliably.  If I back out hdfs-724, the test passes again.  This issue is about
> figuring out what's up here.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

