hadoop-common-user mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: DFS get blocked when writing a file.
Date Fri, 28 Mar 2008 14:42:01 GMT

 > "Exception in receiveBlock for block  java.io.IOException: Trying to
 > change block file offset of block blk_7857709233639057851 to 33357824
 > but actual size of file is 33353728"

This was fixed in HADOOP-3033. You can try running the latest 0.16 branch
(svn...hadoop/core/branches/branch-016). The 0.16.2 release is scheduled for
early next week.

This exception does not fully explain the blocked client. If the client
blocks again with the latest 0.16 branch, could you also include stack
traces from the datanodes? You could file a JIRA so that it is convenient
to attach logs and stack traces.
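
For collecting such stack traces: on a HotSpot JVM, `kill -QUIT <pid>` or `jstack <pid>` prints a full thread dump. The same information can also be gathered from inside a process through the standard `java.lang.management` API. The following is a minimal sketch, not part of Hadoop; the class name `BlockedThreadReport` is made up for illustration. It lists every thread that is BLOCKED on a monitor, together with the lock and the thread that owns it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not a Hadoop class.
public class BlockedThreadReport {

    /** Returns one line per thread that is BLOCKED on a monitor. */
    public static String report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only request lock details that this JVM can actually provide.
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                out.append('"').append(info.getThreadName())
                   .append("\" is waiting to lock ").append(info.getLockName())
                   .append(", held by \"").append(info.getLockOwnerName())
                   .append("\"\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

The report is empty when no thread is blocked at the moment of the call, so it is most useful when taken while the client or datanode appears stuck.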

Raghu.

Iván de Prado wrote:
> Hello, 
> 
> I'm working with Hadoop 0.16.1 and have an issue with the DFS: sometimes
> a write to HDFS blocks. It does not happen every time, so it is not
> easily reproducible.
> 
> My cluster has 4 nodes and one master running the NameNode and JobTracker.
> These are the logs that appear when everything blocks. Look at block
> blk_7857709233639057851, which seems to be the problematic one. It raises
> the exception:
> 
> "Exception in receiveBlock for block  java.io.IOException: Trying to
> change block file offset of block blk_7857709233639057851 to 33357824
> but actual size of file is 33353728"
> 
> A longer excerpt of the logs and part of the client thread dump:
> 
> hn3: 2008-03-28 07:34:44,499 INFO org.apache.hadoop.dfs.DataNode:
> Receiving block blk_7857709233639057851 src: /172.16.3.2:46092
> dest: /172.16.3.2:50010
> hn3: 2008-03-28 07:34:44,501 INFO org.apache.hadoop.dfs.DataNode:
> Datanode 2 got response for connect ack  from downstream datanode with
> firstbadlink as 
> hn3: 2008-03-28 07:34:44,501 INFO org.apache.hadoop.dfs.DataNode:
> Datanode 2 forwarding connect ack to upstream firstbadlink is 
> hn2: 2008-03-28 07:34:44,496 INFO org.apache.hadoop.dfs.DataNode:
> Received block blk_8152094109584962620 of size 67108864 from /172.16.3.2
> hn2: 2008-03-28 07:34:44,496 INFO org.apache.hadoop.dfs.DataNode:
> PacketResponder 2 for block blk_8152094109584962620 terminating
> hn2: 2008-03-28 07:34:44,500 INFO org.apache.hadoop.dfs.DataNode:
> Receiving block blk_7857709233639057851 src: /172.16.3.5:35904
> dest: /172.16.3.5:50010
> hn2: 2008-03-28 07:34:44,502 INFO org.apache.hadoop.dfs.DataNode:
> Datanode 1 got response for connect ack  from downstream datanode with
> firstbadlink as 
> hn2: 2008-03-28 07:34:44,502 INFO org.apache.hadoop.dfs.DataNode:
> Datanode 1 forwarding connect ack to upstream firstbadlink is 
> hn1: 2008-03-28 07:34:44,495 INFO org.apache.hadoop.dfs.DataNode:
> Received block blk_8152094109584962620 of size 67108864 from /172.16.3.4
> hn1: 2008-03-28 07:34:44,495 INFO org.apache.hadoop.dfs.DataNode:
> PacketResponder 1 for block blk_8152094109584962620 terminating
> hn4: 2008-03-28 07:34:44,501 INFO org.apache.hadoop.dfs.DataNode:
> Receiving block blk_7857709233639057851 src: /172.16.3.4:36887
> dest: /172.16.3.4:50010
> hn4: 2008-03-28 07:34:44,501 INFO org.apache.hadoop.dfs.DataNode:
> Datanode 0 forwarding connect ack to upstream firstbadlink is 
> hn4: 2008-03-28 07:34:44,615 INFO org.apache.hadoop.dfs.DataNode:
> Changing block file offset of block blk_7857709233639057851 from 4325376
> to 4325376 meta file offset to 33799
> hn3: 2008-03-28 07:34:45,304 INFO org.apache.hadoop.dfs.DataNode:
> Changing block file offset of block blk_7857709233639057851 from
> 33353728 to 33357824 meta file offset to 260615
> hn3: 2008-03-28 07:34:45,305 INFO org.apache.hadoop.dfs.DataNode:
> Exception in receiveBlock for block  java.io.IOException: Trying to
> change block file offset of block blk_7857709233639057851 to 33357824
> but actual size of file is 33353728
> hn1: 2008-03-28 07:35:31,835 INFO org.apache.hadoop.dfs.DataNode:
> BlockReport of 564 blocks got processed in 128 msecs
> 
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (10.0-b19 mixed
> mode):
> 
> "ResponseProcessor for block blk_7857709233639057851" prio=10
> tid=0x000000005c557800 nid=0x23ad waiting for monitor entry
> [0x0000000040e15000..0x0000000040e15a10]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream
> $ResponseProcessor.run(DFSClient.java:1771)
>         - waiting to lock <0x00002aaab43ad910> (a java.util.LinkedList)
> 
> "DataStreamer for file /user/properazzi/test/output/index/_0.cfs block
> blk_7857709233639057851" prio=10 tid=0x000000005c59f000 nid=0x2392
> runnable [0x0000000041219000..0x0000000041219d10]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>         at
> java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at
> java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>         - locked <0x00002aaade9b8120> (a java.io.BufferedOutputStream)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         - locked <0x00002aaade9b8148> (a java.io.DataOutputStream)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream
> $DataStreamer.run(DFSClient.java:1623)
>         - locked <0x00002aaab43ad910> (a java.util.LinkedList)
> 
> "org.apache.hadoop.dfs.DFSClient$LeaseChecker@144aa0ce" daemon prio=10
> tid=0x000000005c7f1000 nid=0x2254 waiting on condition
> [0x0000000041118000..0x0000000041118a90]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.dfs.DFSClient
> $LeaseChecker.run(DFSClient.java:597)
>         at java.lang.Thread.run(Thread.java:619)
> 
> "org.apache.hadoop.dfs.DFSClient$LeaseChecker@2d58f9d3" daemon prio=10
> tid=0x000000005c4fec00 nid=0x224f waiting on condition
> [0x0000000040f16000..0x0000000040f16c90]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.dfs.DFSClient
> $LeaseChecker.run(DFSClient.java:597)
>         at java.lang.Thread.run(Thread.java:619)
> 
> "org.apache.hadoop.io.ObjectWritable Connection Culler" daemon prio=10
> tid=0x000000005c7c5c00 nid=0x224d waiting on condition
> [0x0000000040d14000..0x0000000040d14b90]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.ipc.Client
> $ConnectionCuller.run(Client.java:423)
> 
> 
> "main" prio=10 tid=0x000000005c417000 nid=0x223b waiting for monitor
> entry [0x0000000040207000..0x0000000040209ed0]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.dfs.DFSClient
> $DFSOutputStream.writeChunk(DFSClient.java:2117)
>         - waiting to lock <0x00002aaab43ad910> (a java.util.LinkedList)
>         at
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:141)
>         at
> org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:100)
>         at
> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>         - locked <0x00002aaab43addd8> (a org.apache.hadoop.dfs.DFSClient
> $DFSOutputStream)
>         at org.apache.hadoop.fs.FSDataOutputStream
> $PositionCache.write(FSDataOutputStream.java:41)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         - locked <0x00002aaab43aef18> (a
> org.apache.hadoop.fs.FSDataOutputStream)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:83)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:157)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:151)
>         at
> org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1028)
>         at
> org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1016)
>         at
> org.apache.hadoop.fs.FileSystem.moveFromLocalFile(FileSystem.java:1006)
>         at
> org.apache.hadoop.fs.FileSystem.completeLocalOutput(FileSystem.java:1077)
> 	...
> 
> Any help with this? Please ask if you need more information.
> 
> Thanks, and congratulations on your revolutionary project.
> 
> Iván de Prado Alonso
> http://ivandeprado.blogspot.com/
> 
> 
> 
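
In the quoted thread dump, the DataStreamer thread holds the monitor on the `java.util.LinkedList` at `<0x00002aaab43ad910>` while it is inside `socketWrite0`, and both `main` and the ResponseProcessor thread are waiting to lock that same object. The sketch below reproduces only that general pattern: one thread keeps a queue's monitor during a stalled operation, so a second thread that needs the monitor goes BLOCKED. It is not the DFSClient code; the class name `MonitorStallDemo`, the thread names, and the latch standing in for the socket write are all assumptions made for illustration.

```java
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo, not Hadoop code.
public class MonitorStallDemo {

    /** Returns the producer's thread state while the streamer stalls inside the monitor. */
    public static Thread.State producerStateDuringStall() {
        final LinkedList<byte[]> queue = new LinkedList<>();
        final CountDownLatch holding = new CountDownLatch(1);
        final CountDownLatch stallOver = new CountDownLatch(1);

        Thread streamer = new Thread(() -> {
            synchronized (queue) {
                holding.countDown();
                try {
                    stallOver.await(); // stands in for a write that does not return
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "streamer");

        Thread producer = new Thread(() -> {
            synchronized (queue) {     // cannot enter until the streamer releases the monitor
                queue.add(new byte[0]);
            }
        }, "producer");

        try {
            streamer.start();
            holding.await();           // the streamer now owns the monitor
            producer.start();
            // Give the producer up to five seconds to reach the contended monitor.
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (producer.getState() != Thread.State.BLOCKED
                    && System.nanoTime() < deadline) {
                Thread.yield();
            }
            Thread.State observed = producer.getState();
            stallOver.countDown();     // end the simulated stall
            streamer.join();
            producer.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("producer state during stall: " + producerStateDuringStall());
    }
}
```

In this sketch the producer stays blocked for exactly as long as the streamer's operation is stalled, which is why the datanode side of the connection is worth inspecting when the client's write never completes.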

