hadoop-hdfs-dev mailing list archives

From Vinayakumar B <vinayakuma...@huawei.com>
Subject [Important] Checksum Error while appending to the file.
Date Wed, 29 Feb 2012 10:29:12 GMT
Hi All,

In one of our Hadoop clusters we hit corruption of a block checksum (meta) file, which caused
an append to the file to fail.

If any of you have faced this problem before, please share your experience.

We are using Hadoop 0.20.1 with the append feature.

1. Created the file, wrote 305 bytes, and closed the stream.

2. Called append on the same file, wrote 307 bytes, and closed the stream.

3. Repeated step 2 with different sizes (311, 313, 307, 305, 313, 311, 307, 311, 313,
307, 307, 307, 305, 307, 305, 290, 288, 305, 307, 307, 307, 290 bytes).

4. Repeated step 2 once more with 294 bytes. This time the pipeline was {xxx.xxx.xxx.106:50010, xxx.xxx.xxx.xxx:10010}.
The file length became 7629 bytes, and the stream was closed.

Here the checksum is verified by the last DataNode in the pipeline for every packet received.
If verification fails, an exception is thrown.

Since there is no exception in any of the DataNode logs, checksum verification must have succeeded,
and the meta file size should be 67 bytes.

The meta file should contain 15 checksums (4 bytes each, 60 bytes in total) and 7 header bytes.
7629/512 gives 14 checksums for full chunks plus 1 checksum for the partial chunk.
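
The arithmetic above can be checked with a small standalone sketch. This is not Hadoop code: the class and method names are made up for illustration, and the constants (512 data bytes per checksum, 4-byte checksums, 7-byte header) are simply the values assumed in this mail.

```java
// Illustrative only: hypothetical names, constants taken from this mail.
class ExpectedMetaLength {
    static final int BYTES_PER_CHUNK = 512; // data bytes covered by one checksum
    static final int CHECKSUM_SIZE = 4;     // bytes per checksum
    static final int HEADER_SIZE = 7;       // meta file header bytes

    // Write sizes from steps 1 to 4, in order.
    static final int[] WRITES = {
        305, 307,
        311, 313, 307, 305, 313, 311, 307, 311, 313,
        307, 307, 307, 305, 307, 305, 290, 288, 305, 307, 307, 307, 290,
        294
    };

    // Total block length after all writes.
    static long totalLength(int[] writes) {
        long total = 0;
        for (int w : writes) {
            total += w;
        }
        return total;
    }

    // One checksum per full chunk, plus one for a trailing partial chunk.
    static long checksumCount(long dataLength) {
        return (dataLength + BYTES_PER_CHUNK - 1) / BYTES_PER_CHUNK;
    }

    // Header plus one fixed-size checksum per chunk.
    static long metaLength(long dataLength) {
        return HEADER_SIZE + checksumCount(dataLength) * CHECKSUM_SIZE;
    }

    public static void main(String[] args) {
        long len = totalLength(WRITES);
        System.out.println("data length    = " + len);
        System.out.println("checksum count = " + checksumCount(len));
        System.out.println("meta length    = " + metaLength(len));
    }
}
```

Under these assumptions the write sizes add up to 7629 bytes, which needs 15 checksums and a 67-byte meta file.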

5. Append was called again on the same file. This append failed because block recovery failed
on the DataNodes with the exception below.

java.io.IOException: Block blk_1329468764084_188363 is of size 7629 but has 17 checksums and
each checksum size is 4 bytes.
 at org.apache.hadoop.hdfs.server.datanode.FSDataset.validateBlockMetadata(FSDataset.java:1922)
 at org.apache.hadoop.hdfs.server.datanode.FSDataset.startBlockRecovery(FSDataset.java:2142)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.startBlockRecovery(DataNode.java:2078)
 at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:547)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1139)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1135)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1133)

Here the meta file length is 17*4+7 = 75 bytes, but it should be 67 bytes according to the calculation in step 4.
The data block size is the same in step 4 and step 5, but the meta file size is not.
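
To make the mismatch concrete, here is a rough sketch of the kind of consistency check the exception message implies. It is not the actual FSDataset.validateBlockMetadata code. The class, the method names, and the constants (512-byte chunks, 4-byte checksums, 7-byte header) are assumptions taken from the numbers in this mail.

```java
// Illustrative only: hypothetical names, not Hadoop's implementation.
class MetaMismatch {
    static final int BYTES_PER_CHUNK = 512; // data bytes covered by one checksum
    static final int CHECKSUM_SIZE = 4;     // bytes per checksum
    static final int HEADER_SIZE = 7;       // meta file header bytes

    // Number of checksums actually stored in a meta file of this length.
    static long checksumsInMeta(long metaLength) {
        return (metaLength - HEADER_SIZE) / CHECKSUM_SIZE;
    }

    // Number of checksums a data block of this length needs.
    static long checksumsForData(long dataLength) {
        return (dataLength + BYTES_PER_CHUNK - 1) / BYTES_PER_CHUNK;
    }

    // Smallest data length that needs exactly n checksums (n >= 1).
    static long minDataLength(long n) {
        return (n - 1) * BYTES_PER_CHUNK + 1;
    }

    // Largest data length that needs exactly n checksums (n >= 1).
    static long maxDataLength(long n) {
        return n * BYTES_PER_CHUNK;
    }

    public static void main(String[] args) {
        long dataLength = 7629; // block file size reported in the exception
        long metaLength = 75;   // 17 checksums * 4 bytes + 7 header bytes

        long inMeta = checksumsInMeta(metaLength);
        long needed = checksumsForData(dataLength);

        System.out.println("checksums in meta file  = " + inMeta);
        System.out.println("checksums needed        = " + needed);
        System.out.println("extra checksums         = " + (inMeta - needed));
        System.out.println("data length implied by meta file: "
                + minDataLength(inMeta) + " to " + maxDataLength(inMeta));
    }
}
```

With these numbers the meta file holds two more checksums than a 7629-byte block needs. A meta file with 17 checksums would match a block of 8193 to 8704 bytes.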

Thanks and Regards,
Vinayakumar B
