hadoop-hdfs-issues mailing list archives

From "Stephen O'Donnell (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-14706) Checksums are not checked if block meta file is less than 7 bytes
Date Wed, 28 Aug 2019 16:10:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917880#comment-16917880 ]

Stephen O'Donnell commented on HDFS-14706:
------------------------------------------

It turned out that newDataChecksum can already throw InvalidChecksumSizeException, as it calls
mapByteToChecksumType(), which can throw it. However, there were still scenarios where a
null DataChecksum could be returned. Therefore I have changed:

newDataChecksum(byte[] bytes, int offset)

To behave in the same way as:

newDataChecksum(DataInputStream in)

It will now throw InvalidChecksumSizeException rather than returning null if it cannot create
the DataChecksum.
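
To illustrate the contract change, here is a hypothetical caller (this is not code from the patch, and the import paths are assumed): rather than null-checking the result, it can now rely on InvalidChecksumSizeException being raised for a bad header.

{code}
import java.io.IOException;

import org.apache.hadoop.util.DataChecksum;
import org.apache.hadoop.util.InvalidChecksumSizeException;

public class ChecksumHeaderExample {
  // Hypothetical caller, for illustration only.
  static DataChecksum parseChecksum(byte[] header) throws IOException {
    try {
      // offset 2 assumes the usual 7 byte meta header layout:
      // 2 byte version, 1 byte checksum type, 4 byte bytesPerChecksum
      return DataChecksum.newDataChecksum(header, 2);
    } catch (InvalidChecksumSizeException e) {
      // previously this case could come back as null and was easy to miss
      throw new IOException("Could not create DataChecksum from header", e);
    }
  }
}
{code}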


It turns out the loop does have a bug:

{code}
while (buf.hasRemaining()) {
  if (fc.read(buf, 0) <= 0) {
    throw new CorruptMetaHeaderException("EOF while reading header from "+
        "the metadata file. The meta file may be truncated or corrupt");
  }
}
{code}

If the file has 4 bytes, for example, it will read those 4 bytes from position zero on the
first pass, and then read the first 3 bytes *again* from position zero to fill the buffer, so the
loop completes with a garbage header instead of failing.

The read call needs to be "fc.read(buf, buf.position())" so that each read resumes the channel at
the position where the previous read left off. With that change, it will throw a CorruptMetaHeaderException
if it fails to read the 7 bytes it expects.
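
For illustration, here is a minimal standalone sketch of the corrected loop (a sketch only, not the actual patch; it throws a plain IOException where the real code throws CorruptMetaHeaderException):

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ReadHeaderSketch {
  // Sketch only: read the 7 byte header, restarting each pread at
  // buf.position() so a truncated meta file fails rather than silently
  // re-reading the start of the file.
  static ByteBuffer readHeaderBytes(FileChannel fc) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(7);
    while (buf.hasRemaining()) {
      // buf.position() doubles as the file offset because the header
      // starts at offset zero
      if (fc.read(buf, buf.position()) <= 0) {
        throw new IOException("EOF while reading header from the metadata "
            + "file. The meta file may be truncated or corrupt");
      }
    }
    buf.flip();
    return buf;
  }
}
{code}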

I have fixed this issue and added another unit test to prove that preadHeader now works
correctly.

> Checksums are not checked if block meta file is less than 7 bytes
> -----------------------------------------------------------------
>
>                 Key: HDFS-14706
>                 URL: https://issues.apache.org/jira/browse/HDFS-14706
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: HDFS-14706.001.patch, HDFS-14706.002.patch, HDFS-14706.003.patch,
HDFS-14706.004.patch, HDFS-14706.005.patch
>
>
> If a block and its meta file are corrupted in a certain way, the corruption can go unnoticed
by a client, causing it to return invalid data.
> The meta file is expected to always have a header of 7 bytes and then a series of checksums
depending on the length of the block.
> If the meta file gets corrupted in such a way that its length is greater than zero but less than 7
bytes, then the header is incomplete. In BlockSender.java the logic checks whether the
meta file length is at least the size of the header; if it is not, it does not raise an error, but
instead returns a NULL checksum type to the client.
> https://github.com/apache/hadoop/blob/b77761b0e37703beb2c033029e4c0d5ad1dce794/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java#L327-L357
> If the client receives a NULL checksum type, it will not validate checksums at all,
and even corrupted data will be returned to the reader. This means the corruption will go unnoticed
and HDFS will never repair it. Even the Volume Scanner will not notice the corruption, as the
checksums are silently ignored.
> Additionally, if the meta file does have enough bytes for it to attempt to load the header,
but the header is corrupted such that it is not valid, it can cause the datanode Volume Scanner
to exit, with an exception like the following:
> {code}
> 2019-08-06 18:16:39,151 ERROR datanode.VolumeScanner: VolumeScanner(/tmp/hadoop-sodonnell/dfs/data,
DS-7f103313-61ba-4d37-b63d-e8cf7d2ed5f7) exiting because of exception 
> java.lang.IllegalArgumentException: id=51 out of range [0, 5)
> 	at org.apache.hadoop.util.DataChecksum$Type.valueOf(DataChecksum.java:76)
> 	at org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:167)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:173)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:139)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:153)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.loadLastPartialChunkChecksum(FsVolumeImpl.java:1140)
> 	at org.apache.hadoop.hdfs.server.datanode.FinalizedReplica.loadLastPartialChunkChecksum(FinalizedReplica.java:157)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockSender.getPartialChunkChecksumForFinalized(BlockSender.java:451)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:266)
> 	at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:446)
> 	at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:558)
> 	at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:633)
> 2019-08-06 18:16:39,152 INFO datanode.VolumeScanner: VolumeScanner(/tmp/hadoop-sodonnell/dfs/data,
DS-7f103313-61ba-4d37-b63d-e8cf7d2ed5f7) exiting.
> {code}



