hadoop-hdfs-issues mailing list archives

From "Ajay Kumar (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-13191) Internal buffer-sizing details are inadvertently baked into FileChecksum and BlockGroupChecksum
Date Mon, 05 Mar 2018 22:12:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386395#comment-16386395 ]

Ajay Kumar edited comment on HDFS-13191 at 3/5/18 10:11 PM:
------------------------------------------------------------

 Ya, it seems there is no good solution to this.
{quote} If getData() always allocates a new array, it likely eliminates any (perceived) advantage
of using DataOutputBuffer in the first place{quote}
Modifying getData() to return only a trimmed copy of the array would add the overhead of a
new byte-array allocation, although we can minimize that by guarding it with an if condition
(create a new copy only when count < length); a rough sketch follows.
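As a rough illustration only (a hypothetical sketch, assuming the existing getLength() and
buffer.getData() accessors of DataOutputBuffer):
{code:java}
public byte[] getData() {
  byte[] backing = buffer.getData();
  int valid = getLength();
  // Fast path: the backing array is already exactly the valid length.
  if (valid == backing.length) {
    return backing;
  }
  // Otherwise hand out only the valid bytes, at the cost of one copy.
  return java.util.Arrays.copyOf(backing, valid);
}
{code}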
{quote}Even though its interface doesn't promise anything about the size of the returned byte[],
the fact that the checksums are sensitive to it means at least some places are apparently
(unintentionally) relying on the exact buffer-sizing behavior already, so changing it inside
DataOutputBuffer directly has a higher risk of breaking unexpected system components.{quote}
Ya, but leaving it as it is also carries the risk of breaking some future functionality.
I don't feel strongly about any specific approach, but leaving the bug as it is may lead to more
subtle yet critical bugs somewhere else. At a minimum we should add more documentation to warn
users about this behavior, along the lines of the sketch below.
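For example, the getData() Javadoc warning could look something like this (illustrative wording
only, not a reviewed patch):
{code:java}
/**
 * Returns the current contents of the buffer.
 * Data is only valid to {@link #getLength()}.
 *
 * WARNING: the length of the returned array is an internal buffering
 * detail and is usually larger than getLength(). Callers that hash or
 * serialize the result must bound it with getLength(); relying on
 * getData().length silently includes trailing padding bytes.
 */
public byte[] getData() { return buffer.getData(); }
{code}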



> Internal buffer-sizing details are inadvertently baked into FileChecksum and BlockGroupChecksum
> -----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13191
>                 URL: https://issues.apache.org/jira/browse/HDFS-13191
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, hdfs-client
>            Reporter: Dennis Huo
>            Priority: Minor
>         Attachments: HDFS-13191.001.patch
>
>
> The org.apache.hadoop.io.DataOutputBuffer is used as an
"optimization" in many places to allow a reusable form of ByteArrayOutputStream, but requires
the caller to be careful to use getLength() instead of getData().length to determine the number
of actually valid bytes to consume.
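> For illustration, a minimal (hypothetical) caller sketch of that distinction; only the bounded
update() call digests just the valid bytes:
> {code:java}
> import java.security.MessageDigest;
> import org.apache.hadoop.io.DataOutputBuffer;
>
> // blockMd5s: concatenated per-block MD5s, e.g. 48 bytes for three blocks.
> static byte[] digestBlockMd5s(byte[] blockMd5s) throws Exception {
>   DataOutputBuffer md5out = new DataOutputBuffer();
>   md5out.write(blockMd5s);
>
>   MessageDigest digester = MessageDigest.getInstance("MD5");
>   // Correct: digest only the getLength() valid bytes.
>   digester.update(md5out.getData(), 0, md5out.getLength());
>   // Incorrect: digester.update(md5out.getData()) would also digest the
>   // padding left at the end by the internal buffer-growing policy.
>   return digester.digest();
> }
> {code}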
> At least three places in the path of constructing FileChecksums have incorrect usage
of DataOutputBuffer:
> [FileChecksumHelper digesting block MD5s|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FileChecksumHelper.java#L239]
> [BlockChecksumHelper digesting striped block MD5s to construct block-group checksum|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockChecksumHelper.java#L412]
> [MD5MD5CRC32FileChecksum.getBytes()|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/MD5MD5CRC32FileChecksum.java#L76]
> The net effect is that FileChecksum consumes exact BlockChecksums if there are 1 or 2
blocks (at 16 and 32 bytes respectively), but at 3 blocks will round up to 64 bytes, effectively
returning the same FileChecksum as if there were 4 blocks and the 4th block happened to have
an MD5 exactly equal to 0x00...00. Similarly, BlockGroupChecksum will behave as if there is
a power-of-2 number of bytes from BlockChecksums in the BlockGroup.
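> A small worked illustration of that collision (hypothetical toy values, digesting with
MessageDigest directly rather than through the actual FileChecksumHelper code path):
> {code:java}
> import java.security.MessageDigest;
> import java.util.Arrays;
>
> static void demonstrateCollision() throws Exception {
>   byte[] threeBlockMd5s = new byte[48];          // three toy 16-byte block MD5s
>   Arrays.fill(threeBlockMd5s, (byte) 0xab);
>
>   // Buggy path, per the description above: the 48 valid bytes sit in a
>   // 64-byte backing array and the whole array gets digested.
>   byte[] buggy = MessageDigest.getInstance("MD5")
>       .digest(Arrays.copyOf(threeBlockMd5s, 64));
>
>   // The padded array is byte-for-byte what four blocks would yield if the
>   // fourth block's MD5 were exactly 0x00...00, so the checksums collide.
>   byte[] fourBlocksZeroLast = Arrays.copyOf(threeBlockMd5s, 64);
>   System.out.println(Arrays.equals(buggy,
>       MessageDigest.getInstance("MD5").digest(fourBlocksZeroLast)));  // prints true
> }
> {code}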
> This appears to have been a latent bug for at least 9 years for FileChecksum (and since
inception for the implementation of striped files), and works fine as long as HDFS implementations
strictly stick to the same internal buffering semantics.
> However, this also makes the implementation extremely brittle unless carefully documented.
For example, if code is ever refactored to pass around a MessageDigest that consumes block
MD5s as they come rather than writing into a DataOutputBuffer before digesting the entire
buffer, then the resulting checksum calculations will change unexpectedly.
> At the same time, "fixing" the bug would also be backwards-incompatible, so the bug might
need to stick around. At least for the FileChecksum-level calculation, it seems the bug has
been latent for a very long time. Since striped files are fairly new, the BlockChecksumHelper
could probably be fixed sooner rather than later to avoid perpetuating a bug. The getBytes()
method for FileChecksum is more innocuous, so could likely be fixed or left as-is without
too much impact either way.
> The bug can be highlighted by changing the internal buffer-growing semantics of the DataOutputBuffer,
or simply returning a randomly-sized byte buffer in getData() while only ensuring the first
getLength() bytes are actually present, for example:
>  
> {code:java}
> diff --git a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
> index 4c2fa67f8f2..f2df94e898f 100644
> --- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
> +++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
> @@ -103,7 +103,17 @@ private DataOutputBuffer(Buffer buffer) {
> /** Returns the current contents of the buffer.
> * Data is only valid to {@link #getLength()}.
> */
> -  public byte[] getData() { return buffer.getData(); }
> +  public byte[] getData() {
> +    java.util.Random rand = new java.util.Random();
> +    byte[] bufferData = buffer.getData();
> +    byte[] ret = new byte[rand.nextInt(bufferData.length) + bufferData.length];
> +    System.arraycopy(bufferData, 0, ret, 0, getLength());
> +    return ret;
> +  }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


