hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-5459) CRC errors not detected reading intermediate output into memory with problematic length
Date Wed, 11 Mar 2009 08:14:50 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-5459:
----------------------------------

    Attachment: 5459-1.patch

Added unit tests for IFile*Streams.

> CRC errors not detected reading intermediate output into memory with problematic length
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5459
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5459
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Chris Douglas
>            Priority: Blocker
>         Attachments: 5459-0.patch, 5459-1.patch
>
>
> It's possible that the expected, uncompressed length of the segment is less than the
> available/decompressed data. This can happen in some worst cases for compression, but it is
> exceedingly rare. It is also possible (though also fantastically unlikely) for the data to
> deflate to a size greater than that reported by the map. CRC errors will remain undetected
> because IFileInputStream does not validate the checksum until the end of the stream, and close()
> does not advance the stream to the end of the segment. The (abbreviated) read loop fetching
> data in shuffleInMemory:
> {code}
> int n = input.read(shuffleData, 0, shuffleData.length);
> while (n > 0) { 
>   bytesRead += n;
>   n = input.read(shuffleData, bytesRead, 
>                  (shuffleData.length-bytesRead));
> } 
> {code}
> will read only up to the expected length. Without reading the whole segment, the checksum
> is not validated. Even when IFileInputStream instances are closed early, they should always
> validate checksums.
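
The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not the attached patch and not Hadoop's IFileInputStream: ChecksummedSegmentStream, its encode() helper, and the assumed layout (payload followed by an 8-byte big-endian CRC32 value) are all hypothetical, chosen only to show one way close() could drain the unread remainder of a segment so the checksum is always compared.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

/**
 * Hypothetical stand-in for a checksummed segment stream (NOT Hadoop's
 * IFileInputStream). Assumed layout: dataLength payload bytes followed by
 * an 8-byte big-endian CRC32 value.
 */
class ChecksummedSegmentStream extends InputStream {
  private final InputStream in;
  private final CRC32 crc = new CRC32();
  private long remaining;    // payload bytes not yet consumed
  private boolean verified;  // true once the trailing checksum was read

  ChecksummedSegmentStream(InputStream in, long dataLength) {
    this.in = in;
    this.remaining = dataLength;
  }

  /** Test helper (hypothetical): appends the CRC32 of payload to payload. */
  static byte[] encode(byte[] payload) throws IOException {
    CRC32 c = new CRC32();
    c.update(payload, 0, payload.length);
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    out.write(payload);
    out.writeLong(c.getValue());
    out.flush();
    return bos.toByteArray();
  }

  @Override
  public int read() throws IOException {
    byte[] one = new byte[1];
    return read(one, 0, 1) == -1 ? -1 : (one[0] & 0xff);
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    if (len == 0) return 0;  // a full caller buffer ends the caller's loop
    if (remaining == 0) { verify(); return -1; }
    int n = in.read(b, off, (int) Math.min(len, remaining));
    if (n < 0) throw new IOException("Segment truncated");
    crc.update(b, off, n);
    remaining -= n;
    if (remaining == 0) verify();  // end of payload: compare checksums
    return n;
  }

  /** Reads the trailing checksum once and compares it with the computed one. */
  private void verify() throws IOException {
    if (verified) return;
    long expected = 0;
    for (int i = 0; i < 8; i++) {
      int b = in.read();
      if (b < 0) throw new IOException("Missing checksum");
      expected = (expected << 8) | b;
    }
    verified = true;
    if (expected != crc.getValue()) throw new IOException("Checksum mismatch");
  }

  /** Drains unread payload so the checksum is validated even on early close. */
  @Override
  public void close() throws IOException {
    try {
      byte[] skip = new byte[4096];
      while (remaining > 0) {
        read(skip, 0, skip.length);
      }
      verify();
    } finally {
      in.close();
    }
  }
}
```

With a stream like this, a read loop of the shape quoted above can stop short of the segment end (because the caller's buffer is smaller than the actual data), and corruption would still surface as an IOException from close() rather than going unnoticed. Draining on close costs extra reads for segments that are abandoned early; whether that trade-off matches the actual fix is not something this sketch claims.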

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

