hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-5459) CRC errors not detected reading intermediate output into memory with problematic length
Date Wed, 11 Mar 2009 05:42:50 GMT
CRC errors not detected reading intermediate output into memory with problematic length
---------------------------------------------------------------------------------------

                 Key: HADOOP-5459
                 URL: https://issues.apache.org/jira/browse/HADOOP-5459
             Project: Hadoop Core
          Issue Type: Bug
    Affects Versions: 0.20.0
            Reporter: Chris Douglas
            Priority: Blocker


It's possible for the expected, uncompressed length of the segment to be less than the available/decompressed data. This can happen in some worst cases for compression, but it is exceedingly rare. It is also possible (though fantastically unlikely) for the data to deflate to a size greater than that reported by the map. In either case, CRC errors will go undetected, because IFileInputStream does not validate the checksum until it reaches the end of the stream, and close() does not advance the stream to the end of the segment. The (abbreviated) read loop fetching data in shuffleInMemory:

{code}
// Reads until the expected length (shuffleData.length) is reached or
// the stream is exhausted; it never reads past the expected length,
// so a longer-than-reported segment leaves the checksum unverified.
int n = input.read(shuffleData, 0, shuffleData.length);
while (n > 0) {
  bytesRead += n;
  n = input.read(shuffleData, bytesRead,
                 (shuffleData.length - bytesRead));
}
{code}

reads only up to the expected length. Since the whole segment is never consumed, the checksum is never validated. IFileInputStream should validate the checksum on close(), even when the stream has not been read to the end of the segment.
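
One way to satisfy this (a sketch only, not a patch; the class and field names below are hypothetical and do not match the actual IFileInputStream members) is to drain the rest of the segment in close(), so the trailing CRC is always read and compared:

{code}
// Sketch: a FilterInputStream over a checksummed stream of known
// segment length that drains the remainder on close(), forcing the
// wrapped stream to verify its CRC. Names (expectedLength, bytesSoFar)
// are illustrative, not the real IFileInputStream fields.
class ValidatingSegmentStream extends java.io.FilterInputStream {
  private final long expectedLength;  // segment length reported by the map
  private long bytesSoFar = 0;

  ValidatingSegmentStream(java.io.InputStream in, long expectedLength) {
    super(in);
    this.expectedLength = expectedLength;
  }

  @Override
  public int read(byte[] b, int off, int len) throws java.io.IOException {
    int n = super.read(b, off, len);
    if (n > 0) {
      bytesSoFar += n;
    }
    return n;
  }

  @Override
  public void close() throws java.io.IOException {
    // Skip to the end of the segment before closing; reading the last
    // bytes gives the underlying checksummed stream a chance to compare
    // the CRC and throw on a mismatch.
    byte[] scratch = new byte[4096];
    while (bytesSoFar < expectedLength) {
      int n = read(scratch, 0,
          (int) Math.min(scratch.length, expectedLength - bytesSoFar));
      if (n < 0) {
        break;  // segment shorter than reported; wrapped stream hit EOF
      }
    }
    super.close();
  }
}
{code}

With a wrapper along these lines, the shuffleInMemory loop above could still stop at the expected length, and a CRC mismatch in a longer-than-reported segment would still surface when the stream is closed.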

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

