hadoop-mapreduce-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-6635) Unsafe long to int conversion in UncompressedSplitLineReader and IndexOutOfBoundsException
Date Wed, 17 Feb 2016 00:45:18 GMT
Sergey Shelukhin created MAPREDUCE-6635:
-------------------------------------------

             Summary: Unsafe long to int conversion in UncompressedSplitLineReader and IndexOutOfBoundsException
                 Key: MAPREDUCE-6635
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6635
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Sergey Shelukhin


LineRecordReader creates the unsplittable reader like so:
{noformat}
      in = new UncompressedSplitLineReader(
          fileIn, job, recordDelimiter, split.getLength());
{noformat}
The split length is stored as a long:
{noformat}
  private long splitLength;
{noformat}
At some point when reading the first line, fillBuffer does this:
{noformat}
  @Override
  protected int fillBuffer(InputStream in, byte[] buffer, boolean inDelimiter)
      throws IOException {
    int maxBytesToRead = buffer.length;
    if (totalBytesRead < splitLength) {
      maxBytesToRead = Math.min(maxBytesToRead,
                                (int)(splitLength - totalBytesRead));
{noformat}
For splits larger than Integer.MAX_VALUE, the narrowing cast {{(int)(splitLength - totalBytesRead)}}
can wrap around to a negative number, so maxBytesToRead becomes negative and the subsequent DFS
read fails with a boundary check (IndexOutOfBoundsException).
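The wrap-around can be demonstrated in isolation. A minimal sketch (the helper names and the 3 GB split length are illustrative, not from the Hadoop source; the safe variant shows one possible fix, not the committed patch):
{noformat}
public class SplitOverflowDemo {
    // Buggy pattern from fillBuffer: the long difference is cast to int
    // *before* taking the min, so a split just over 2 GiB wraps negative.
    static int buggyMaxBytesToRead(int bufferLength, long splitLength,
                                   long totalBytesRead) {
        return Math.min(bufferLength, (int) (splitLength - totalBytesRead));
    }

    // Safe variant: take the min in long arithmetic, then cast. The result
    // is bounded by bufferLength, so the narrowing cast cannot overflow.
    static int safeMaxBytesToRead(int bufferLength, long splitLength,
                                  long totalBytesRead) {
        return (int) Math.min((long) bufferLength, splitLength - totalBytesRead);
    }

    public static void main(String[] args) {
        long splitLength = 3_000_000_000L; // > Integer.MAX_VALUE
        int bufferLength = 64 * 1024;
        // (int) 3_000_000_000L wraps to -1294967296, so the buggy min
        // returns a negative read length.
        System.out.println(buggyMaxBytesToRead(bufferLength, splitLength, 0L));
        // The safe variant returns the buffer length, 65536.
        System.out.println(safeMaxBytesToRead(bufferLength, splitLength, 0L));
    }
}
{noformat}
Passing the negative result of the buggy variant as a read length is what trips the boundary check downstream.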
This has been reported here: https://issues.streamsets.com/browse/SDC-2229; it also happens in
Hive if very large text files are forced to be read in a single split (e.g. via the header-skipping
feature, or via set mapred.min.split.size=9999999999999999;).




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
