hadoop-mapreduce-issues mailing list archives

From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-5076) CombineFileInputFormat with maxSplitSize can omit data
Date Sat, 16 Mar 2013 00:46:13 GMT
Sandy Ryza created MAPREDUCE-5076:

             Summary: CombineFileInputFormat with maxSplitSize can omit data
                 Key: MAPREDUCE-5076
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5076
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Sandy Ryza
            Assignee: Sandy Ryza

I ran a local job with CombineFileInputFormat on an 80 MB file with a max split size of
32 MB (the default local FS block size). The job ran with two 32 MB splits, and the remaining
16 MB were silently omitted.
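For reference, the arithmetic behind the numbers above: cut into 32 MB blocks, an 80 MB file yields two full blocks plus a 16 MB tail, so two 32 MB splits cover only 64 MB. The sketch below assumes blocks are laid out sequentially and that only the final block may be shorter than the block size. The class and method names (`BlockLayout`, `blockSizes`) are hypothetical and are not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockLayout {
    static final long MB = 1024L * 1024L;

    // Hypothetical helper: cut a file of fileLen bytes into blocks of at most
    // blockSize bytes; only the final block can be shorter than blockSize.
    static List<Long> blockSizes(long fileLen, long blockSize) {
        List<Long> blocks = new ArrayList<>();
        for (long off = 0; off < fileLen; off += blockSize) {
            blocks.add(Math.min(blockSize, fileLen - off));
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<Long> blocks = blockSizes(80 * MB, 32 * MB);
        for (long b : blocks) {
            System.out.println(b / MB + " MB"); // prints 32 MB, 32 MB, 16 MB
        }
        long covered = 2 * 32 * MB; // the two 32 MB splits observed in the job
        System.out.println("uncovered: " + (80 * MB - covered) / MB + " MB"); // uncovered: 16 MB
    }
}
```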

This appears to be caused by a subtle bug in getMoreSplits: the code that generates the
splits from the blocks expects the 16 MB block to be at the end of the block list, but the
code that generates the blocks does not guarantee this ordering.
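The ordering dependence can be illustrated with a simplified, hypothetical model. It is not the actual getMoreSplits code. In `buggySplits`, full-size blocks become splits and the undersized remainder is emitted only if it is the last element of the list. When the 16 MB block arrives first, the result is exactly the symptom reported above: two 32 MB splits and 16 MB dropped. `packedSplits` is a swapped-in, order-independent alternative (greedy packing with a final flush), shown only to contrast the behavior. All names here are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitOrderDemo {
    static final long MB = 1024L * 1024L;

    // Hypothetical buggy model: assumes the only undersized block is the last one.
    static List<Long> buggySplits(List<Long> blocks, long maxSplitSize) {
        List<Long> splits = new ArrayList<>();
        for (long b : blocks) {
            if (b == maxSplitSize) {
                splits.add(b); // full-size blocks become their own splits
            }
        }
        long last = blocks.get(blocks.size() - 1);
        if (last < maxSplitSize) {
            splits.add(last); // remainder is picked up only if it sits at the end
        }
        return splits;
    }

    // Order-independent alternative: greedily pack blocks into splits of at most
    // maxSplitSize (assuming no single block exceeds it) and flush the leftover
    // after the loop, so no block is dropped whatever the list order.
    static List<Long> packedSplits(List<Long> blocks, long maxSplitSize) {
        List<Long> splits = new ArrayList<>();
        long current = 0;
        for (long b : blocks) {
            if (current > 0 && current + b > maxSplitSize) {
                splits.add(current);
                current = 0;
            }
            current += b;
        }
        if (current > 0) {
            splits.add(current);
        }
        return splits;
    }

    static long total(List<Long> splits) {
        long sum = 0;
        for (long s : splits) {
            sum += s;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Long> tailLast = List.of(32 * MB, 32 * MB, 16 * MB);
        List<Long> tailFirst = List.of(16 * MB, 32 * MB, 32 * MB);
        System.out.println(total(buggySplits(tailLast, 32 * MB)) / MB);   // 80
        System.out.println(total(buggySplits(tailFirst, 32 * MB)) / MB);  // 64
        System.out.println(total(packedSplits(tailFirst, 32 * MB)) / MB); // 80
    }
}
```

A cheap guard against this class of bug is to check that the split sizes sum to the input length, whatever order the blocks arrive in.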

