hadoop-mapreduce-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-6155) MapFiles are not always correctly detected by SequenceFileInputFormat
Date Wed, 06 May 2015 03:27:47 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-6155:
    Labels: BB2015-05-TBR  (was: )

> MapFiles are not always correctly detected by SequenceFileInputFormat
> ---------------------------------------------------------------------
>                 Key: MAPREDUCE-6155
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6155
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Jens Rabe
>              Labels: BB2015-05-TBR
>         Attachments: MAPREDUCE-6155.001.patch, MAPREDUCE-6155.002.patch
>   Original Estimate: 2h
>  Remaining Estimate: 2h
> MapFiles are not always correctly detected by SequenceFileInputFormat.
> This is because the listStatus method only detects a MapFile correctly if the
> path it checks is a directory; it then replaces that path with the path of the
> data file.
> This is likely to fail if the data file does not exist, i.e., if the input path
> is a directory that does not belong to a MapFile, or if recursion is turned on
> and the input format comes across a file (not a directory) that is in fact part
> of a MapFile.
> The listStatus method should be changed to detect these cases correctly:
> * If the current candidate is a file and its name is "index" or "data", check
>   whether its corresponding other file exists, whether the key types of both
>   files match, and whether the value type of the index file is LongWritable.
> * If the current candidate is a directory, it is a MapFile if and only if an
>   index and a data file exist, both are SequenceFiles, their key types match,
>   and the index value type is LongWritable.
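
The two rules proposed above can be sketched in plain Java. This is only an illustration of the decision logic, not the attached patch: the class MapFileDetectionSketch, the Header type, and the methods isMapFileDir and isMapFilePart are hypothetical names, and a directory is modeled as a map from file name to an optional SequenceFile header. In actual Hadoop code the key and value class names would presumably be read from each file with SequenceFile.Reader.getKeyClassName() and getValueClassName().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class MapFileDetectionSketch {
    // File names that make up a MapFile directory.
    static final String INDEX = "index";
    static final String DATA = "data";
    static final String LONG_WRITABLE = "org.apache.hadoop.io.LongWritable";

    /** Hypothetical stand-in for the key/value class names in a SequenceFile header. */
    static final class Header {
        final String keyClass;
        final String valueClass;

        Header(String keyClass, String valueClass) {
            this.keyClass = keyClass;
            this.valueClass = valueClass;
        }
    }

    /**
     * Directory rule: a MapFile if and only if "index" and "data" both exist,
     * both are SequenceFiles (modeled as a present Header), their key types
     * match, and the index value type is LongWritable.
     */
    static boolean isMapFileDir(Map<String, Optional<Header>> dir) {
        Optional<Header> index = dir.getOrDefault(INDEX, Optional.empty());
        Optional<Header> data = dir.getOrDefault(DATA, Optional.empty());
        if (!index.isPresent() || !data.isPresent()) {
            return false; // missing file, or a file that is not a SequenceFile
        }
        return index.get().keyClass.equals(data.get().keyClass)
                && LONG_WRITABLE.equals(index.get().valueClass);
    }

    /**
     * File rule: a file is part of a MapFile only if it is named "index" or
     * "data" and its parent directory passes the directory rule, which also
     * covers the existence of the corresponding other file.
     */
    static boolean isMapFilePart(String fileName, Map<String, Optional<Header>> parentDir) {
        if (!INDEX.equals(fileName) && !DATA.equals(fileName)) {
            return false;
        }
        return isMapFileDir(parentDir);
    }

    public static void main(String[] args) {
        Map<String, Optional<Header>> dir = new HashMap<>();
        dir.put(INDEX, Optional.of(new Header("org.apache.hadoop.io.Text", LONG_WRITABLE)));
        dir.put(DATA, Optional.of(new Header("org.apache.hadoop.io.Text", "org.apache.hadoop.io.Text")));
        System.out.println(isMapFileDir(dir));            // both files present and consistent
        System.out.println(isMapFilePart("part-0", dir)); // not named "index" or "data"

        dir.remove(DATA);
        System.out.println(isMapFileDir(dir));            // data file missing
    }
}
```

Keeping the checks independent of the file system makes the rule easy to test; a real implementation would have to open both files and handle I/O errors for files that are not SequenceFiles.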

This message was sent by Atlassian JIRA
