hadoop-mapreduce-issues mailing list archives

From "Derek Farren (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5069) add concrete common implementations of CombineFileInputFormat
Date Thu, 08 Aug 2013 19:35:49 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733881#comment-13733881 ]

Derek Farren commented on MAPREDUCE-5069:
-----------------------------------------

It doesn't work on sequence files because the CombineFileRecordReaderWrapper constructor
accepts only a FileInputFormat as its first argument. It needs another constructor that
accepts a SequenceFileInputFormat.

This is what we have:
CombineFileRecordReaderWrapper(FileInputFormat<K,V> inputFormat, CombineFileSplit split,
TaskAttemptContext context, Integer idx)

This is what we need to add:
CombineFileRecordReaderWrapper(SequenceFileInputFormat<K, V> sequenceFileInputFormat,
CombineFileSplit split, TaskAttemptContext context, Integer idx)
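
A minimal sketch of the type relationship behind the two signatures above, assuming the
new-API SequenceFileInputFormat (org.apache.hadoop.mapreduce.lib.input) is the one meant,
which is declared as a subclass of FileInputFormat. Under that assumption a constructor whose
first parameter is FileInputFormat<K,V> would already accept a SequenceFileInputFormat<K,V>
by ordinary subtyping, and a small subclass could pass one in. Every class below is a
hypothetical stand-in written for illustration, not the real Hadoop class, and the split and
context parameters are reduced to Object placeholders.

```java
// Stand-in types only: these mimic the shape of the Hadoop classes named in
// the comment above and are NOT the real org.apache.hadoop classes.
public class WrapperSketch {

    // Stand-in for FileInputFormat<K, V>.
    static abstract class FileInputFormatStub<K, V> {
        abstract String name();
    }

    // Stand-in for SequenceFileInputFormat<K, V>, declared as a subclass
    // (assumption: this mirrors the new-API class hierarchy).
    static class SequenceFileInputFormatStub<K, V> extends FileInputFormatStub<K, V> {
        @Override
        String name() {
            return "sequence";
        }
    }

    // Stand-in for CombineFileRecordReaderWrapper with only the existing
    // constructor shape: the first parameter is the base input format type.
    static class RecordReaderWrapperStub<K, V> {
        private final FileInputFormatStub<K, V> inputFormat;
        private final int idx;

        RecordReaderWrapperStub(FileInputFormatStub<K, V> inputFormat,
                                Object split, Object context, Integer idx) {
            this.inputFormat = inputFormat;
            this.idx = idx;
        }

        String describe() {
            return inputFormat.name() + "#" + idx;
        }
    }

    // Hypothetical sequence-file wrapper: it reuses the existing constructor
    // by handing it the subclass instance, so no new overload is declared.
    static class SequenceFileWrapperStub<K, V> extends RecordReaderWrapperStub<K, V> {
        SequenceFileWrapperStub(Object split, Object context, Integer idx) {
            super(new SequenceFileInputFormatStub<K, V>(), split, context, idx);
        }
    }

    // Builds the hypothetical wrapper for chunk index 0 and describes it.
    public static String demo() {
        return new SequenceFileWrapperStub<String, String>(null, null, 0).describe();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If the failure involved the old-API org.apache.hadoop.mapred.SequenceFileInputFormat instead,
the types would not line up this way, and that mismatch, rather than a missing constructor,
might be what needs fixing.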
                
> add concrete common implementations of CombineFileInputFormat
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-5069
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5069
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1, mrv2
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sangjin Lee
>            Priority: Minor
>             Fix For: 3.0.0, 2.1.0-beta
>
>         Attachments: MAPREDUCE-5069-1.patch, MAPREDUCE-5069-2.patch, MAPREDUCE-5069-3.patch,
>                      MAPREDUCE-5069-4.patch, MAPREDUCE-5069-5.patch, MAPREDUCE-5069-6.patch, MAPREDUCE-5069.patch
>
>
> CombineFileInputFormat is abstract, and its concrete equivalents of TextInputFormat,
> SequenceFileInputFormat, etc. are currently not in the Hadoop code base.
> These are a very common need wherever CombineFileInputFormat is used, and different
> folks end up writing the same code over and over to achieve the same goal. It seems natural
> for Hadoop to provide at least the text and sequence file implementations of the
> CombineFileInputFormat class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
