hadoop-mapreduce-issues mailing list archives

From "Jaideep (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1122) streaming with custom input format does not support the new API
Date Mon, 10 May 2010 09:04:50 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865708#action_12865708 ]

Jaideep commented on MAPREDUCE-1122:
------------------------------------

The following changes are needed to support the new API:
* StreamJob uses o.a.h.mapred.JobConf throughout. To allow new-API input
and output formats, it should use an o.a.h.mapreduce.Job object instead.
Alternatively, we can create and populate the configuration without
relying on JobConf or Job methods, and only create a JobConf or a Job
object depending on whether the old or the new API is being used.
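A minimal sketch of the second option, assuming Hadoop 2.x or later on the classpath. StreamJobConfSketch and its methods are hypothetical names, and the two stream.*.streamprocessor keys are only illustrative. Settings go onto a plain Configuration, which is wrapped in a JobConf or a Job only at the end; one possible way to pick the API is to check which InputFormat type the user's class extends.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;

/**
 * Hypothetical sketch: collect streaming settings on a plain Configuration,
 * then wrap them in a JobConf (old API) or a Job (new API).
 */
public class StreamJobConfSketch {

    /** Fill in settings without using JobConf- or Job-specific setters. */
    static Configuration buildBaseConf(String mapCmd, String reduceCmd) {
        Configuration conf = new Configuration();
        // Illustrative property names; the real StreamJob may name or
        // encode these values differently.
        conf.set("stream.map.streamprocessor", mapCmd);
        conf.set("stream.reduce.streamprocessor", reduceCmd);
        return conf;
    }

    /** Assumed selection rule: new API if the class extends o.a.h.mapreduce.InputFormat. */
    static boolean usesNewApi(Class<?> inputFormatClass) {
        return org.apache.hadoop.mapreduce.InputFormat.class.isAssignableFrom(inputFormatClass);
    }

    /** Old API: JobConf copies an existing Configuration. */
    static JobConf toOldApi(Configuration conf) {
        return new JobConf(conf);
    }

    /** New API: Job.getInstance copies the Configuration (Hadoop 2.x and later). */
    static Job toNewApi(Configuration conf) {
        try {
            return Job.getInstance(conf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because JobConf is itself a subclass of Configuration, code written against Configuration keeps working for old-API jobs too.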

* PipeMapper and PipeReducer are also based on the old API. We will have
to create new Mappers and Reducers based on the new API in order to
support the newer input and output formats. PipeMapRed also uses JobConf
in a number of places; almost all of these calls could be replaced by
calls to a Configuration object.
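As a hedged illustration of the configuration side of this change: a new-API mapper receives its settings through context.getConfiguration() in setup(), rather than through configure(JobConf). NewApiPipeMapperSketch is a hypothetical name, the property key is illustrative, and launching and feeding the external command is left out. This assumes Hadoop 2.x or later.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Hypothetical skeleton of a new-API counterpart to PipeMapper. Only the
 * configuration plumbing is shown; map() is not overridden here, whereas a
 * real replacement would forward each record to the external command.
 */
public class NewApiPipeMapperSketch extends Mapper<Object, Text, Object, Text> {

    private String command;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // New API: settings come from a plain Configuration on the context.
        command = readCommand(context.getConfiguration());
    }

    /** Works for both APIs, since JobConf is a subclass of Configuration. */
    static String readCommand(Configuration conf) {
        return conf.get("stream.map.streamprocessor", "");
    }
}
```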

* StreamInputFormat extends o.a.h.mapred.KeyValueTextInputFormat. It
should extend o.a.h.mapreduce.lib.input.KeyValueTextInputFormat.
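A rough sketch of such a class, assuming a Hadoop release that ships o.a.h.mapreduce.lib.input.KeyValueTextInputFormat (2.x and later do). The class name is hypothetical; it assumes the stream.recordreader.class property keeps naming a custom reader, and otherwise falls back to the inherited key/value line reader.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.util.ReflectionUtils;

/**
 * Hypothetical new-API StreamInputFormat: use the RecordReader named by
 * "stream.recordreader.class" if set, else the inherited key/value reader.
 */
public class NewApiStreamInputFormatSketch extends KeyValueTextInputFormat {

    static final String READER_KEY = "stream.recordreader.class";

    @Override
    public RecordReader<Text, Text> createRecordReader(InputSplit split, TaskAttemptContext context)
            throws IOException {
        Configuration conf = context.getConfiguration();
        @SuppressWarnings("rawtypes")
        Class<? extends RecordReader> readerClass = conf.getClass(READER_KEY, null, RecordReader.class);
        if (readerClass == null) {
            return super.createRecordReader(split, context);
        }
        // The custom reader needs a no-argument constructor; the framework
        // calls initialize(split, context) on it before reading records.
        @SuppressWarnings("unchecked")
        RecordReader<Text, Text> reader = ReflectionUtils.newInstance(readerClass, conf);
        return reader;
    }
}
```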

* StreamBaseRecordReader implements o.a.h.mapred.RecordReader. A new class
conforming to the new API (extending o.a.h.mapreduce.RecordReader) is needed.
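One possible shape for the new base class, sketched under the assumption that splits are FileSplits and Hadoop 2.x or later is on the classpath. The class name and the openAndSeek hook are hypothetical; subclasses would still implement nextKeyValue() and close(). The main API difference is that the new RecordReader is pull-based (nextKeyValue(), then getCurrentKey()/getCurrentValue()) instead of filling caller-supplied objects through next(key, value).

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

/**
 * Hypothetical new-API base reader: tracks the split's byte range and
 * reports progress; nextKeyValue() and close() are left to subclasses.
 */
public abstract class NewApiStreamBaseRecordReaderSketch extends RecordReader<Text, Text> {

    protected long start;  // first byte of the split
    protected long end;    // one past the last byte of the split
    protected long pos;    // current read position, updated by subclasses
    protected final Text key = new Text();
    protected final Text value = new Text();
    protected Configuration conf;

    @Override
    public void initialize(InputSplit genericSplit, TaskAttemptContext context)
            throws IOException, InterruptedException {
        FileSplit split = (FileSplit) genericSplit;  // assumption: file-based input
        conf = context.getConfiguration();
        start = split.getStart();
        end = start + split.getLength();
        pos = start;
        openAndSeek(split);
    }

    /** Hypothetical hook: open the file and move to the first record boundary. */
    protected abstract void openAndSeek(FileSplit split) throws IOException;

    @Override
    public Text getCurrentKey() {
        return key;
    }

    @Override
    public Text getCurrentValue() {
        return value;
    }

    @Override
    public float getProgress() {
        if (end == start) {
            return 0.0f;
        }
        return Math.min(1.0f, (pos - start) / (float) (end - start));
    }
}
```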

* Some static methods in StreamUtil.java use the old API:
   - getCurrentSplit uses o.a.h.mapred.FileSplit and JobConf. This
     method is not used anywhere else in the code.
   - isLocalJobTracker uses JobConf.
   - getTaskInfo uses JobConf to get the task type and task id. It is
     used in PipeMapRed.setStreamJobDetails to set the task id.
   - addJobConfToEnvironment takes a JobConf as its argument. It should
     also accept a Job.
  There is also a static TaskID class in StreamUtil.java. If it is not
  needed, can it be removed?
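For addJobConfToEnvironment, one option is to accept a Configuration rather than adding a second overload: a JobConf is a Configuration, and the new API exposes one through Job.getConfiguration(). A minimal sketch with a hypothetical class name, assuming Hadoop 2.x or later; the renaming rule (every non-alphanumeric character becomes '_') is an assumption about how property names are turned into environment variable names.

```java
import java.util.Map;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;

/** Hypothetical Configuration-based replacements for JobConf-only helpers. */
public final class StreamUtilSketch {

    private StreamUtilSketch() {}

    /** Assumed rule: replace every character that is not a letter or digit with '_'. */
    static String safeEnvVarName(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (char c : name.toCharArray()) {
            sb.append(Character.isLetterOrDigit(c) ? c : '_');
        }
        return sb.toString();
    }

    /** Accepts either a JobConf or job.getConfiguration(). */
    static void addJobConfToEnvironment(Configuration conf, Properties env) {
        for (Map.Entry<String, String> entry : conf) {
            String name = entry.getKey();
            // conf.get() expands ${var} references; the raw entry value does not.
            String value = conf.get(name);
            if (value != null) {
                env.put(safeEnvVarName(name), value);
            }
        }
    }
}
```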

> streaming with custom input format does not support the new API
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-1122
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.20.1
>         Environment: any OS
>            Reporter: Keith Jackson
>
> When trying to implement a custom input format for use with streaming, I have found that
> streaming does not support the new API, org.apache.hadoop.mapreduce.InputFormat, but requires
> the old API, org.apache.hadoop.mapred.InputFormat.

-- 
This message is automatically generated by JIRA.

