hadoop-mapreduce-issues mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1122) streaming with custom input format does not support the new API
Date Tue, 06 Jul 2010 09:40:51 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1122:

    Attachment: patch-1122.txt

Attaching a patch which does the following:
* Deprecates all the library classes in streaming, such as AutoInputFormat, StreamInputFormat,
and StreamXmlRecordReader, and adds new classes that use the new API.
* Changes the tools DumpTypedBytes and LoadTypedBytes to use the new API classes.
* Adds StreamJobConfig holding all the configuration properties used in streaming.
* Adds classes StreamingMapper, StreamingReducer and StreamingCombiner, which extend the new API
Mapper and Reducer classes.
  ** Adds a class StreamingProcess, which starts the streaming process and the MR output/error
threads and waits for those threads. For the old API mapper/reducer this functionality lives in
PipeMapRed.java; PipeMapper and PipeReducer extend PipeMapRed and implement the old Mapper/Reducer
interfaces. StreamingMapper/StreamingReducer cannot extend StreamingProcess, because
in the new API Mapper and Reducer are classes, not interfaces. The functionality is therefore
moved into a separate class that StreamingMapper/StreamingReducer compose.
  ** InputWriter and OutputReader, added in HADOOP-1722, take a PipeMapRed instance as a
constructor parameter. That no longer makes sense, because for the new API mapper/reducer the
process handling is done by a separate class, StreamingProcess. So the patch makes the following
incompatible change (which looks cleaner now):
  *** Changes OutputReader constructor to take DataInput as parameter, instead of PipeMapRed
  *** Changes InputWriter constructor to take DataOutput as parameter, instead of PipeMapRed
* Moves some utility methods in PipeMapRed to StreamUtil.
* Removes the deprecated StreamJob(String[] argv, boolean mayExit); deprecates static public JobConf
createJob(String[] argv); and adds static public Job createStreamingJob(String[] argv).
* Refactors setJobConf() into multiple setters to set appropriate mapper/reducer in use.
* Adds unit tests for all the use cases described [above|https://issues.apache.org/jira/browse/MAPREDUCE-1122?focusedCommentId=12878515&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12878515]
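
To illustrate the composition point and the constructor change described above, here is a minimal
sketch in plain Java. It is not code from the patch: every class below is a simplified, hypothetical
stand-in that only borrows the names used in this comment, the abstract Mapper here is not the
Hadoop class, and the "process" is an in-memory buffer rather than a real child process. It only
shows why a mapper that must extend a base class has to hold the process handler as a field, and
how a writer/reader pair can depend on DataOutput/DataInput alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class CompositionSketch {

    // Hypothetical stand-in for a new-API style Mapper: a class, not an interface,
    // so a subclass cannot also extend a process-handling base class.
    static abstract class Mapper {
        abstract void map(String key, String value) throws IOException;
    }

    // Hypothetical stand-in for InputWriter: it depends only on DataOutput,
    // not on the object that owns the process.
    static class InputWriter {
        private final DataOutput out;
        InputWriter(DataOutput out) { this.out = out; }
        void write(String key, String value) throws IOException {
            out.writeUTF(key);
            out.writeUTF(value);
        }
    }

    // Hypothetical stand-in for OutputReader: it depends only on DataInput.
    static class OutputReader {
        private final DataInput in;
        OutputReader(DataInput in) { this.in = in; }
        String readRecord() throws IOException {
            return in.readUTF() + "\t" + in.readUTF();
        }
    }

    // Hypothetical stand-in for StreamingProcess. An in-memory buffer plays the
    // role of the child process: whatever is written comes back unchanged.
    static class StreamingProcess {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutput stdin() { return new DataOutputStream(buffer); }
        DataInput stdout() {
            return new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        }
    }

    // The mapper extends Mapper and composes the process handler as a field.
    static class StreamingMapper extends Mapper {
        private final StreamingProcess process = new StreamingProcess();
        private final InputWriter writer = new InputWriter(process.stdin());

        @Override
        void map(String key, String value) throws IOException {
            writer.write(key, value);
        }

        OutputReader output() { return new OutputReader(process.stdout()); }
    }

    // Feeds one record through the mapper and reads it back from the "process".
    static String roundTrip(String key, String value) {
        try {
            StreamingMapper mapper = new StreamingMapper();
            mapper.map(key, value);
            return mapper.output().readRecord();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("k1", "v1"));
    }
}
```

Because the writer and reader in this sketch take only DataOutput and DataInput, they can be
exercised against in-memory streams without starting any process, which is one practical benefit
of a constructor change of this kind.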

> streaming with custom input format does not support the new API
> ---------------------------------------------------------------
>                 Key: MAPREDUCE-1122
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.20.1
>         Environment: any OS
>            Reporter: Keith Jackson
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1122.txt
> When trying to implement a custom input format for use with streaming, I have found that
streaming does not support the new API, org.apache.hadoop.mapreduce.InputFormat, but requires
the old API, org.apache.hadoop.mapred.InputFormat.

