hadoop-mapreduce-issues mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1122) streaming with custom input format does not support the new API
Date Tue, 06 Jul 2010 09:40:51 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1122:
-----------------------------------------------

    Attachment: patch-1122.txt

Attaching a patch which does the following:
* Deprecates all the library classes in streaming, such as AutoInputFormat, StreamInputFormat,
StreamXmlRecordReader, etc., and adds new classes which use the new API.
* Changes the tools DumpTypedBytes and LoadTypedBytes to use the new API classes.
* Adds StreamJobConfig holding all the configuration properties used in streaming.
* Adds classes StreamingMapper, StreamingReducer and StreamingCombiner, which extend the new API
Mapper and Reducer classes.
  ** Adds a class StreamingProcess, which starts the streaming process and the MR output/error
threads, waits for the threads, etc. For the old API mapper/reducer this functionality is in
PipeMapRed.java; PipeMapper and PipeReducer extend PipeMapRed and implement the old Mapper/Reducer
interfaces. StreamingMapper/StreamingReducer cannot extend StreamingProcess, because in the new
API Mapper and Reducer are classes, not interfaces, and must themselves be extended. So the
process handling is moved into a separate class, which StreamingMapper/StreamingReducer compose.
  ** InputWriter and OutputReader, added in HADOOP-1722, take a PipeMapRed instance as a
constructor parameter. That no longer makes sense, because for the new API mapper/reducer the
process handling is done by a separate class, StreamingProcess. So the patch makes the following
incompatible change (it looks clean now):
  *** Changes the OutputReader constructor to take a DataInput as parameter, instead of a PipeMapRed
  *** Changes the InputWriter constructor to take a DataOutput as parameter, instead of a PipeMapRed
* Moves some utility methods in PipeMapRed to StreamUtil.
* Removes the deprecated StreamJob(String[] argv, boolean mayExit); deprecates static public JobConf
createJob(String[] argv); and adds static public Job createStreamingJob(String[] argv).
* Refactors setJobConf() into multiple setters to set appropriate mapper/reducer in use.
* Adds unit tests for all the use cases described [above|https://issues.apache.org/jira/browse/MAPREDUCE-1122?focusedCommentId=12878515&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12878515]
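
For readers who do not have the patch open, here is a rough sketch of the composition idea and of
the DataOutput-based writer constructor described in the list. It is not code from
patch-1122.txt. Every class name below (SketchMapper, SketchProcess, SketchInputWriter,
SketchStreamingMapper) is a hypothetical stand-in, no Hadoop classes are used, and an in-memory
buffer takes the place of the child process's stdin.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a new-API style Mapper: a class, not an interface,
// so a subclass has no room left to also extend a process-handling base class.
abstract class SketchMapper {
    abstract void map(String key, String value) throws IOException;
}

// Hypothetical writer whose constructor takes a plain DataOutput instead of
// the object that owns the process.
class SketchInputWriter {
    private final DataOutput out;

    SketchInputWriter(DataOutput out) {
        this.out = out;
    }

    void write(String key, String value) throws IOException {
        // Tab-separated key/value, one record per line (an assumption made for this sketch).
        out.write((key + "\t" + value + "\n").getBytes(StandardCharsets.UTF_8));
    }
}

// Hypothetical process holder. A real one would launch the child process and
// its output/error threads; here a byte buffer stands in for the child's stdin.
class SketchProcess {
    private final ByteArrayOutputStream childStdin = new ByteArrayOutputStream();
    private final DataOutputStream toChild = new DataOutputStream(childStdin);

    DataOutput input() {
        return toChild;
    }

    String captured() throws IOException {
        toChild.flush();
        return childStdin.toString("UTF-8");
    }
}

// The mapper extends the mapper base class and composes the process holder.
class SketchStreamingMapper extends SketchMapper {
    private final SketchProcess process = new SketchProcess();
    private final SketchInputWriter writer = new SketchInputWriter(process.input());

    @Override
    void map(String key, String value) throws IOException {
        writer.write(key, value);
    }

    SketchProcess process() {
        return process;
    }
}

class CompositionSketch {
    public static void main(String[] args) throws IOException {
        SketchStreamingMapper mapper = new SketchStreamingMapper();
        mapper.map("k1", "v1");
        mapper.map("k2", "v2");
        System.out.print(mapper.process().captured());
    }
}
```

Because the writer depends only on DataOutput, it can be exercised against any stream, without
a running child process.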


> streaming with custom input format does not support the new API
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-1122
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.20.1
>         Environment: any OS
>            Reporter: Keith Jackson
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1122.txt
>
>
> When trying to implement a custom input format for use with streaming, I have found that
streaming does not support the new API, org.apache.hadoop.mapreduce.InputFormat, but requires
the old API, org.apache.hadoop.mapred.InputFormat.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

