hadoop-mapreduce-issues mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1122) streaming with custom input format does not support the new API
Date Fri, 29 Oct 2010 10:26:22 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1122:
-----------------------------------------------

    Attachment: patch-1122-1.txt

Patch is updated to trunk with most of the review comments incorporated. Patch should be applied
on top of MAPREDUCE-1905 to pass all tests.

bq. It'd be really good if we can separate the new classes into new packages, library classes
into a lib package and implementation classes to an impl package?
Done

bq. There are two ways of handling the skipping of bad records in the new api ...........
Removed the dead code related to skipping in the new API classes. Will add a subtask to MAPREDUCE-1932
to add skipping support for streaming.

StreamingReducer.java
bq. Not logging exit code when exceptions happen in reduce. Used to be the case in old code.
The exit code is already logged in StreamingProcessManager. Even in the old code, it was being
logged twice.

bq. How about passing configuration to InputWriter.initialize() and let TextInputWriter/TextOutputReader
maintain themselves the key/value separators and related information instead of polluting
StreamingMapper and StreamingReducer?
Did not do this. It makes the code more complicated because mappers and reducers have different
configuration parameter names.
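
To illustrate the naming point: in streaming, the field separators are configured under
side-specific keys (for example stream.map.input.field.separator and
stream.reduce.input.field.separator), so a shared writer/reader would need to be told which
side it is running on. A minimal sketch of that lookup, assuming a plain Map in place of the
Hadoop Configuration and a hypothetical helper class SeparatorKeys (neither is part of the patch):

```java
import java.util.Map;

// Hypothetical helper, not part of the patch: resolves the input field
// separator for a given task side. A plain Map stands in for Configuration.
final class SeparatorKeys {
    private SeparatorKeys() {}

    // side is assumed to be "map" or "reduce"; the key differs per side.
    static String inputSeparatorKey(String side) {
        return "stream." + side + ".input.field.separator";
    }

    // Falls back to a tab when the side-specific key is not set.
    static String inputSeparator(Map<String, String> conf, String side) {
        return conf.getOrDefault(inputSeparatorKey(side), "\t");
    }
}
```

Keeping this lookup in StreamingMapper and StreamingReducer means each class reads only its own
keys, which is the simpler arrangement the reply above prefers.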

Autoinputformat2
bq. No configure method like in AutoInputFormat?
The new API does not have a configure method for InputFormat.

StreamJob.java
bq. Is the compatibility left in one release?
Yes. All the removed deprecated methods have been deprecated since release 0.19.


TrApp.java
bq. Some expect() and expectDefined() calls are dropped. I could understand why the ones related
to output format are dropped to accommodate testing both new and old apis. But removing of
the checks related to input file and file length didn't make sense to me.
The new API does not have the configuration parameters for input file and length (HADOOP-5973).

bq. Should we make the initialize methods in InputWriter and OutputReader abstract now?
Did not do this. I don't think it is required.

The patch incorporates all other comments.

> streaming with custom input format does not support the new API
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-1122
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.20.1
>         Environment: any OS
>            Reporter: Keith Jackson
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.22.0
>
>         Attachments: patch-1122-1.txt, patch-1122.txt
>
>
> When trying to implement a custom input format for use with streaming, I have found that
streaming does not support the new API, org.apache.hadoop.mapreduce.InputFormat, but requires
the old API, org.apache.hadoop.mapred.InputFormat.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

