hadoop-mapreduce-issues mailing list archives

From "Jay Booth (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1170) MultipleInputs doesn't work with new API in 0.20 branch
Date Fri, 30 Oct 2009 21:45:59 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772108#action_12772108 ]

Jay Booth commented on MAPREDUCE-1170:

It turns out the test only passes because it never actually executes the job. It just
uses MultipleInputs to add the inputs, then checks that they were added to the appropriate
in-memory structures.
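
To make the gap concrete, here is a minimal sketch of what a registration-only check can and cannot see. The class RegistrationOnlyCheck and its methods are hypothetical stand-ins, not the real MultipleInputs API; they assume only that MultipleInputs records a path-to-input-format mapping somewhere in memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the bookkeeping MultipleInputs does; NOT the real Hadoop API.
public class RegistrationOnlyCheck {
    // Input path -> name of the input format registered for it.
    private final Map<String, String> formats = new LinkedHashMap<>();

    void addInputPath(String path, String formatName) {
        formats.put(path, formatName);
    }

    Map<String, String> registeredFormats() {
        return formats;
    }

    public static void main(String[] args) {
        RegistrationOnlyCheck inputs = new RegistrationOnlyCheck();
        inputs.addInputPath("/in/a", "TextInputFormat");
        inputs.addInputPath("/in/b", "KeyValueTextInputFormat");
        // A registration-only test stops here and asserts on the stored mapping.
        // No split is computed and no record reader is initialized, so a bug in
        // the read path (such as a bad cast) cannot surface.
        boolean ok = "TextInputFormat".equals(inputs.registeredFormats().get("/in/a"))
                && "KeyValueTextInputFormat".equals(inputs.registeredFormats().get("/in/b"));
        System.out.println(ok ? "registration check passed" : "registration check failed");
    }
}
```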

Running an actual job with TextInputFormat fails with:

java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit cannot be cast to org.apache.hadoop.mapreduce.lib.input.FileSplit
	at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:55)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:582)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
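
The failure can be reproduced outside Hadoop with a few stand-in classes. This is a sketch under the assumption, consistent with the trace, that TaggedInputSplit wraps another split and extends InputSplit rather than FileSplit, while LineRecordReader.initialize() casts its argument straight to FileSplit. The nested classes below are hypothetical simplifications, not the real org.apache.hadoop.mapreduce types.

```java
// Hypothetical stand-ins for the classes named in the stack trace; NOT the real Hadoop types.
public class TaggedSplitCastDemo {

    // Stand-in for org.apache.hadoop.mapreduce.InputSplit.
    abstract static class InputSplit {}

    // Stand-in for FileSplit: a split over (part of) one file.
    static class FileSplit extends InputSplit {
        final String path;
        FileSplit(String path) { this.path = path; }
    }

    // Stand-in for TaggedInputSplit: wraps another split. It extends
    // InputSplit, not FileSplit, which is why the cast below fails.
    static class TaggedInputSplit extends InputSplit {
        final InputSplit inner;
        TaggedInputSplit(InputSplit inner) { this.inner = inner; }
    }

    // Stand-in for LineRecordReader.initialize(), which assumes every
    // split it receives is a FileSplit.
    static String initialize(InputSplit genericSplit) {
        FileSplit split = (FileSplit) genericSplit; // throws for a TaggedInputSplit
        return split.path;
    }

    public static void main(String[] args) {
        // A plain FileSplit works.
        System.out.println(initialize(new FileSplit("/in/a.txt")));
        // The same split wrapped in a TaggedInputSplit does not.
        try {
            initialize(new TaggedInputSplit(new FileSplit("/in/a.txt")));
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```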

This probably affects 0.21 as well, based on my brief reading of the code. Any suggestions?
It seems hard to work around without changing the signature of InputSplit, which would
be pretty disruptive.

One (very hacky) option would be to have LineRecordReader do something along the lines of:

    if (split instanceof TaggedInputSplit) split = ((TaggedInputSplit) split).getInnerSplit();
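
A self-contained sketch of that workaround, using hypothetical stand-in classes rather than the real Hadoop ones. The accessor name getInnerSplit() is taken from the snippet above and is an assumption; the real TaggedInputSplit may name it differently.

```java
// Hypothetical stand-ins; NOT the real org.apache.hadoop.mapreduce classes.
public class UnwrapInReaderDemo {

    abstract static class InputSplit {}

    static class FileSplit extends InputSplit {
        final String path;
        FileSplit(String path) { this.path = path; }
    }

    static class TaggedInputSplit extends InputSplit {
        private final InputSplit inner;
        TaggedInputSplit(InputSplit inner) { this.inner = inner; }
        // Assumed accessor name, taken from the one-line suggestion above.
        InputSplit getInnerSplit() { return inner; }
    }

    // Stand-in for LineRecordReader.initialize() with the workaround applied:
    // peel off the tag (one level) before casting to FileSplit.
    static String initialize(InputSplit split) {
        if (split instanceof TaggedInputSplit) {
            split = ((TaggedInputSplit) split).getInnerSplit();
        }
        FileSplit fileSplit = (FileSplit) split;
        return fileSplit.path;
    }

    public static void main(String[] args) {
        System.out.println(initialize(new FileSplit("/in/b.txt")));
        System.out.println(initialize(new TaggedInputSplit(new FileSplit("/in/a.txt"))));
    }
}
```

The hackiness is visible in the sketch: the leaf reader now depends on a MultipleInputs-specific class, and every other reader that casts to FileSplit would need the same check.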

Any other ideas?
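
One alternative that also leaves InputSplit unchanged is to do the unwrapping one layer up: the MultipleInputs machinery could hand the framework a wrapping record reader that strips the tag before calling the real reader's initialize(), so LineRecordReader never sees a TaggedInputSplit. The sketch below models this with hypothetical stand-ins (RecordReader, LineReader, DelegatingReader); it is an illustration of the idea, not the real Hadoop API.

```java
// Hypothetical stand-ins; NOT the real org.apache.hadoop.mapreduce classes.
public class DelegatingReaderDemo {

    abstract static class InputSplit {}

    static class FileSplit extends InputSplit {
        final String path;
        FileSplit(String path) { this.path = path; }
    }

    static class TaggedInputSplit extends InputSplit {
        final InputSplit inner;
        TaggedInputSplit(InputSplit inner) { this.inner = inner; }
    }

    // Minimal reader contract: only initialization is modelled.
    interface RecordReader {
        void initialize(InputSplit split);
        String describe();
    }

    // Unmodified leaf reader: still casts straight to FileSplit.
    static class LineReader implements RecordReader {
        private String path;
        public void initialize(InputSplit split) { path = ((FileSplit) split).path; }
        public String describe() { return "line reader over " + path; }
    }

    // Hypothetical wrapper used only on the MultipleInputs path, where every
    // split is tagged; it unwraps before delegating.
    static class DelegatingReader implements RecordReader {
        private final RecordReader delegate;
        DelegatingReader(RecordReader delegate) { this.delegate = delegate; }
        public void initialize(InputSplit split) {
            delegate.initialize(((TaggedInputSplit) split).inner);
        }
        public String describe() { return delegate.describe(); }
    }

    public static void main(String[] args) {
        RecordReader r = new DelegatingReader(new LineReader());
        r.initialize(new TaggedInputSplit(new FileSplit("/in/a.txt")));
        System.out.println(r.describe()); // prints "line reader over /in/a.txt"
    }
}
```

The design trade-off versus the LineRecordReader check is that knowledge of the tag stays inside the MultipleInputs code, so existing readers work without modification.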

> MultipleInputs doesn't work with new API in 0.20 branch
> -------------------------------------------------------
>                 Key: MAPREDUCE-1170
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1170
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 0.20.1
>            Reporter: Jay Booth
>             Fix For: 0.20.2
>         Attachments: multipleInputs.patch
> This patch adds support for MultipleInputs (and KeyValueTextInputFormat) in o.a.h.mapreduce.lib.input,
> working with the new API.  Included passing unit test.  Include for 0.20.2?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
