hadoop-mapreduce-issues mailing list archives

From "Hiral Patel (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1145) Multiple Outputs doesn't work with new API in 0.20 branch
Date Fri, 19 Feb 2010 18:08:27 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hiral Patel updated MAPREDUCE-1145:
-----------------------------------

    Attachment: updated-MAPREDUCE-1145-branch-20.patch

There is a bug where named outputs with different output key and value classes do not
work: every output ends up with the same output key and value class.  I added a patch to
MultipleOutputs.java to fix this.

Here is the diff from Jay's patch:

313,314d312
< +import org.apache.hadoop.io.LongWritable;
< +import org.apache.hadoop.io.Text;
734c732
< +        outputFormat.getRecordWriter(new MOTaskAttemptContextWrapper(namedOutput,ctx));
---
> +        outputFormat.getRecordWriter(ctx);
876,906d873
< +
< +  private class MOTaskAttemptContextWrapper extends TaskAttemptContext {
< +
< +    private final Class<?> outputKeyClass;
< +    private final Class<?> outputValueClass;
< +
< +    public MOTaskAttemptContextWrapper(final String namedOutput,
< +                                       TaskAttemptContext ctx) {
< +      super(ctx.getConfiguration(), ctx.getTaskAttemptID());
< +      outputKeyClass=conf.getClass(MO_PREFIX + namedOutput +   KEY, LongWritable.class);
< +      outputValueClass=conf.getClass(MO_PREFIX + namedOutput +   VALUE, Text.class);
< +    }
< +
< +    /**
< +     * Get the key class for the job output data.
< +     * @return the key class for the job output data.
< +     */
< +    @Override
< +    public Class<?> getOutputKeyClass() {
< +      return outputKeyClass;
< +    }
< +
< +    /**
< +     * Get the value class for job outputs.
< +     * @return the value class for job outputs.
< +     */
< +    @Override
< +    public Class<?> getOutputValueClass() {
< +      return outputValueClass;
< +    }
< +  }
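
To make the idea behind the wrapper easier to follow outside of Hadoop, here is a minimal,
self-contained sketch of the same pattern. It is not the patch itself: SimpleConf and
SimpleContext are hypothetical stand-ins for the Hadoop configuration and TaskAttemptContext,
the property name strings are invented (the real MO_PREFIX, KEY and VALUE values are not
shown in the diff), and Long/String stand in for the LongWritable/Text defaults.

```java
import java.util.HashMap;
import java.util.Map;

public class NamedOutputContextSketch {

    // Hypothetical property names; the real MO_PREFIX, KEY and VALUE
    // constants are not shown in the diff above.
    static final String MO_PREFIX = "sketch.multipleoutputs.";
    static final String KEY = ".key";
    static final String VALUE = ".value";

    /** Stand-in for a job configuration that maps property names to classes. */
    static class SimpleConf {
        private final Map<String, Class<?>> classes = new HashMap<>();

        void setClass(String name, Class<?> value) {
            classes.put(name, value);
        }

        Class<?> getClass(String name, Class<?> defaultValue) {
            return classes.getOrDefault(name, defaultValue);
        }
    }

    /** Stand-in for a task context that only knows the job-wide output classes. */
    static class SimpleContext {
        protected final SimpleConf conf;

        SimpleContext(SimpleConf conf) {
            this.conf = conf;
        }

        Class<?> getOutputKeyClass() {
            return conf.getClass("sketch.job.output.key", Long.class);
        }

        Class<?> getOutputValueClass() {
            return conf.getClass("sketch.job.output.value", String.class);
        }
    }

    /** Wrapper that reports the classes configured for one named output. */
    static class NamedOutputContext extends SimpleContext {
        private final Class<?> outputKeyClass;
        private final Class<?> outputValueClass;

        NamedOutputContext(String namedOutput, SimpleContext ctx) {
            super(ctx.conf);
            // Look up the per-output classes once, falling back to defaults.
            outputKeyClass = conf.getClass(MO_PREFIX + namedOutput + KEY, Long.class);
            outputValueClass = conf.getClass(MO_PREFIX + namedOutput + VALUE, String.class);
        }

        @Override
        Class<?> getOutputKeyClass() {
            return outputKeyClass;
        }

        @Override
        Class<?> getOutputValueClass() {
            return outputValueClass;
        }
    }

    public static void main(String[] args) {
        SimpleConf conf = new SimpleConf();
        conf.setClass(MO_PREFIX + "counts" + KEY, String.class);
        conf.setClass(MO_PREFIX + "counts" + VALUE, Integer.class);

        SimpleContext jobWide = new SimpleContext(conf);
        SimpleContext counts = new NamedOutputContext("counts", jobWide);

        // A writer built from jobWide sees the job-wide classes; one built
        // from counts sees the classes registered for that named output.
        System.out.println(jobWide.getOutputKeyClass() + " / " + jobWide.getOutputValueClass());
        System.out.println(counts.getOutputKeyClass() + " / " + counts.getOutputValueClass());
    }
}
```

In this sketch, passing the unwrapped context to a writer factory corresponds to the
getRecordWriter(ctx) line from Jay's patch, and passing the wrapper corresponds to the
updated line, so each named output can report its own key and value classes.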

> Multiple Outputs doesn't work with new API in 0.20 branch
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-1145
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1145
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 0.20.1, 0.20.2
>            Reporter: Jay Booth
>             Fix For: 0.20.2
>
>         Attachments: MAPREDUCE-1145-branch-20.patch, updated-MAPREDUCE-1145-branch-20.patch
>
>
> I know this is working in the 0.21 branch but it's dependent on a ton of other refactorings
> and near-impossible to backport.  I hacked together a quick forwards-port in o.a.h.mapreduce.lib.output.MultipleOutputs.
> Unit test attached, requires a one-liner change to FileOutputFormat.
> Maybe 0.20.2?  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

