hadoop-mapreduce-issues mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-370) Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
Date Wed, 05 Aug 2009 14:01:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739504#action_12739504 ]

Tom White commented on MAPREDUCE-370:

The only feature that MultipleOutputs needs to make it at least as powerful as MultipleOutputFormat
is the ability to control the output file name. At present the MultipleOutputs file name is


whereas in MultipleOutputFormat you have complete control over the naming, including the ability
to create subdirectories by having a path separator ({{/}}) in the name.

To achieve this, I think we could port MultipleOutputs and change the semantics of getCollector()
in the multi name case, so that the multi name is used as the full name of the output file.
getCollector() is typically invoked in the reduce() method, where the key and value are
available and can be used to form the name. Applications that want to add a unique suffix
can call FileOutputFormat#getUniqueFile() themselves.
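
As a rough illustration of the naming idea above, here is a minimal, Hadoop-free sketch of
how an application might form an output name from a record's key, including a subdirectory.
The class OutputNameSketch and its methods are hypothetical and not part of any Hadoop API,
and the suffix layout is an assumption for illustration that may differ from what
FileOutputFormat#getUniqueFile() actually produces.

```java
import java.util.Locale;

// Hypothetical helper, not part of Hadoop: shows how a multi name could be
// derived from the data available in reduce().
final class OutputNameSketch {

    private OutputNameSketch() {
    }

    // Builds the full output name from two key fields. The '/' path separator
    // places the file in a per-country subdirectory.
    static String baseName(String country, String date) {
        return country + "/" + date;
    }

    // Appends a task-specific suffix so that parallel tasks do not collide.
    // Assumed layout: <name>-<m|r>-<zero-padded 5-digit partition>.
    static String withUniqueSuffix(String name, boolean isReduce, int partition) {
        char taskChar = isReduce ? 'r' : 'm';
        return String.format(Locale.ROOT, "%s-%c-%05d", name, taskChar, partition);
    }

    public static void main(String[] args) {
        String name = baseName("US", "2009-08-05");
        System.out.println(name);
        System.out.println(withUniqueSuffix(name, true, 3));
    }
}
```

In a real job the resulting string would be passed as the multi name, and the suffix step
would be replaced by the application's own call to FileOutputFormat#getUniqueFile().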

The single name case would work as before and create a single output file for a named output.

> Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
> -------------------------------------------------------------------
>                 Key: MAPREDUCE-370
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-370
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu

