hadoop-mapreduce-issues mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-370) Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
Date Mon, 24 Aug 2009 03:42:59 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746720#action_12746720 ]

Amareshwari Sriramadasu commented on MAPREDUCE-370:
---------------------------------------------------

bq. I think that there should be the ability to have complete control over the output filename,
much as MultipleOutputFormat does. To achieve this we could change the baseOutputPath parameter
in the write methods to be a full output path. The user application would be responsible for
making sure there are no name clashes - this is like the functionality available in MultipleOutputFormat
today. The overloaded version is available if the user doesn't care so much about the output
filenames, which will then have a {m,r}-nnnnn suffix. Does this make sense?

Tom, I did not do this because MultipleOutputs has a feature for maintaining counters, which
count the number of records written to each output name. If we take the full output name from
the user, aggregating these counters at the job level is not straightforward. Also, if the user
doesn't give a unique name for each output file, there is a chance that the output will be
garbled. So, I thought that taking baseOutputName (which is also the counter name) from the user
and having the framework construct the full output filename would be the right solution. Don't
you think this is right?
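
To make the counter argument concrete, here is a minimal sketch in plain Java. It is not taken from the patch and uses no Hadoop classes; the class name, the method name, and the sample counter values are all hypothetical. It assumes one reading of the concern: when counters are keyed by baseOutputName, every task reports under the same key and a plain sum gives the job-level count, whereas keys that are full, per-task-unique file names never coincide across tasks.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names are Hadoop APIs.
class CounterAggregationSketch {

    // Merge per-task counters into job-level counters by summing equal keys.
    static Map<String, Long> aggregate(List<Map<String, Long>> perTaskCounters) {
        Map<String, Long> jobCounters = new TreeMap<>();
        for (Map<String, Long> taskCounters : perTaskCounters) {
            for (Map.Entry<String, Long> entry : taskCounters.entrySet()) {
                jobCounters.merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return jobCounters;
    }

    public static void main(String[] args) {
        // Keyed by baseOutputName: both tasks share the key "errors".
        List<Map<String, Long>> byBaseName = List.of(
                Map.of("errors", 3L),
                Map.of("errors", 4L));
        System.out.println(aggregate(byBaseName)); // {errors=7}

        // Keyed by full file name (assumed unique per task): nothing sums up.
        List<Map<String, Long>> byFullName = List.of(
                Map.of("errors-r-00000", 3L),
                Map.of("errors-r-00001", 4L));
        System.out.println(aggregate(byFullName)); // {errors-r-00000=3, errors-r-00001=4}
    }
}
```

In the second case, recovering a per-output total would need some extra mapping from file names back to logical outputs, which is the part that is not straightforward.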

bq. I think that there should be the ability to have complete control over the output filename,
much as MultipleOutputFormat does.

With the current patch, the user has complete control over the path. It is just that, whatever
path the user chooses, the file name is <baseOutputPath>-m/r-<part-number>
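
The naming scheme described above can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not the code in patch-370: the class and method names are hypothetical, and the five-digit zero-padded part number is inferred from the "{m,r}-nnnnn" suffix mentioned in the quoted comment.

```java
// Hypothetical sketch; not the code from the attached patch.
class OutputNameSketch {

    // Builds <baseOutputPath>-<m|r>-<part-number>, where the part number is
    // assumed to be zero-padded to five digits.
    static String uniqueFileName(String baseOutputPath, boolean isMapTask, int partition) {
        String taskType = isMapTask ? "m" : "r";
        return String.format("%s-%s-%05d", baseOutputPath, taskType, partition);
    }

    public static void main(String[] args) {
        // The user picks any path; the framework appends the task suffix.
        System.out.println(uniqueFileName("2009/08/errors", false, 3)); // 2009/08/errors-r-00003
        System.out.println(uniqueFileName("summary", true, 12));        // summary-m-00012
    }
}
```

Because the suffix is derived from the task type and partition, two tasks given the same baseOutputPath still write to different files.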

> Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-370
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-370
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-370-1.txt, patch-370.txt
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

