hadoop-common-dev mailing list archives

From "Chris K Wensel (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4510) FileOutputFormat protects getTaskOutputPath
Date Fri, 24 Oct 2008 05:17:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642383#action_12642383 ]

Chris K Wensel commented on HADOOP-4510:

We prefer it public because we write through the FileOutputFormat class via a RecordWriter,
which internally (magically) inserts the temp path and task id path at the end of the intended

This is done so that speculative execution will succeed. We would like to keep benefiting from
this behavior, so we are not asking for it to change.

The side effect is that we have no way of finding the actual location of the data written
and then moving it to where it was intended to be written. 

Since we don't have multiple (named) output collectors, we must emulate the behavior through
our own API.
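
To make the problem concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of the layout described above and of the "find it, then move it" step we need. The `_temporary/_<attempt id>` directory naming is an assumption about how the task path is inserted, and `TaskOutputSketch`, `taskWorkPath`, and `promote` are hypothetical names used for illustration only; none of them are part of FileOutputFormat.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TaskOutputSketch {

    // Assumed layout: <intended dir>/_temporary/_<attempt id>.
    // This mirrors the temp path and task id path mentioned above; the exact
    // directory names are an assumption, not taken from the Hadoop source.
    static Path taskWorkPath(Path intendedDir, String attemptId) {
        return intendedDir.resolve("_temporary").resolve("_" + attemptId);
    }

    // Move one file from the attempt's work directory to the intended directory.
    // This is the step that is only possible once the work path is known.
    static Path promote(Path intendedDir, String attemptId, String fileName)
            throws IOException {
        Path source = taskWorkPath(intendedDir, attemptId).resolve(fileName);
        Path target = intendedDir.resolve(fileName);
        Files.createDirectories(intendedDir);
        return Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path intended = Files.createTempDirectory("sketch").resolve("out");
        String attempt = "attempt_example_0"; // hypothetical attempt id

        // A writer that goes through the work path lands its data here,
        // not in the intended directory.
        Path work = taskWorkPath(intended, attempt);
        Files.createDirectories(work);
        Files.write(work.resolve("part-00000"), "data".getBytes());

        Path finalFile = promote(intended, attempt, "part-00000");
        System.out.println("written to: " + work);
        System.out.println("moved to:   " + finalFile);
    }
}
```

Without access to something like `taskWorkPath` (the role getTaskOutputPath plays for us), the caller only knows the intended directory and cannot locate the file to move.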

> FileOutputFormat protects getTaskOutputPath
> -------------------------------------------
>                 Key: HADOOP-4510
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4510
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Chris K Wensel
>            Priority: Blocker
>         Attachments: hadoop-4510.patch
> o.a.h.m.FileOutputFormat#getTaskOutputPath() is protected.
> Having access to a task output directory as used internally by RecordWriters is quite
> handy. This is especially true if the user is attempting to serialize out data in a similar
> fashion as the output collector.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
