hadoop-mapreduce-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-6823) FileOutputFormat to support configurable PathOutputCommitter factory
Date Mon, 06 Nov 2017 17:26:00 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Steve Loughran updated MAPREDUCE-6823:
    Attachment: MAPREDUCE-6823-004.patch

Patch 004: in sync with the S3A patch being reviewed at HADOOP-14971/HADOOP-15003, and
dealing with the various checkstyle issues. This code has now been tested all the way through
Spark, though it's not completely hidden, particularly in the case of Parquet, where Spark
has some hard expectations about the type of committer there. All is working (testable, because
the _SUCCESS file generated by the new committers contains some summary data).

> FileOutputFormat to support configurable PathOutputCommitter factory
> --------------------------------------------------------------------
>                 Key: MAPREDUCE-6823
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6823
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 3.0.0-alpha2
>         Environment: Targeting S3 as the output of work
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch, MAPREDUCE-6823-002.patch, MAPREDUCE-6823-002.patch,
> In HADOOP-13786 I'm adding a custom subclass for FileOutputFormat, one which can talk
directly to the S3A Filesystem for more efficient operations, better failure modes, and, most
critically, as part of HADOOP-13345, atomic commit of output. The normal committer relies
on directory rename() being atomic for this; for S3 we don't have that luxury.
> To support a custom committer, we need to be able to tell FileOutputFormat (and, implicitly,
all subclasses which don't have their own custom committer) to use our new {{S3AOutputCommitter}}.
> I propose: 
> # {{FileOutputFormat}} takes a factory to create committers.
> # The factory takes a URI and a {{TaskAttemptContext}} and returns a committer.
> # The default implementation always returns a {{FileOutputCommitter}}.
> # A configuration option allows a new factory to be named.
> # An {{S3AOutputCommitterFactory}} returns a {{FileOutputCommitter}} or the new {{S3AOutputCommitter}},
depending upon the URI of the destination.
> Note that MRv1 already supports configurable committers; this change only affects the V2 API.
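
The five proposal points above can be sketched in plain Java. This is only an illustration of the factory pattern under stated assumptions, not the code in the attached patch: `Committer`, `TaskContext`, `CommitterFactory`, the `sketch.committer.factory` key and the name-to-factory registry are all hypothetical stand-ins, and no Hadoop classes are used. A real implementation would more likely load the factory class named in the job configuration.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch only: every type here is a hypothetical stand-in, not a Hadoop class.
class CommitterFactorySketch {

    /** Stand-in for an output committer; only its name matters here. */
    interface Committer {
        String name();
    }

    /** Stand-in for TaskAttemptContext, reduced to a string-to-string configuration. */
    static final class TaskContext {
        final Map<String, String> conf = new HashMap<>();
    }

    /** Items 1-2: a factory takes the destination URI and the context, returns a committer. */
    interface CommitterFactory {
        Committer create(URI destination, TaskContext context);
    }

    /** Item 3: the default factory always returns the file committer. */
    static final class DefaultFactory implements CommitterFactory {
        @Override
        public Committer create(URI destination, TaskContext context) {
            return () -> "FileOutputCommitter";
        }
    }

    /** Item 5: the S3A factory picks a committer from the destination's scheme. */
    static final class S3AFactory implements CommitterFactory {
        @Override
        public Committer create(URI destination, TaskContext context) {
            if ("s3a".equalsIgnoreCase(destination.getScheme())) {
                return () -> "S3AOutputCommitter";
            }
            return () -> "FileOutputCommitter";
        }
    }

    /** Hypothetical configuration key naming the factory (item 4). */
    static final String FACTORY_KEY = "sketch.committer.factory";

    /** Hypothetical registry; a real implementation would more likely load a class by name. */
    static final Map<String, Supplier<CommitterFactory>> FACTORIES = new HashMap<>();
    static {
        FACTORIES.put("default", DefaultFactory::new);
        FACTORIES.put("s3a", S3AFactory::new);
    }

    /** What an output format would call to obtain its committer. */
    static Committer committerFor(URI destination, TaskContext context) {
        String name = context.conf.getOrDefault(FACTORY_KEY, "default");
        Supplier<CommitterFactory> supplier = FACTORIES.get(name);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown committer factory: " + name);
        }
        return supplier.get().create(destination, context);
    }

    public static void main(String[] args) {
        TaskContext context = new TaskContext();
        context.conf.put(FACTORY_KEY, "s3a");
        System.out.println(committerFor(URI.create("s3a://bucket/output"), context).name());
        System.out.println(committerFor(URI.create("hdfs://namenode/output"), context).name());
    }
}
```

With no factory configured, the sketch falls back to the default factory, so existing jobs keep the file committer regardless of destination. Only jobs that name the S3A factory get the destination-dependent choice.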

This message was sent by Atlassian JIRA

