hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-13786) add output committer which uses s3guard for consistent O(1) commits to S3
Date Tue, 29 Nov 2016 19:51:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706351#comment-15706351
] 

Steve Loughran edited comment on HADOOP-13786 at 11/29/16 7:51 PM:
-------------------------------------------------------------------

For the curious, my WiP, in the s3guard branch, is here: https://github.com/steveloughran/hadoop/tree/s3guard/HADOOP-13786-committer

I've been tweaking the mapreduce code to add the ability to switch to a subclass of FileOutputCommitter
in FileOutputFormat (and, transitively, all its children) by way of a factory. The S3A factory
dynamically chooses the committer based on the destination FS. There's currently no difference
in behaviour apart from logging of operation durations. This means that we can switch the
committer behind all output formats to the s3a one (and also allow anyone else to
do the same with their own FOF committer subclass). 
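The dispatch idea can be sketched as below; this is a standalone illustration, and {{Committer}}, {{FileCommitter}} and {{S3ACommitter}} are simplified stand-ins of my own, not Hadoop's real classes:

```java
import java.net.URI;

// Standalone sketch of a committer factory that picks an implementation
// from the destination's URI scheme. Committer, FileCommitter and
// S3ACommitter are simplified stand-ins, not Hadoop's actual API.
public class CommitterFactorySketch {
  interface Committer { String name(); }

  static class FileCommitter implements Committer {
    public String name() { return "file"; }
  }

  static class S3ACommitter implements Committer {
    public String name() { return "s3a"; }
  }

  /** Choose the committer from the destination filesystem's scheme. */
  static Committer forDestination(URI dest) {
    return "s3a".equals(dest.getScheme())
        ? new S3ACommitter()
        : new FileCommitter();
  }

  public static void main(String[] args) {
    System.out.println(forDestination(URI.create("s3a://bucket/out")).name()); // s3a
    System.out.println(forDestination(URI.create("hdfs://nn/out")).name());    // file
  }
}
```

The point of the factory is that FileOutputFormat never needs to know which committer it gets back.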

I can see that {{ParquetOutputFormat}} has its own committer, {{ParquetOutputCommitter}}, and so,
I believe, does Spark's {{dataframe.parquet.write()}} pipeline;
my IDE isn't highlighting much else.

Now, could I get away with just modifying the base FOF committer?


For exception reporting it could use {{FileSystem.rename(final Path src, final Path dst, final
Rename... options)}}, with S3A implementing that variant so it raises exceptions. That's the "transitory" method
bridging FileSystem and FileContext, which does raise exceptions; the default implementation is non-atomic and
eventually delegates to {{rename(src, dest)}}, except on HDFS, which has a full implementation.
But: what if the semantics of that rename() (not yet in the FS spec, AFAIK) differ
from what callers expect in some of the corner cases, and commits everywhere break? That
would not be good. It would also remove the ability to use any private s3a/s3guard calls if
we wanted some (e.g. getting a lock on a directory), unless they went into the standard FS APIs.
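The contrast between the two rename contracts can be sketched with {{java.nio}} standing in for Hadoop's FileSystem; the method names here are mine, chosen only to illustrate the difference:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Sketch of the two rename contracts: the classic boolean rename(src, dst)
// swallows the cause of a failure, while the exception-raising variant
// surfaces it to the caller. java.nio.file is a stand-in for Hadoop's FS.
public class RenameContracts {

  /** Classic contract: returns false on failure, losing the diagnostics. */
  static boolean quietRename(Path src, Path dst) {
    try {
      Files.move(src, dst);
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  /** Raising contract: the caller learns exactly why the rename failed. */
  static void raisingRename(Path src, Path dst) throws IOException {
    Files.move(src, dst);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("rename");
    Path src = Files.createFile(dir.resolve("a"));
    System.out.println(quietRename(src, dir.resolve("b")));                    // true
    System.out.println(quietRename(dir.resolve("missing"), dir.resolve("c"))); // false: but why?
    try {
      raisingRename(dir.resolve("missing"), dir.resolve("c"));
    } catch (NoSuchFileException e) {
      System.out.println("source missing: " + e.getFile()); // the cause survives
    }
  }
}
```

A committer built on the quiet contract has nothing useful to report when a commit-time rename fails, which is exactly the exception-reporting gap described above.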

Making the output committer factory pluggable would avoid such problems, though logging durations might
be nice all round. That said, given how quickly rename completes on real filesystems, it's somewhat
unimportant there.
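Duration logging needs nothing FS-specific; a minimal sketch, with names that are mine rather than the WiP branch's:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Callable;

// Minimal sketch of logging an operation's duration; the wrapper and its
// names are illustrative, not the code in the WiP branch.
public class DurationLog {

  /** Run the action, print how long it took, and pass the result through. */
  static <T> T timed(String operation, Callable<T> action) throws Exception {
    Instant start = Instant.now();
    try {
      return action.call();
    } finally {
      long ms = Duration.between(start, Instant.now()).toMillis();
      System.out.println(operation + " took " + ms + " ms");
    }
  }

  public static void main(String[] args) throws Exception {
    int n = timed("rename", () -> 42); // stand-in for a real rename call
    System.out.println(n);
  }
}
```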





> add output committer which uses s3guard for consistent O(1) commits to S3
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> A goal of this code is "support O(1) commits to S3 repositories in the presence of failures".
Implement it, including whatever is needed to demonstrate the correctness of the algorithm.
(that is, assuming that s3guard provides a consistent view of the presence/absence of blobs,
show that we can commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output streams (ie.
not visible until the close()), if we need to use that to allow us to abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
