hadoop-common-dev mailing list archives

From Andrew Wang <andrew.w...@cloudera.com>
Subject Re: Getting close to a vote on a merge of S3Guard., HADOOP-13345
Date Thu, 17 Aug 2017 05:21:51 GMT
Thanks for the detailed explanation Aaron. Given that this has gone through
Cloudera's QA cycle and is run in production, that adds a lot of confidence
in the feature. Looking forward to having this in 3.0.0-beta1!

Best,
Andrew

On Wed, Aug 16, 2017 at 2:17 PM, Aaron Fabbri <fabbri@cloudera.com> wrote:

>
>
> On Wed, Aug 16, 2017 at 1:39 PM, Andrew Wang <andrew.wang@cloudera.com>
> wrote:
>
>> Hi Steve,
>>
>> What's the target release vehicle, and the timeline for merging this? The
>> target date for beta1 is mid-September, so any large code movements make
>> me nervous.
>>
>
> I think this is ready to get in before beta1.  Most of upstream s3a dev
> has been happening on this branch so it has a lot of improvements and
> testing.
>
>
>> Could you comment on testing and API stability of this branch? I'm
>> trusting the judgement of the contributors involved, since there isn't much
>> time to fix things before beta1.
>>
>>
> We've done a ton of testing on this branch:
>
> - List consistency tests with failure injection. (HADOOP-13793) This
> integration test forces a delay in visibility of certain files by wrapping
> the AWS S3 client. It asserts listing is consistent. The test fails without
> S3Guard, and succeeds with it.
>
> - All existing S3 integration tests with and without S3Guard. The
> filesystem contract tests have been invaluable here. (HADOOP-13589 makes
> these very easy to run).
>
> - MetadataStore contract tests that ensure that the API semantics of the
> DynamoDB and in-memory reference implementations are correct.
>
> - MetadataStore scale tests that can be used to force DynamoDB service
> throttling and ensure we are robust to that.
>
> - Unit tests for different parts of the S3Guard logic.
>
> As you probably know, at Cloudera we are using this codebase in
> production, and have run all of our downstream tests including Hive, Spark,
> Impala on the new S3A client code, with and without S3Guard enabled.
>
> In terms of API compatibility, the new features sit behind the FileSystem
> / FileContext APIs, which have not changed.  Applications don't require any
> changes.  Internal APIs for S3Guard, such as MetadataStore (currently
> private / evolving), should be properly annotated already.  The S3Guard
> work has been active for quite a while now, so the APIs are fairly stable
> in practice.
>
> Probably my biggest goal in writing the S3AFileSystem integration code
> (HADOOP-13651) was to preserve existing logic and correctness when S3Guard
> is not enabled.  One design choice which has worked well was to define a
> "null" implementation of the MetadataStore (the API that filesystem clients
> use to log metadata changes):
>
> https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/NullMetadataStore.java
>
> This is used in S3A by default. This made it easier to reason about
> correctness and minimized the size of the diff to the FS client as well.
>
> Other questions welcomed!
>
> Cheers,
> Aaron
>
>
>
>> Best,
>> Andrew
>>
>> On Wed, Aug 16, 2017 at 5:25 AM, Steve Loughran <stevel@hortonworks.com>
>> wrote:
>>
>> >
>> > FYI, we're getting ready to merge the current S3Guard branch,
>> > HADOOP-13345, via a patch: https://issues.apache.org/jira/browse/HADOOP-13998
>> >
>> > After that's done, we do plan to have a second iteration: work on a
>> > 0-rename committer (HADOOP-13786) along with all the other tuning and
>> > improvements. We'd add a new uber-JIRA and move stuff over, maybe branch,
>> > and/or do things patch-by-patch.
>> >
>> > Anyway, now is a great time for people to download and play
>> >
>> > https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
>> >
>> > testing this
>> >
>> > https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
>> >
>> > The Inconsistent AWS Client is also something everyone is free to use for
>> > injecting inconsistencies (and soon faults) into their own apps by way of
>> > 2-3 config options. Want to know how your code handles S3A being observably
>> > inconsistent? We'll let you do that.
>> >
>> > -Steve
>> >
>> >
>> >
>>
>
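The list-consistency test with failure injection that Aaron describes (HADOOP-13793) can be pictured roughly as follows. This is only a minimal sketch of the idea, not the Hadoop test code: KeyLister, DelayedVisibilityLister and GuardedLister are hypothetical names invented for illustration, and the real S3Guard logic is considerably more involved. The wrapper hides keys containing a marker string to mimic delayed visibility; the guarded lister merges the store's listing with keys it recorded at write time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for the listing part of an object-store client. */
interface KeyLister {
    List<String> list(String prefix);
}

/** Test-only wrapper: hides keys containing a marker while the delay is active. */
final class DelayedVisibilityLister implements KeyLister {
    private final KeyLister inner;
    private final String marker;
    private boolean delayActive = true;

    DelayedVisibilityLister(KeyLister inner, String marker) {
        this.inner = inner;
        this.marker = marker;
    }

    /** Simulates the store becoming consistent. */
    void endDelay() {
        delayActive = false;
    }

    @Override
    public List<String> list(String prefix) {
        List<String> visible = new ArrayList<>();
        for (String key : inner.list(prefix)) {
            if (delayActive && key.contains(marker)) {
                continue; // injected inconsistency: key exists but is not listed yet
            }
            visible.add(key);
        }
        return visible;
    }
}

/** Listing that unions the store's answer with keys recorded at write time. */
final class GuardedLister implements KeyLister {
    private final KeyLister store;
    private final Set<String> recorded = new TreeSet<>();

    GuardedLister(KeyLister store) {
        this.store = store;
    }

    /** Called by the (hypothetical) client whenever it writes a key. */
    void recordWrite(String key) {
        recorded.add(key);
    }

    @Override
    public List<String> list(String prefix) {
        Set<String> merged = new TreeSet<>(store.list(prefix));
        for (String key : recorded) {
            if (key.startsWith(prefix)) {
                merged.add(key);
            }
        }
        return new ArrayList<>(merged);
    }
}
```

A test in this style asserts that a freshly written key containing the marker is missing from the wrapped lister alone but present in the guarded listing, which matches the "fails without S3Guard, succeeds with it" behaviour described above.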
>
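The "null" MetadataStore design choice Aaron mentions is an instance of the null-object pattern. The sketch below illustrates that pattern under simplifying assumptions: MetadataStoreSketch, NullStoreSketch, InMemoryStoreSketch and FsClientSketch are hypothetical names, and the method signatures are not those of the real MetadataStore interface in hadoop-aws. The point is that the client calls the store unconditionally, so the code path is the same whether or not metadata tracking is enabled.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, much-reduced stand-in for a metadata store API. */
interface MetadataStoreSketch {
    void put(String path, long length);

    /** Returns the recorded length, or null if the store has no entry. */
    Long get(String path);

    void delete(String path);
}

/** Null object: accepts every call and remembers nothing. */
final class NullStoreSketch implements MetadataStoreSketch {
    @Override
    public void put(String path, long length) {
        // intentionally a no-op
    }

    @Override
    public Long get(String path) {
        return null; // never has an entry
    }

    @Override
    public void delete(String path) {
        // intentionally a no-op
    }
}

/** Simple in-memory implementation, for contrast. */
final class InMemoryStoreSketch implements MetadataStoreSketch {
    private final Map<String, Long> entries = new HashMap<>();

    @Override
    public void put(String path, long length) {
        entries.put(path, length);
    }

    @Override
    public Long get(String path) {
        return entries.get(path);
    }

    @Override
    public void delete(String path) {
        entries.remove(path);
    }
}

/** Client code logs metadata changes without checking whether tracking is on. */
final class FsClientSketch {
    private final MetadataStoreSketch store;

    FsClientSketch(MetadataStoreSketch store) {
        this.store = store;
    }

    void create(String path, long length) {
        // ... the object itself would be written to the backing store here ...
        store.put(path, length);
    }

    boolean isTracked(String path) {
        return store.get(path) != null;
    }
}
```

With the null store injected, the client needs no "is tracking enabled?" branches, which is one way to read the remark that this choice kept the diff to the FS client small.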
