hadoop-common-dev mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)
Date Tue, 22 Aug 2017 17:24:12 GMT
Video (still being processed): https://www.youtube.com/watch?v=oIe5Zl2YsLE&feature=youtu.be

It's actually quite hard to show any benefits of S3Guard on the command line, so I've ended
up showing some Scala tests where I turn on the (bundled) inconsistent AWS client, to show
how you then need to enable S3Guard to make the stack traces go away.

On 22 Aug 2017, at 11:17, Steve Loughran <stevel@hortonworks.com> wrote:

+1 (binding)

I'm happy with it; it's a great piece of work by (in no particular order) Chris Nauroth,
Aaron Fabbri, Sean McRory & Mingliang Liu, plus a few bits in the corners where I got
to break things while they were all asleep. Also deserving a mention: Thomas Demoor &
Ewan Higgs @ WDC for consultancy on the corners of S3, everyone who tested it (including our
QA team), Sanjay Radia, & others.

I've already done a couple of iterations of fixing checkstyle warnings & addressing code
reviews, so I think it is ready. I also have a branch-2 patch, based on earlier work by
Mingliang, for people who want that.

On 17 Aug 2017, at 23:07, Aaron Fabbri <fabbri@cloudera.com> wrote:


I'd like to open a vote (7 days, ending August 24 at 3:10 PST) to merge the
HADOOP-13345 feature branch into trunk.

This branch contains the new S3Guard feature which adds metadata
consistency features to the S3A client.  Formatted site documentation can
be found here:


The current patch against trunk is posted here:


The branch modifies the s3a portion of the hadoop-tools/hadoop-aws module:

- The feature is off by default, and care has been taken to ensure it has
no impact when disabled.
- S3Guard can be enabled with the production metadata store, which is backed
by DynamoDB, or with a local, in-memory implementation that facilitates
integration testing without having to pay for a database.
- Consistency for getFileStatus() as well as for directory listings has been
implemented and thoroughly tested, including tracking of deletes.
- Convenient Maven profiles for testing with and without S3Guard.
- New failure injection code and integration tests that exercise it.  We
use timers and a wrapper around the Amazon SDK client object to force
consistency delays to occur.  This allows us to assert that S3Guard works
as advertised.  This will be extended with more types of failure injection
to continue hardening the S3A client.
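To illustrate the idea behind the failure injection and the listing consistency described above, here is a toy sketch. It is an assumption-laden model, not the branch's code or its API: `EventuallyConsistentStore`, `ToyMetadataStore` and `GuardedClient` are hypothetical names. The store uses a timer to delay when creates and deletes show up in listings; the guarded client records creates and deletes (tombstones) in a metadata store and uses them to correct the stale listing.

```python
import time


class EventuallyConsistentStore:
    """Hypothetical toy object store. Creates and deletes only show up in
    listings after `delay` seconds, mimicking timer-based failure injection."""

    def __init__(self, delay, clock=time.monotonic):
        self.delay = delay
        self.clock = clock
        self.objects = {}     # key -> data (the true state of the store)
        self.visible_at = {}  # key -> time a new key starts appearing in listings
        self.deleted = {}     # key -> time a deleted key stops appearing

    def put(self, key, data):
        self.objects[key] = data
        self.visible_at[key] = self.clock() + self.delay
        self.deleted.pop(key, None)

    def delete(self, key):
        self.objects.pop(key, None)
        self.visible_at.pop(key, None)
        self.deleted[key] = self.clock() + self.delay

    def list_keys(self, prefix):
        now = self.clock()
        # New keys are hidden until their delay has elapsed.
        listed = {k for k in self.objects
                  if k.startswith(prefix) and self.visible_at[k] <= now}
        # Deleted keys keep appearing until the deletion "propagates".
        listed |= {k for k, t in self.deleted.items()
                   if k.startswith(prefix) and t > now}
        return sorted(listed)


class ToyMetadataStore:
    """Hypothetical stand-in for a metadata store: records live entries and
    tombstones (delete tracking)."""

    def __init__(self):
        self.entries = {}  # key -> True (exists) or False (tombstone)

    def record_put(self, key):
        self.entries[key] = True

    def record_delete(self, key):
        self.entries[key] = False


class GuardedClient:
    """Hypothetical client that writes to both stores and corrects the
    object store's listing using the metadata store's entries."""

    def __init__(self, store, metadata):
        self.store = store
        self.metadata = metadata

    def create(self, key, data):
        self.store.put(key, data)
        self.metadata.record_put(key)

    def delete(self, key):
        self.store.delete(key)
        self.metadata.record_delete(key)

    def list_keys(self, prefix):
        listed = set(self.store.list_keys(prefix))
        for key, exists in self.metadata.entries.items():
            if key.startswith(prefix):
                if exists:
                    listed.add(key)      # recent create not yet listed
                else:
                    listed.discard(key)  # recent delete still listed
        return sorted(listed)


# Usage with a controllable fake clock instead of real sleeps.
now = [0.0]
store = EventuallyConsistentStore(delay=5.0, clock=lambda: now[0])
client = GuardedClient(store, ToyMetadataStore())

client.create("dir/a", b"1")
print(store.list_keys("dir/"))   # [] (create not yet visible)
print(client.list_keys("dir/"))  # ['dir/a']

now[0] = 10.0
client.delete("dir/a")
print(store.list_keys("dir/"))   # ['dir/a'] (delete not yet visible)
print(client.list_keys("dir/"))  # []
```

The fake clock lets a test assert on the inconsistent window deterministically. The merge step is where a real design has to make choices (for example, when tombstones expire); this toy simply keeps them forever.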

Outside of hadoop-tools/hadoop-aws's s3a directory there are some minor changes:

- core-default.xml defaults and documentation for s3guard parameters.
- A couple of additional FS contract test cases around rename.
- More goodies in LambdaTestUtils
- A new CLI tool for inspecting and manipulating S3Guard features,
including the backing MetadataStore database.

This branch has seen extensive testing as well as use in production.  It
also makes significant improvements to S3A's test toolkit.

Performance is typically on par with, and in some cases better than, the
existing S3A code without S3Guard enabled.

This feature was developed with contributions and feedback from many
people.  I'd like to thank everyone who worked on HADOOP-13345, as well as
all of those who contributed feedback and work on the original design.

This is the first major Apache Hadoop project I've worked on from start to
finish, and I've really enjoyed it.  Please shout if I've missed anything
important here or in the VOTE process.

Aaron Fabbri

