zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From ivmaykov <...@git.apache.org>
Subject [GitHub] zookeeper pull request #627: ZOOKEEPER-236: SSL Support for Atomic Broadcast...
Date Fri, 14 Sep 2018 20:09:10 GMT
GitHub user ivmaykov opened a pull request:


     ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol [part 2]

    # Overview
    These are fixes and improvements to #184 that we've coded up at Facebook while testing
that PR internally. This is meant to be in addition to #184, and should be stacked on top
of it (and in fact includes the commits from that pull request). Most notable changes:
    ## Fixed networking issues/bugs in UnifiedServerSocket
    - don't crash the accept() thread if the client closes the connection without sending
any data
    - don't corrupt the connection if the client sends smaller than 5 bytes for the initial
    - delay the detection of TLS vs. plaintext mode until a socket stream is read from or
written to, for example
    - prepending 5 bytes to `PrependableSocket` and then trying to read >5 bytes would
only return the first 5 bytes, even if more bytes were available. This is fixed.
    ## Added support for PEM formatted key stores and trust stores
    - key store and trust store files can now be in PEM format as well as JKS.
    - Added config properties to tell ZK what type of trust/key store to load:
    - `zookeeper.ssl.keyStore.type` and `zookeeper.ssl.trustStore.type` for ClientX509Util
    - `zookeeper.ssl.quorum.keyStore.type` and `zookeeper.ssl.quorum.trustStore.type` for
    - store type properties could have the values "JKS", "PEM", or not set
    - leaving the type properties unset will cause auto-detection of the store type based
on the file extension (".jks" or ".pem")
    ## Added support for reloading key/trust stores when the file on disk changes
    - new property `sslQuorumReloadCertFiles` which controls the behavior for reloading the
key and trust store files for `QuorumX509Util`. Reloading of key and trust store for `ClientX509Util`
is not in this PR but could be added easily
    - this allows a ZK server to keep running on a machine that uses short-lived certs that
refresh frequently without having to restart the ZK process.
    ## Added test utilities for easily creating X509 certs and using them in unit tests
    - added new class `X509TestContext` and its friend, `X509TestHelpers`
    - rewrote some existing unit tests to use these classes, and added new tests that use
    - some existing tests (i.e. `QuorumSSLTest`) should probably be ported to use this as
well, haven't got around to it yet
    ## Added more options for ssl settings to X509Util and encapsulate them better
    - previously, some SSL settings came from a ZKConfig and others came from global `System.getProperties()`.
This made it hard to isolate certain settings in tests.
    - now all SSL-related settings come from the ZKConfig object used to create the context
    - new settings added:
    - `zookeeper.ssl(.quorum).enabledProtocols` - list of enabled protocols. If not set, defaults
to a single-entry list with the value of `zookeeper.ssl(.quorum).protocol`.
    - `zookeeper.ssl(.quorum).clientAuth` - can be "NONE", "WANT", or "NEED". This controls
whether the server doesn't want / allows / requires the client to present an X509 certificate.
    - `zookeeper.ssl(.quorum).handshakeDetectionTimeoutMillis` - timeout for the first read
of 5 bytes to detect the transport mode (TLS or plaintext) of a client connection made to
a `UnifiedServerSocket`
    ## Fixed odd plaintext perf regression caused by new dependency on `org.apache.httpcomponents`
    - internal regression testing showed a >10% perf degradation in plaintext mode that
was tracked down to the addition of the `org.apache.httpcomponents` dependency.
    - Removing the dependency and implementing the hostname verification by mostly copy-pasting
the hostname verification code from `httpcomponents` fixed the regression.
    There may be some other changes that I'm forgetting, but those are the main ones.
    We believe that #184 should not be accepted without this PR coming close behind, mainly
because of the issues with UnifiedServerSocket. We were able to put clusters into a bad state
pretty easily simply by having a participant disconnect immediately after connecting to the
quorum port. This crashed the accept thread in the Leader and prevented any other participants
from connecting. This could happen due to network hick-ups, etc. Ideally, both PRs get +1s
around the same time and can land with a small delay one after another.
    We have started testing internal builds with #184 and the code in this PR. Quorums with
TLS disabled have shown no perf or stability regressions, so we believe it's safe to merge
this code to master as it's purely opt-in at this point, gated behind config options. We've
done correctness testing of quorums with TLS enabled, but have not done perf testing to measure
the impact of enabling quorum TLS yet. That should happen in the near future, and I will update
this PR with our perf test results once we have them.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ivmaykov/zookeeper fb-tls

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #627
commit 88b61716d26f6b3c11325367a075bb062dc4147c
Author: Andor Molnar <andor@...>
Date:   2018-04-03T19:24:49Z

    ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol

commit c452d1b036bb3c9cd53478254129474ad4e387ea
Author: Andor Molnar <andor@...>
Date:   2018-05-18T00:09:38Z

    ZOOKEEPER-236. Fixed unit test + added some extra debug logging

commit ed10e88de70aa97f12f586d0c52b63a7d9a4cbae
Author: Andor Molnar <andor@...>
Date:   2018-05-18T16:58:40Z

    ZOOKEEPER-236. Added cipher suite to test to run on CentOS. Timeout in constant value.
Null checks

commit 9ab476a7a4d345a94f174bcd831a350ec92c94f9
Author: Andor Molnar <andor@...>
Date:   2018-05-20T20:09:09Z

    ZOOKEEPER-236. Trying to fix cipher suites test by changing the default protocol to TLSv1.2
and filter suitable cipher suites

commit d64eb26f75574f118d29cffec11d4116b469c36a
Author: Andor Molnar <andor@...>
Date:   2018-06-11T15:09:58Z

    ZOOKEEPER-236. Code review related changes:
    - server & client hostname verification can be set independently,
    - refactor defaultSSLContext to use AtomicReference,
    - some minor nitpicks

commit e8a1729794c8044f8fbde5c8f304cd1bd3648836
Author: Andor Molnar <andor@...>
Date:   2018-06-11T15:58:12Z

    ZOOKEEPER-236. Reverted to use single property for hostname verification

commit 777f31ac2dba38f2315dfdbfa47cdcb3832e8a6f
Author: Andor Molnar <andor@...>
Date:   2018-06-14T15:37:58Z

    ZOOKEEPER-236. Added Java8/Java9 default cipher suites

commit a9fa698131d36670b11973b8823efd11afa1acb5
Author: Andor Molnar <andor@...>
Date:   2018-06-15T11:57:46Z

    ZOOKEEPER-236. Code review fixes:
    - Added comment to clarify default enabled cipher suites,
    - Use java.specification.version to determine JDK version,
    - Use port unification only when SSL quorum is enabled,
    - Unit test fixes.

commit 1f8aab05bf45a5c6c40eb30c84c8a7f95a7fa27e
Author: Andor Molnar <andor@...>
Date:   2018-06-15T13:04:31Z

    ZOOKEEPER-236. Revert portUnification/sslQuorum logic

commit 209fbca78c083a1c06bee8d463ba77a86a3ee1ee
Author: Andor Molnar <andor@...>
Date:   2018-06-22T14:38:17Z

    ZOOKEEPER-236. Added new JMX properties to expose SSL quorum related settings

commit d78e1146a8a35f6d2f9d75fd9b769ff8a423540c
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:23Z

    [rebase] [Quorum TLS] Fix bugs in PrependableSocket and UnifiedServerSocket
    Fixed some bugs in PrependableSocket and UnifiedServerSocket
    - using SequenceInputStream in PrependableSocket broke the read() behavior when trying
to read > 5 bytes
    - changed it to use PushbackInputStream which behaves as expected
    - changed UnifiedSocketTest to read more than 5 bytes from the unified socket to make
sure the fix works
    - call setNeedClientAuth(true) in UnifiedServerSocket.accept() when connection is TLS.
This is to have parity with X509Util.configureSSLServerSocket() which sets the same behavior
for strict server-side SSL sockets.
    - Use the Java8 API for creating a secure socket when some bytes have already been read.
I think this allows us to avoid the overhead of PushbackInputStream in the SSL case.
    - added "localhost" to the connect address of client socket in UnifiedServerSocket test.
The test didn't work for me otherwise.
    Test Plan: `ant -Dtestcase=UnifiedServerSocketTest test-core-java`
    Reviewers: nixon, bunn, nedelchev
    Reviewed By: nixon
    Subscribers: nixon, dongw, zeus-diffs@fb.com
    Differential Revision: https://phabricator.intern.facebook.com/D9234843
    Tasks: T31626216
    Signature: 9234843:1533776414:92d829ba49f997931345e7d90b4a2183b666f294

commit ef2db825516bf0a20d2a62a6f1c3449f543a15ca
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:24Z

    [rebase] [Quorum TLS] get SSL protocol from ZKConfig instead of System properties
    Summary: See title. This makes this property consistent with all the others that we already
get through the ZKConfig.
    Test Plan: `ant -Dtestcase=X509UtilTest test-core-java`
    Reviewers: nixon, bunn, nedelchev
    Reviewed By: nixon
    Subscribers: nixon, dongw, zeus-diffs@fb.com, ssl-diffs@fb.com
    Differential Revision: https://phabricator.intern.facebook.com/D9236333
    Tasks: T31626216
    Signature: 9236333:1533836898:5d1c734d96b807a5f6a39f863661669abefe11aa

commit 8682c3b48880831f70e918ddf53ea4275fa08cb2
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:25Z

    [rebase] [Quorum TLS] added X509TestContext and supporting code
    Added a few new classes that will make testing security related code much easier.
    - X509TestContext: easy way to manage certs and private keys in unit tests, to be used
with SSL sockets
    - X509TestHelpers: static functions for creating certs and key pairs, and saving them
to JKS or PEM files on disk
    - X509KeyType: enum for asymmetric key types
    Test Plan: Bunch of tests in future diffs will rely on it
    Reviewers: nixon, bunn, nedelchev
    Reviewed By: nixon
    Subscribers: nixon, dongw, zeus-diffs@fb.com
    Differential Revision: https://phabricator.intern.facebook.com/D9174703
    Tasks: T31626216
    Signature: 9174703:1533597616:330f1ea9db736c50825d6eb084e20af91b8fd782

commit fa08a806c59f7cc6e0f46686b6b388bfe07884e9
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:26Z

    [rebase] [Quorum TLS] update X509UtilTest to use X509TestContext and parameterize the
    Summary: See title. Now X509UtilTest has better coverage of different combinations of
key types (RSA, EC) and password protected / unprotected key store. Also added some new test
cases and removed redundant ones.
    Test Plan: ran X509UtilTest from IntelliJ
    Reviewers: nixon, bunn, nedelchev
    Reviewed By: nixon
    Subscribers: nixon, dongw, zeus-diffs@fb.com
    Differential Revision: https://phabricator.intern.facebook.com/D9174711
    Tasks: T31626216
    Signature: 9174711:1533766713:efa901b625d079cf6f2fab26270faf9a39591d2e

commit ec1b1467f6934c36b3fd03b01928c1654c57b4e4
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:27Z

    [rebase] [Quorum TLS] clean up UnifiedServerSocketTest and make it use X509TestContext
    Summary: Use X509TestContext to create the cert and key files in UnifiedSocketTest, plus
other minor cleanup.
    Test Plan: ran UnifiedSocketTest in IntelliJ
    Reviewers: nixon, bunn, nedelchev
    Reviewed By: nixon
    Subscribers: nixon, dongw, zeus-diffs@fb.com
    Differential Revision: https://phabricator.intern.facebook.com/D9174713
    Tasks: T31626216
    Signature: 9174713:1534362831:b526048e672c9893a4046d0ff5e9df1d8cde4adc

commit d11d85cf2d3b883e5ade7029336521c77832650c
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:28Z

    [rebase] [Quorum TLS] Add support for parsing PEM encoded keys/certs
    Summary: See title.
    Test Plan: Tested it manually. Will add automated tests later.
    Reviewers: nixon, bunn, nedelchev
    Reviewed By: nixon
    Subscribers: nixon, bunn, dain, dongw, nedelchev, zeus-diffs@fb.com
    Differential Revision: https://phabricator.intern.facebook.com/D8906740
    Tasks: T31626216
    Signature: 8906740:1534362912:b52885e9c0f9957b0a911697e976bf153adf6197

commit 210bfe7b75429668331fc14c383f0d1c9ae151b4
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:29Z

    [rebase] [Quorum TLS] Added FileChangeWatcher class and unit test
    Summary: The FileChangeWatcher class provides a simple API to watch a local directory
for file changes, backed by WatchService from the Java standard library. This will allow us
to implement cert store / trust store reloading on changes in a later diff.
    Test Plan: added a new unit test, made sure it passes
    Reviewers: nixon, bunn, nedelchev
    Reviewed By: nixon
    Subscribers: nixon, bunn, dongw, nedelchev, zeus-diffs@fb.com
    Differential Revision: https://phabricator.intern.facebook.com/D8906830
    Tasks: T31626216
    Signature: 8906830:1533162316:679eceab34d30955a582ef2dde8c23822af9555d

commit b2c8593b3c9a7adca79600b1c6f9bb8946f862a0
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:30Z

    [rebase] [Quorum TLS] enable cert store / trust store reloading
    Summary: Enable the reloading of cert store / trust store files.
    Test Plan: TBD.
    Reviewers: nixon, bunn, nedelchev
    Reviewed By: nixon
    Subscribers: nixon, bunn, dongw, nedelchev, zeus-diffs@fb.com
    Differential Revision: https://phabricator.intern.facebook.com/D8906914
    Tasks: T31626216
    Signature: 8906914:1533600372:b254376732d94e74555160d25054a563a5378bc2

commit 9c70cc93c24d2bc7ed6af9ebd8464262aab625e3
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:31Z

    [rebase] [Quorum TLS] use UnifiedServerSocket even w/o port unification to allow cert
    Summary: Added an option to UnifiedServerSocket to block insecure connections. This allows
us to implement cert store / trust store reloading even when port unification is off, because
the conversion to SSL socket is delayed until after the connection is made. This lets us use
an updated SSL context if the file on disk changed.
    Test Plan: Modified UnifiedServerSocketTest
    Reviewers: nixon, bunn, nedelchev
    Reviewed By: nixon
    Subscribers: nixon, bunn, dongw, nedelchev, zeus-diffs@fb.com
    Differential Revision: https://phabricator.intern.facebook.com/D8906965
    Tasks: T31626216
    Signature: 8906965:1533770631:379211c3714ef8fe958012ed9584b64ecf0c4355

commit f5f54afce39bcc0f4df346bc64e4fca0916b53dd
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-20T23:22:33Z

    [rebase] [Quorum TLS] WIP: add denial-of-service-attack resistance to UnifiedServerSocket
    Added resistance to simple denial-of-service attacks to UnifiedServerSocket.
    The attack in question is simple: a client connects to the UnifiedServerSocket and does
    Because the old code did a blocking read() in the accept() method, this would hang the
accept thread
    and prevent any other clients from connecting.
    The solution is to write a new socket type (`UnifiedServerSocket.UnifiedSocket`), which
doesn't know
    if it's a SSLSocket or plaintext socket at the time of creation. Trying to read or write
any data on
    the socket (by calling `getInputStream()`, `getOutputStream()`, or `sendUrgentData()`)
will cause
    the socket to peek at the first 5 bytes of the input stream to try and determine if it
    should switch to TLS or stay as plaintext, using the previous logic. Changed `UnifiedServerSocket.accept()`
    to return the new socket type.
    Added new test cases to `UnifiedServerSocketTest` which check for DoS resistance. Had
to make changes
    to the test to make `UnifiedServerThread` more closely resemble the behavior in Leader.java
(`accept()` on one
    thread, do all reads and writes on another thread). I verified that the old version of
the code (which blocked in
    `UnifiedServerSocket.accept()`) fails the new tests.
    Test Plan: `ant -Dtestcase=UnifiedServerSocketTest test-core-java`
    Reviewers: nixon, bunn, nedelchev
    Reviewed By: nixon
    Subscribers: nixon, dongw, zeus-diffs@fb.com
    Differential Revision: https://phabricator.intern.facebook.com/D9327931
    Tasks: T31626216
    Signature: 9327931:1534364537:832d8f589a70f0ceb8f4ae8faa1c3cbb4b42fd40

commit 359546a321cff748c5bd40887b45baa6ecc3e9ed
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-29T16:02:01Z

    [Quorum TLS] allow keystore/truststore password properties to be omitted if the password
is empty
    Summary: Currently, the keystore password property must be specified if the keystore location
is. This is clunky, and I bet the common case will be key stores not protected by a password
anyway (if your key password has to be in plaintext in a zoo.cfg file, it's not adding any
security to your key store file anyway). This change allows the keystore password property
to be omitted when there is no password on the keystore.
    Test Plan: `ant -Dtestcase=X509UtilTest test-core-java`
    Reviewers: nixon, bunn, nedelchev
    Reviewed By: nixon
    Subscribers: nixon, dongw, zeus-diffs@fb.com
    Differential Revision: https://phabricator.intern.facebook.com/D9374038
    Tasks: T31626216
    Signature: 9374038:1534532478:12253e0252ef69b95b2e8d7e86e96f7f988771a3

commit 33e9538f14e3264612c27e1e39b13ce748d63981
Author: Ilya Maykov <ilyam@...>
Date:   2018-08-29T16:02:01Z

    [Quorum TLS] add config options for more TLS settings and preserve them alongside SSLContext
in X509Util
    Added a wrapper for the SSL Context + options that have to be set on the socket and not
on the context, and made all such options configurable via properties. The options are:
    - default TLS protocol to use
    - enabled protocols to potentially downgrade to if a client doesn't support our protocol
of choice
    - enabled ciper suites
    - client auth mode (NONE / WANT / NEED)
    Test Plan: sandcastle
    Reviewers: nixon, bunn, nedelchev
    Reviewed By: nixon
    Subscribers: nixon, dongw, zeus-diffs@fb.com, ssl-diffs@fb.com
    Differential Revision: https://phabricator.intern.facebook.com/D9526137
    Tasks: T31626216
    Signature: 9526137:1535491578:4624f336d24c0cd737697e95b08d5a157e4abf5a

commit f2cb5c39ffdff91791e0f22680bc7bb9b889bfbd
Author: Brian Nixon <nixon@...>
Date:   2018-09-01T02:57:53Z

    remove the dependency on httpclient of org.apache.httpcomponents
    This dependency is causing a performance regression when the perf_regression_version_write
test is run against zeus.regional.atn.10001.
    https://fb.facebook.com/groups/1837468356490057/permalink/2184210251815864/ has more context.
    Test Plan: Built the jar and am running perf_regression_version_write against it on zeus.regional.atn.10001
    Reviewers: ilyam, allenlyu
    Reviewed By: ilyam, allenlyu
    Subscribers: nixon, dongw, #zeus, zeus-diffs@fb.com
    Differential Revision: https://phabricator.intern.facebook.com/D9622408
    Tasks: T31626216, T33493338
    Signature: 9622408:1535761773:ae82b939c8cdbf9b4ab0de3eb6e3b7a598a0a6ab

commit 956639dfceb2cb4ce67cac7b93abf668da4b89ce
Author: Ilya Maykov <ilyam@...>
Date:   2018-09-08T01:04:23Z

    [Quorum TLS] fix bugs in UnifiedServerSocket accept() flow
    There were a few issues in UnifiedServerSocket accept() flow that were exposed during
    - if a client closed the connection without sending any data, the accept thread would
crash because a negative return from read() was not properly detected
    - a client connects and doesn't send any data but keeps the connection open, the accept
thread will not make progress. Fixed by adding a new timeout property for the initial read
that determines the client mode, defaults to 5000ms.
    - Exceptions in Leader.LearnerCnxAcceptor.run() could leak an open socket if the error
happened before the socket was given to the LearnerHandler.
    Test Plan: compiles, passes a couple unit tests, will let sandcastle run the full test
    Reviewers: nixon, bunn, allenlyu
    Reviewed By: bunn
    Subscribers: nixon, dongw, zeus-diffs@fb.com
    Differential Revision: https://phabricator.intern.facebook.com/D9694977
    Tasks: T31626216
    Signature: 9694977:1536281695:fe2ece2be1277513de727fb3130e9d952e33f2f4

commit 0ed15965f03794c95f04d30e7d93b0a531ee3ebe
Author: Ilya Maykov <ilyam@...>
Date:   2018-09-08T01:04:24Z

    [Quorum TLS] delay unified socket mode detection to the first read()/write() operation
    Summary: Instead of reading from the underlying socket and detecting the mode as soon
as an input/output stream is requested from a unified socket, delay the blocking mode detection
until the first actual read()/write() on the requested stream. This means an accept() thread
that gets the input stream but doesn't actually read from it and passes the stream to a worker
thread will not get blocked.
    Test Plan: Modified UnifiedServerSocketTest to grab the input stream in the main accept()
thread, make sure it passes with the changes. The modified tests fail as expected if the logic
in UnifiedServerSocket.java is reverted.
    Reviewers: nixon, bunn, allenlyu
    Reviewed By: nixon
    Subscribers: nixon, dongw, zeus-diffs@fb.com
    Differential Revision: https://phabricator.intern.facebook.com/D9696594
    Tasks: T31626216
    Signature: 9696594:1536343729:00605b4393e61f340d2d079ce00027e2e11489d0



View raw message