From dev-return-73185-archive-asf-public=cust-asf.ponee.io@zookeeper.apache.org Fri Sep 14 22:09:13 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 886C1180647 for ; Fri, 14 Sep 2018 22:09:12 +0200 (CEST) Received: (qmail 44336 invoked by uid 500); 14 Sep 2018 20:09:11 -0000 Mailing-List: contact dev-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@zookeeper.apache.org Delivered-To: mailing list dev@zookeeper.apache.org Received: (qmail 44325 invoked by uid 99); 14 Sep 2018 20:09:10 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 14 Sep 2018 20:09:10 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id 9D381E004F; Fri, 14 Sep 2018 20:09:10 +0000 (UTC) From: ivmaykov To: dev@zookeeper.apache.org Reply-To: dev@zookeeper.apache.org Message-ID: Subject: [GitHub] zookeeper pull request #627: ZOOKEEPER-236: SSL Support for Atomic Broadcast... Content-Type: text/plain Date: Fri, 14 Sep 2018 20:09:10 +0000 (UTC) GitHub user ivmaykov opened a pull request: https://github.com/apache/zookeeper/pull/627 ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol [part 2] # Overview These are fixes and improvements to #184 that we've coded up at Facebook while testing that PR internally. This is meant to be in addition to #184, and should be stacked on top of it (and in fact includes the commits from that pull request). Most notable changes: ## Fixed networking issues/bugs in UnifiedServerSocket - don't crash the accept() thread if the client closes the connection without sending any data - don't corrupt the connection if the client sends smaller than 5 bytes for the initial read - delay the detection of TLS vs. plaintext mode until a socket stream is read from or written to, for example - prepending 5 bytes to `PrependableSocket` and then trying to read >5 bytes would only return the first 5 bytes, even if more bytes were available. This is fixed. ## Added support for PEM formatted key stores and trust stores - key store and trust store files can now be in PEM format as well as JKS. - Added config properties to tell ZK what type of trust/key store to load: - `zookeeper.ssl.keyStore.type` and `zookeeper.ssl.trustStore.type` for ClientX509Util - `zookeeper.ssl.quorum.keyStore.type` and `zookeeper.ssl.quorum.trustStore.type` for QuorumX509Util - store type properties could have the values "JKS", "PEM", or not set - leaving the type properties unset will cause auto-detection of the store type based on the file extension (".jks" or ".pem") ## Added support for reloading key/trust stores when the file on disk changes - new property `sslQuorumReloadCertFiles` which controls the behavior for reloading the key and trust store files for `QuorumX509Util`. Reloading of key and trust store for `ClientX509Util` is not in this PR but could be added easily - this allows a ZK server to keep running on a machine that uses short-lived certs that refresh frequently without having to restart the ZK process. ## Added test utilities for easily creating X509 certs and using them in unit tests - added new class `X509TestContext` and its friend, `X509TestHelpers` - rewrote some existing unit tests to use these classes, and added new tests that use them - some existing tests (i.e. `QuorumSSLTest`) should probably be ported to use this as well, haven't got around to it yet ## Added more options for ssl settings to X509Util and encapsulate them better - previously, some SSL settings came from a ZKConfig and others came from global `System.getProperties()`. This made it hard to isolate certain settings in tests. - now all SSL-related settings come from the ZKConfig object used to create the context - new settings added: - `zookeeper.ssl(.quorum).enabledProtocols` - list of enabled protocols. If not set, defaults to a single-entry list with the value of `zookeeper.ssl(.quorum).protocol`. - `zookeeper.ssl(.quorum).clientAuth` - can be "NONE", "WANT", or "NEED". This controls whether the server doesn't want / allows / requires the client to present an X509 certificate. - `zookeeper.ssl(.quorum).handshakeDetectionTimeoutMillis` - timeout for the first read of 5 bytes to detect the transport mode (TLS or plaintext) of a client connection made to a `UnifiedServerSocket` ## Fixed odd plaintext perf regression caused by new dependency on `org.apache.httpcomponents` - internal regression testing showed a >10% perf degradation in plaintext mode that was tracked down to the addition of the `org.apache.httpcomponents` dependency. - Removing the dependency and implementing the hostname verification by mostly copy-pasting the hostname verification code from `httpcomponents` fixed the regression. There may be some other changes that I'm forgetting, but those are the main ones. We believe that #184 should not be accepted without this PR coming close behind, mainly because of the issues with UnifiedServerSocket. We were able to put clusters into a bad state pretty easily simply by having a participant disconnect immediately after connecting to the quorum port. This crashed the accept thread in the Leader and prevented any other participants from connecting. This could happen due to network hick-ups, etc. Ideally, both PRs get +1s around the same time and can land with a small delay one after another. We have started testing internal builds with #184 and the code in this PR. Quorums with TLS disabled have shown no perf or stability regressions, so we believe it's safe to merge this code to master as it's purely opt-in at this point, gated behind config options. We've done correctness testing of quorums with TLS enabled, but have not done perf testing to measure the impact of enabling quorum TLS yet. That should happen in the near future, and I will update this PR with our perf test results once we have them. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ivmaykov/zookeeper fb-tls Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/627.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #627 ---- commit 88b61716d26f6b3c11325367a075bb062dc4147c Author: Andor Molnar Date: 2018-04-03T19:24:49Z ZOOKEEPER-236: SSL Support for Atomic Broadcast protocol commit c452d1b036bb3c9cd53478254129474ad4e387ea Author: Andor Molnar Date: 2018-05-18T00:09:38Z ZOOKEEPER-236. Fixed unit test + added some extra debug logging commit ed10e88de70aa97f12f586d0c52b63a7d9a4cbae Author: Andor Molnar Date: 2018-05-18T16:58:40Z ZOOKEEPER-236. Added cipher suite to test to run on CentOS. Timeout in constant value. Null checks commit 9ab476a7a4d345a94f174bcd831a350ec92c94f9 Author: Andor Molnar Date: 2018-05-20T20:09:09Z ZOOKEEPER-236. Trying to fix cipher suites test by changing the default protocol to TLSv1.2 and filter suitable cipher suites commit d64eb26f75574f118d29cffec11d4116b469c36a Author: Andor Molnar Date: 2018-06-11T15:09:58Z ZOOKEEPER-236. Code review related changes: - server & client hostname verification can be set independently, - refactor defaultSSLContext to use AtomicReference, - some minor nitpicks commit e8a1729794c8044f8fbde5c8f304cd1bd3648836 Author: Andor Molnar Date: 2018-06-11T15:58:12Z ZOOKEEPER-236. Reverted to use single property for hostname verification commit 777f31ac2dba38f2315dfdbfa47cdcb3832e8a6f Author: Andor Molnar Date: 2018-06-14T15:37:58Z ZOOKEEPER-236. Added Java8/Java9 default cipher suites commit a9fa698131d36670b11973b8823efd11afa1acb5 Author: Andor Molnar Date: 2018-06-15T11:57:46Z ZOOKEEPER-236. Code review fixes: - Added comment to clarify default enabled cipher suites, - Use java.specification.version to determine JDK version, - Use port unification only when SSL quorum is enabled, - Unit test fixes. commit 1f8aab05bf45a5c6c40eb30c84c8a7f95a7fa27e Author: Andor Molnar Date: 2018-06-15T13:04:31Z ZOOKEEPER-236. Revert portUnification/sslQuorum logic commit 209fbca78c083a1c06bee8d463ba77a86a3ee1ee Author: Andor Molnar Date: 2018-06-22T14:38:17Z ZOOKEEPER-236. Added new JMX properties to expose SSL quorum related settings commit d78e1146a8a35f6d2f9d75fd9b769ff8a423540c Author: Ilya Maykov Date: 2018-08-20T23:22:23Z [rebase] [Quorum TLS] Fix bugs in PrependableSocket and UnifiedServerSocket Summary: Fixed some bugs in PrependableSocket and UnifiedServerSocket - using SequenceInputStream in PrependableSocket broke the read() behavior when trying to read > 5 bytes - changed it to use PushbackInputStream which behaves as expected - changed UnifiedSocketTest to read more than 5 bytes from the unified socket to make sure the fix works - call setNeedClientAuth(true) in UnifiedServerSocket.accept() when connection is TLS. This is to have parity with X509Util.configureSSLServerSocket() which sets the same behavior for strict server-side SSL sockets. - Use the Java8 API for creating a secure socket when some bytes have already been read. I think this allows us to avoid the overhead of PushbackInputStream in the SSL case. - added "localhost" to the connect address of client socket in UnifiedServerSocket test. The test didn't work for me otherwise. Test Plan: `ant -Dtestcase=UnifiedServerSocketTest test-core-java` Reviewers: nixon, bunn, nedelchev Reviewed By: nixon Subscribers: nixon, dongw, zeus-diffs@fb.com Differential Revision: https://phabricator.intern.facebook.com/D9234843 Tasks: T31626216 Signature: 9234843:1533776414:92d829ba49f997931345e7d90b4a2183b666f294 commit ef2db825516bf0a20d2a62a6f1c3449f543a15ca Author: Ilya Maykov Date: 2018-08-20T23:22:24Z [rebase] [Quorum TLS] get SSL protocol from ZKConfig instead of System properties Summary: See title. This makes this property consistent with all the others that we already get through the ZKConfig. Test Plan: `ant -Dtestcase=X509UtilTest test-core-java` Reviewers: nixon, bunn, nedelchev Reviewed By: nixon Subscribers: nixon, dongw, zeus-diffs@fb.com, ssl-diffs@fb.com Differential Revision: https://phabricator.intern.facebook.com/D9236333 Tasks: T31626216 Signature: 9236333:1533836898:5d1c734d96b807a5f6a39f863661669abefe11aa commit 8682c3b48880831f70e918ddf53ea4275fa08cb2 Author: Ilya Maykov Date: 2018-08-20T23:22:25Z [rebase] [Quorum TLS] added X509TestContext and supporting code Summary: Added a few new classes that will make testing security related code much easier. - X509TestContext: easy way to manage certs and private keys in unit tests, to be used with SSL sockets - X509TestHelpers: static functions for creating certs and key pairs, and saving them to JKS or PEM files on disk - X509KeyType: enum for asymmetric key types Test Plan: Bunch of tests in future diffs will rely on it Reviewers: nixon, bunn, nedelchev Reviewed By: nixon Subscribers: nixon, dongw, zeus-diffs@fb.com Differential Revision: https://phabricator.intern.facebook.com/D9174703 Tasks: T31626216 Signature: 9174703:1533597616:330f1ea9db736c50825d6eb084e20af91b8fd782 commit fa08a806c59f7cc6e0f46686b6b388bfe07884e9 Author: Ilya Maykov Date: 2018-08-20T23:22:26Z [rebase] [Quorum TLS] update X509UtilTest to use X509TestContext and parameterize the test Summary: See title. Now X509UtilTest has better coverage of different combinations of key types (RSA, EC) and password protected / unprotected key store. Also added some new test cases and removed redundant ones. Test Plan: ran X509UtilTest from IntelliJ Reviewers: nixon, bunn, nedelchev Reviewed By: nixon Subscribers: nixon, dongw, zeus-diffs@fb.com Differential Revision: https://phabricator.intern.facebook.com/D9174711 Tasks: T31626216 Signature: 9174711:1533766713:efa901b625d079cf6f2fab26270faf9a39591d2e commit ec1b1467f6934c36b3fd03b01928c1654c57b4e4 Author: Ilya Maykov Date: 2018-08-20T23:22:27Z [rebase] [Quorum TLS] clean up UnifiedServerSocketTest and make it use X509TestContext Summary: Use X509TestContext to create the cert and key files in UnifiedSocketTest, plus other minor cleanup. Test Plan: ran UnifiedSocketTest in IntelliJ Reviewers: nixon, bunn, nedelchev Reviewed By: nixon Subscribers: nixon, dongw, zeus-diffs@fb.com Differential Revision: https://phabricator.intern.facebook.com/D9174713 Tasks: T31626216 Signature: 9174713:1534362831:b526048e672c9893a4046d0ff5e9df1d8cde4adc commit d11d85cf2d3b883e5ade7029336521c77832650c Author: Ilya Maykov Date: 2018-08-20T23:22:28Z [rebase] [Quorum TLS] Add support for parsing PEM encoded keys/certs Summary: See title. Test Plan: Tested it manually. Will add automated tests later. Reviewers: nixon, bunn, nedelchev Reviewed By: nixon Subscribers: nixon, bunn, dain, dongw, nedelchev, zeus-diffs@fb.com Differential Revision: https://phabricator.intern.facebook.com/D8906740 Tasks: T31626216 Signature: 8906740:1534362912:b52885e9c0f9957b0a911697e976bf153adf6197 commit 210bfe7b75429668331fc14c383f0d1c9ae151b4 Author: Ilya Maykov Date: 2018-08-20T23:22:29Z [rebase] [Quorum TLS] Added FileChangeWatcher class and unit test Summary: The FileChangeWatcher class provides a simple API to watch a local directory for file changes, backed by WatchService from the Java standard library. This will allow us to implement cert store / trust store reloading on changes in a later diff. Test Plan: added a new unit test, made sure it passes Reviewers: nixon, bunn, nedelchev Reviewed By: nixon Subscribers: nixon, bunn, dongw, nedelchev, zeus-diffs@fb.com Differential Revision: https://phabricator.intern.facebook.com/D8906830 Tasks: T31626216 Signature: 8906830:1533162316:679eceab34d30955a582ef2dde8c23822af9555d commit b2c8593b3c9a7adca79600b1c6f9bb8946f862a0 Author: Ilya Maykov Date: 2018-08-20T23:22:30Z [rebase] [Quorum TLS] enable cert store / trust store reloading Summary: Enable the reloading of cert store / trust store files. Test Plan: TBD. Reviewers: nixon, bunn, nedelchev Reviewed By: nixon Subscribers: nixon, bunn, dongw, nedelchev, zeus-diffs@fb.com Differential Revision: https://phabricator.intern.facebook.com/D8906914 Tasks: T31626216 Signature: 8906914:1533600372:b254376732d94e74555160d25054a563a5378bc2 commit 9c70cc93c24d2bc7ed6af9ebd8464262aab625e3 Author: Ilya Maykov Date: 2018-08-20T23:22:31Z [rebase] [Quorum TLS] use UnifiedServerSocket even w/o port unification to allow cert reloading Summary: Added an option to UnifiedServerSocket to block insecure connections. This allows us to implement cert store / trust store reloading even when port unification is off, because the conversion to SSL socket is delayed until after the connection is made. This lets us use an updated SSL context if the file on disk changed. Test Plan: Modified UnifiedServerSocketTest Reviewers: nixon, bunn, nedelchev Reviewed By: nixon Subscribers: nixon, bunn, dongw, nedelchev, zeus-diffs@fb.com Differential Revision: https://phabricator.intern.facebook.com/D8906965 Tasks: T31626216 Signature: 8906965:1533770631:379211c3714ef8fe958012ed9584b64ecf0c4355 commit f5f54afce39bcc0f4df346bc64e4fca0916b53dd Author: Ilya Maykov Date: 2018-08-20T23:22:33Z [rebase] [Quorum TLS] WIP: add denial-of-service-attack resistance to UnifiedServerSocket Summary: Added resistance to simple denial-of-service attacks to UnifiedServerSocket. The attack in question is simple: a client connects to the UnifiedServerSocket and does nothing. Because the old code did a blocking read() in the accept() method, this would hang the accept thread and prevent any other clients from connecting. The solution is to write a new socket type (`UnifiedServerSocket.UnifiedSocket`), which doesn't know if it's a SSLSocket or plaintext socket at the time of creation. Trying to read or write any data on the socket (by calling `getInputStream()`, `getOutputStream()`, or `sendUrgentData()`) will cause the socket to peek at the first 5 bytes of the input stream to try and determine if it should switch to TLS or stay as plaintext, using the previous logic. Changed `UnifiedServerSocket.accept()` to return the new socket type. Added new test cases to `UnifiedServerSocketTest` which check for DoS resistance. Had to make changes to the test to make `UnifiedServerThread` more closely resemble the behavior in Leader.java (`accept()` on one thread, do all reads and writes on another thread). I verified that the old version of the code (which blocked in `UnifiedServerSocket.accept()`) fails the new tests. Test Plan: `ant -Dtestcase=UnifiedServerSocketTest test-core-java` Reviewers: nixon, bunn, nedelchev Reviewed By: nixon Subscribers: nixon, dongw, zeus-diffs@fb.com Differential Revision: https://phabricator.intern.facebook.com/D9327931 Tasks: T31626216 Signature: 9327931:1534364537:832d8f589a70f0ceb8f4ae8faa1c3cbb4b42fd40 commit 359546a321cff748c5bd40887b45baa6ecc3e9ed Author: Ilya Maykov Date: 2018-08-29T16:02:01Z [Quorum TLS] allow keystore/truststore password properties to be omitted if the password is empty Summary: Currently, the keystore password property must be specified if the keystore location is. This is clunky, and I bet the common case will be key stores not protected by a password anyway (if your key password has to be in plaintext in a zoo.cfg file, it's not adding any security to your key store file anyway). This change allows the keystore password property to be omitted when there is no password on the keystore. Test Plan: `ant -Dtestcase=X509UtilTest test-core-java` Reviewers: nixon, bunn, nedelchev Reviewed By: nixon Subscribers: nixon, dongw, zeus-diffs@fb.com Differential Revision: https://phabricator.intern.facebook.com/D9374038 Tasks: T31626216 Signature: 9374038:1534532478:12253e0252ef69b95b2e8d7e86e96f7f988771a3 commit 33e9538f14e3264612c27e1e39b13ce748d63981 Author: Ilya Maykov Date: 2018-08-29T16:02:01Z [Quorum TLS] add config options for more TLS settings and preserve them alongside SSLContext in X509Util Summary: Added a wrapper for the SSL Context + options that have to be set on the socket and not on the context, and made all such options configurable via properties. The options are: - default TLS protocol to use - enabled protocols to potentially downgrade to if a client doesn't support our protocol of choice - enabled ciper suites - client auth mode (NONE / WANT / NEED) Test Plan: sandcastle Reviewers: nixon, bunn, nedelchev Reviewed By: nixon Subscribers: nixon, dongw, zeus-diffs@fb.com, ssl-diffs@fb.com Differential Revision: https://phabricator.intern.facebook.com/D9526137 Tasks: T31626216 Signature: 9526137:1535491578:4624f336d24c0cd737697e95b08d5a157e4abf5a commit f2cb5c39ffdff91791e0f22680bc7bb9b889bfbd Author: Brian Nixon Date: 2018-09-01T02:57:53Z remove the dependency on httpclient of org.apache.httpcomponents Summary: This dependency is causing a performance regression when the perf_regression_version_write test is run against zeus.regional.atn.10001. https://fb.facebook.com/groups/1837468356490057/permalink/2184210251815864/ has more context. Test Plan: Built the jar and am running perf_regression_version_write against it on zeus.regional.atn.10001 Reviewers: ilyam, allenlyu Reviewed By: ilyam, allenlyu Subscribers: nixon, dongw, #zeus, zeus-diffs@fb.com Differential Revision: https://phabricator.intern.facebook.com/D9622408 Tasks: T31626216, T33493338 Signature: 9622408:1535761773:ae82b939c8cdbf9b4ab0de3eb6e3b7a598a0a6ab commit 956639dfceb2cb4ce67cac7b93abf668da4b89ce Author: Ilya Maykov Date: 2018-09-08T01:04:23Z [Quorum TLS] fix bugs in UnifiedServerSocket accept() flow Summary: There were a few issues in UnifiedServerSocket accept() flow that were exposed during testing: - if a client closed the connection without sending any data, the accept thread would crash because a negative return from read() was not properly detected - a client connects and doesn't send any data but keeps the connection open, the accept thread will not make progress. Fixed by adding a new timeout property for the initial read that determines the client mode, defaults to 5000ms. - Exceptions in Leader.LearnerCnxAcceptor.run() could leak an open socket if the error happened before the socket was given to the LearnerHandler. Test Plan: compiles, passes a couple unit tests, will let sandcastle run the full test suite Reviewers: nixon, bunn, allenlyu Reviewed By: bunn Subscribers: nixon, dongw, zeus-diffs@fb.com Differential Revision: https://phabricator.intern.facebook.com/D9694977 Tasks: T31626216 Signature: 9694977:1536281695:fe2ece2be1277513de727fb3130e9d952e33f2f4 commit 0ed15965f03794c95f04d30e7d93b0a531ee3ebe Author: Ilya Maykov Date: 2018-09-08T01:04:24Z [Quorum TLS] delay unified socket mode detection to the first read()/write() operation Summary: Instead of reading from the underlying socket and detecting the mode as soon as an input/output stream is requested from a unified socket, delay the blocking mode detection until the first actual read()/write() on the requested stream. This means an accept() thread that gets the input stream but doesn't actually read from it and passes the stream to a worker thread will not get blocked. Test Plan: Modified UnifiedServerSocketTest to grab the input stream in the main accept() thread, make sure it passes with the changes. The modified tests fail as expected if the logic in UnifiedServerSocket.java is reverted. Reviewers: nixon, bunn, allenlyu Reviewed By: nixon Subscribers: nixon, dongw, zeus-diffs@fb.com Differential Revision: https://phabricator.intern.facebook.com/D9696594 Tasks: T31626216 Signature: 9696594:1536343729:00605b4393e61f340d2d079ce00027e2e11489d0 ---- ---