zookeeper-dev mailing list archives

From "Gracia, Raul" <Raul.Gra...@Dell.com>
Subject RE: Question about security configuration (was: Re: [VOTE] Apache ZooKeeper release 3.5.5 candidate 6)
Date Thu, 16 May 2019 15:10:07 GMT
Thanks, Andor, for your quick reply. Let me answer your questions:

1) Yes, the problem is related to client/server communication using SSL, not related to Quorum
SSL (we use a single Zookeeper process in our tests). I would like your feedback first to
conclude if this is a problem in our config/code or a regression/change in the behavior of
Zookeeper 3.5.5. 

2) Yes, with the external Zookeeper server running separately (e.g., zkServer.sh start) all
the tests are passing (SSL/non-SSL). With the Zookeeper server process we instantiate in our
tests, the non-SSL tests are also passing, but not the SSL ones.

3) Correct. Just to give more detail here, we are instantiating the Zookeeper server process
using the ZooKeeperServer class jointly with NettyServerCnxnFactory.

4) I have done 2 types of tests: with Zookeeper started as a separate service ("zkServer.sh")
and using the Zookeeper server process we instantiate in Pravega standalone tests (namely,
"zk-pravega-tests"):
- zkServer.sh: Works well with regular Zookeeper client (zkCli.sh) and the Pravega standalone
tests pass using it with/without SSL.
- zk-pravega-tests: Without SSL, zkCli.sh can connect to that process and the non-SSL
Pravega tests pass. With SSL configured, neither zkCli.sh nor the Pravega SSL tests can
connect to the server (KeeperErrorCode = ConnectionLoss).

5) No, I haven't tested this scenario yet. I have tested a standalone Zookeeper server (zkServer.sh)
and a client (zkCli.sh) with SSL enabled in the same machine, and it works well. Apart from
that, I have also performed distributed tests with a Zookeeper server (3.5.4-beta) and Pravega
(using Curator 4.0.1 + zookeeper-3.5.5) in Kubernetes and it worked fine.

6) Yes, in fact I have done a little more than that and I have created a repository to investigate
this issue in isolation: https://github.com/RaulGracia/zookeeper-test
Apart from providing logs (see logs folder), in this repo I extracted the piece of code from
the Pravega repository that is used to start the Zookeeper standalone process, making it easier
to configure the SSL properties via executable. I think that this will make it easier for
anyone to reproduce the problem I'm experiencing. Moreover, I have provided instructions in
the README file on how to reproduce the issue.

Thanks a lot,

-----Original Message-----
From: Andor Molnar <andor@cloudera.com.INVALID> 
Sent: Thursday, May 16, 2019 11:18 AM
To: DevZooKeeper
Subject: Re: Question about security configuration (was: Re: [VOTE] Apache ZooKeeper release
3.5.5 candidate 6)


Hi Raul,

Thanks for the analysis. Let me ask a few questions, because I see some things that need to
be clarified first.

1. This issue is only about server-client SSL scenario (not Quorum TLS), so it's possibly
a regression in 3.5. Is that correct?
2. When running all Pravega tests against an external ZooKeeper standalone server, all tests
passed including SSL/nonSSL. Is that correct?
3. SSL tests are failing when ZooKeeper is running inside the test process?
4. You verified it by running ZooKeeper in standalone mode, SSL-enabled and according to the
log snippet, your client has connected successfully, but later timed out. Is that right?
5. Have you verified client-server SSL config with real (3-node) cluster with zkCli.sh?
6. Would you please provide the server side logs as well, maybe it sheds some light why the
client timed out?


On Thu, May 16, 2019 at 10:25 AM Gracia, Raul <Raul.Gracia@dell.com> wrote:

> Hi all,
> My name is Raúl Gracia and I work in the Pravega project (open-source 
> project for data stream storage): http://pravega.io/.
> I'm currently working on a Pravega branch using "zookeeper-3.5.5-rc6", 
> as we are interested in allowing Curator (4.0.1) to use a Zookeeper 
> version with the bugfix proposed in ZOOKEEPER-2184 
> <https://issues.apache.org/jira/browse/ZOOKEEPER-2184>. The integration 
> has been pretty smooth and 99% of tests are successful in a Pravega 
> build, and the original issue that motivated the upgrade to 
> zookeeper-3.5.5 also seems to be solved.
> However, there are failures related to a specific type of tests in 
> Pravega in which we instantiate a Zookeeper server process (for 
> testing Pravega in standalone mode). Such failures only occur when 
> running the standalone tests with SSL enabled, which includes 
> configuring the Zookeeper server process with SSL as well.
> To constrain the scope of the problem, I have built 
> zookeeper-3.5.5-rc6 ("mvn package") and executed the server (e.g., 
> "./bin/zkServer.sh start") with the appropriate security configuration to enable SSL:
> -Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
> -Dzookeeper.ssl.keyStore.location=.../server.keystore.jks
> -Dzookeeper.ssl.keyStore.password=password
> -Dzookeeper.ssl.trustStore.location=.../client.truststore.jks
> -Dzookeeper.ssl.trustStore.password=password
> (I have also added secureClientPort=2281 in zoo.cfg as indicated in 
> the admin instructions)
> With the Zookeeper server running separately, I executed all the 
> Pravega standalone tests (with and without SSL) pointing to that external 
> Zookeeper server (and disabling the Zookeeper server process that was 
> created as part of the test workflow). Regarding configuration, in our 
> tests the clients are configured with the recommended security 
> settings in the administration
> guide:
> System.setProperty("zookeeper.client.secure", "true");
> System.setProperty("zookeeper.clientCnxnSocket",
>     "org.apache.zookeeper.ClientCnxnSocketNetty");
> System.setProperty("zookeeper.ssl.trustStore.location",
>     ".../client.truststore.jks");
> System.setProperty("zookeeper.ssl.trustStore.password", "password");
> System.setProperty("zookeeper.ssl.keyStore.location",
>     ".../server.keystore.jks");
> System.setProperty("zookeeper.ssl.keyStore.password", "password");
> In this case, all the Pravega standalone tests succeeded.
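
For anyone reproducing the client setup quoted above, here is a minimal sketch that collects those settings in one place. The class `ZkClientTlsProps`, its methods, and the `/tmp/example/...` paths are hypothetical and are not part of ZooKeeper or Pravega; only the property names and the socket class name come from the message. The whitespace check is an extra safeguard I added, on the assumption that a stray space in a path or password is easy to overlook.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of ZooKeeper or Pravega) grouping the client TLS settings. */
final class ZkClientTlsProps {

    /** Builds the property map; the keys are the ones quoted in the message. */
    static Map<String, String> build(String trustStore, String trustStorePassword,
                                     String keyStore, String keyStorePassword) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("zookeeper.client.secure", "true");
        props.put("zookeeper.clientCnxnSocket", "org.apache.zookeeper.ClientCnxnSocketNetty");
        props.put("zookeeper.ssl.trustStore.location", trustStore);
        props.put("zookeeper.ssl.trustStore.password", trustStorePassword);
        props.put("zookeeper.ssl.keyStore.location", keyStore);
        props.put("zookeeper.ssl.keyStore.password", keyStorePassword);
        // Extra safeguard (an assumption of this sketch): reject padded values.
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (!e.getValue().equals(e.getValue().trim())) {
                throw new IllegalArgumentException(
                        "Leading or trailing whitespace in " + e.getKey());
            }
        }
        return props;
    }

    /** Copies every entry into the JVM system properties. */
    static void apply(Map<String, String> props) {
        props.forEach(System::setProperty);
    }

    public static void main(String[] args) {
        // Placeholder paths; substitute the real store locations.
        Map<String, String> props = build(
                "/tmp/example/client.truststore.jks", "password",
                "/tmp/example/server.keystore.jks", "password");
        apply(props);
        props.forEach((k, v) -> System.out.println(k + "=" + System.getProperty(k)));
    }
}
```

As far as I understand, `apply` would need to run before the ZooKeeper client object is created, since the client reads these system properties when it is constructed.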
> This leaves the way we configure SSL in the Zookeeper server 
> process in Pravega standalone as the most plausible cause of the problem.
> This is intriguing, as the security settings used are the same in both 
> scenarios (zkServer.sh / Zookeeper server process started in the test code).
> I have also confirmed this by running the Zookeeper server process 
> used in standalone with/without SSL and connecting to it via the 
> zkCli. Without SSL configured I can connect properly to it, whereas 
> with SSL enabled I get the following error in the client:
> 2019-05-15 19:59:40,479 [myid:] - INFO  [main:ZooKeeper@868] - 
> Initiating client connection, connectString=localhost:2281 
> sessionTimeout=30000
> watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@621be5d1
> 2019-05-15 19:59:40,507 [myid:] - INFO  [main:X509Util@79] - Setting 
> -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable 
> client-initiated TLS renegotiation
> 2019-05-15 19:59:40,791 [myid:] - INFO  [main:ClientCnxnSocket@237] - 
> jute.maxbuffer value is 4194304 Bytes
> 2019-05-15 19:59:40,798 [myid:] - INFO  [main:ClientCnxn@1653] - 
> zookeeper.request.timeout value is 0. feature enabled=
> 2019-05-15 19:59:40,817 [myid:localhost:2281] - INFO 
> [main-SendThread(localhost:2281):ClientCnxn$SendThread@1112] - Opening 
> socket connection to server localhost/ Will not attempt 
> to authenticate using SASL (unknown error)
> Welcome to ZooKeeper!
> JLine support is enabled
> [zk: localhost:2281(CONNECTING) 0] 2019-05-15 19:59:41,168 
> [myid:localhost:2281] - INFO 
> [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientPipelineFactory
> @460]
> - SSL handler added for channel: [id: 0x7bf11dfa]
> 2019-05-15 19:59:41,176 [myid:localhost:2281] - INFO 
> [epollEventLoopGroup-2-1:ClientCnxn$SendThread@959] - Socket 
> connection established, initiating session, client: /, server:
> localhost/
> 2019-05-15 19:59:41,178 [myid:localhost:2281] - INFO 
> [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$1@188] - channel is
> connected: [id: 0x7bf11dfa, L:/ - R:localhost/ 
> 2019-05-15 19:59:41,614 [myid:localhost:2281] - INFO 
> [epollEventLoopGroup-2-1:ClientCnxn$SendThread@1394] - Session 
> establishment complete on server localhost/, sessionid = 
> 0x10002239ae10000, negotiated timeout = 30000
> WatchedEvent state:SyncConnected type:None path:null
> [zk: localhost:2281(CONNECTED) 0] ls /
> 2019-05-15 20:00:01,616 [myid:localhost:2281] - WARN 
> [main-SendThread(localhost:2281):ClientCnxn$SendThread@1190] - Client 
> session timed out, have not heard from server in 20004ms for sessionid
> 0x10002239ae10000
> 2019-05-15 20:00:01,618 [myid:localhost:2281] - INFO 
> [main-SendThread(localhost:2281):ClientCnxn$SendThread@1238] - Client 
> session timed out, have not heard from server in 20004ms for sessionid 
> 0x10002239ae10000, closing socket connection and attempting reconnect
> 2019-05-15 20:00:01,630 [myid:localhost:2281] - INFO 
> [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientHandler@473] - 
> channel is disconnected: [id: 0x7bf11dfa, L:/ !
> R:localhost/]
> 2019-05-15 20:00:01,631 [myid:localhost:2281] - INFO 
> [epollEventLoopGroup-2-1:ClientCnxnSocketNetty@253] - channel is told 
> closing
> KeeperErrorCode = ConnectionLoss for /
> [zk: localhost:2281(CONNECTED) 1]
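
One way to read the timings in the log above: as far as I understand, the ZooKeeper Java client treats the connection as dead after roughly two thirds of the negotiated session timeout passes with no data from the server. With the negotiated timeout of 30000 ms shown in the log, that gives 20000 ms, which is consistent with the reported "20004ms". The sketch below works through that arithmetic and the gap between the "Session establishment complete" and "Client session timed out" timestamps. The class and method names are hypothetical, and the two-thirds rule is my assumption about the client's behavior rather than something stated in this thread.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for sanity-checking the timings in a ZooKeeper client log. */
final class SessionTimeoutCheck {

    // Timestamp layout used by the log lines quoted above.
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Assumed client read timeout: two thirds of the negotiated session timeout. */
    static int readTimeoutMs(int negotiatedSessionTimeoutMs) {
        return negotiatedSessionTimeoutMs * 2 / 3;
    }

    /** Milliseconds elapsed between two log timestamps. */
    static long gapMs(String earlier, String later) {
        return Duration.between(
                LocalDateTime.parse(earlier, LOG_TS),
                LocalDateTime.parse(later, LOG_TS)).toMillis();
    }

    public static void main(String[] args) {
        // Negotiated timeout taken from the log: 30000 ms.
        System.out.println("read timeout (ms): " + readTimeoutMs(30000));
        // Session established vs. first "session timed out" warning.
        System.out.println("gap (ms): "
                + gapMs("2019-05-15 19:59:41,614", "2019-05-15 20:00:01,616"));
    }
}
```

If that assumption holds, the client appears to have received nothing at all from the server after session establishment, not even a reply to the `ls /` request, which points at the server side of the SSL channel rather than at a slow response.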
> I see some suspicious messages in these logs that I will need to 
> investigate further. But as a general observation, it looks like the 
> way we instantiate the Zookeeper server process for Pravega standalone 
> is not valid in zookeeper-3.5.5-rc6 (to inspect how we create the 
> Zookeeper server process, please see methods initialize() and start() 
> in this file:
> https://github.com/pravega/pravega/blob/master/segmentstore/storage/impl/src/main/java/io/pravega/segmentstore/storage/impl/bookkeeper/ZooKeeperServiceRunner.java
> ).
> In summary, if the error I'm getting is related to changes in the SSL 
> configuration introduced in zookeeper-3.5.5, it would be great to get 
> feedback from you if I'm missing something. On the other hand, if the 
> way we are creating a Zookeeper server process is not the recommended 
> one, I'm also open to suggestions here.
> Thanks in advance and sorry for the long email, Raúl.
> PS: I have also tried running the Zookeeper server process with SSL 
> while forcing it to use only the netty and boringSSL library versions that 
> are used in either Pravega (netty*:4.1.30.Final, 
> netty-tcnative-boringssl-static:2.0.17) or Zookeeper 
> 3.5.5 (netty*:4.1.29.Final, netty-tcnative-boringssl-static:2.0.7), but 
> neither combination made any difference in the behavior of the Zookeeper server.
> PS2: The JDK version I use is: openjdk version "1.8.0_212".