zookeeper-dev mailing list archives

From Andor Molnar <an...@cloudera.com.INVALID>
Subject Re: Question about security configuration (was: Re: [VOTE] Apache ZooKeeper release 3.5.5 candidate 6)
Date Thu, 16 May 2019 16:42:58 GMT
Hi Raul,

X509AuthenticationProvider is not registered in the embedded ZK. In server
logs it says:
"[epollEventLoopGroup-4-1] ERROR
org.apache.zookeeper.server.NettyServerCnxnFactory - Auth provider not
found: x509"

The provider is normally registered by QuorumPeerConfig.java:436
(configureSSLAuth()) when you run ZooKeeper in standalone mode, but your
code doesn't use this configuration class at all.
If you add this:

System.setProperty("zookeeper.authProvider.x509",
"org.apache.zookeeper.server.auth.X509AuthenticationProvider");

to your initialize() method, client SSL works:

[nioEventLoopGroup-4-2] INFO
org.apache.zookeeper.server.NettyServerCnxnFactory - SSL handler added for
channel: [id: 0x698604a3, L:/127.0.0.1:2281 - R:/127.0.0.1:56750]
[nioEventLoopGroup-4-2] INFO
org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated
Id 'CN=server.pravegastack.io' for Scheme 'x509'
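
For reference, a minimal sketch of that workaround as a standalone helper.
The class and method names (X509AuthProviderSetup, registerIfAbsent) are
made up for illustration and are not part of ZooKeeper or Pravega; only the
property key and the provider class name come from the snippet above. The
sketch sets the property only if nothing is configured yet, and it assumes
the call happens before the embedded server accepts its first connection.

```java
// Hypothetical helper, not part of ZooKeeper or Pravega.
final class X509AuthProviderSetup {
    // Property key and provider class taken from the workaround above.
    static final String PROPERTY = "zookeeper.authProvider.x509";
    static final String PROVIDER_CLASS =
            "org.apache.zookeeper.server.auth.X509AuthenticationProvider";

    private X509AuthProviderSetup() {
    }

    /**
     * Registers the x509 auth provider through a system property unless a
     * value is already present. Returns true if this call set the property.
     */
    static boolean registerIfAbsent() {
        if (System.getProperty(PROPERTY) != null) {
            return false; // keep whatever was configured explicitly
        }
        System.setProperty(PROPERTY, PROVIDER_CLASS);
        return true;
    }

    public static void main(String[] args) {
        // Assumption: call this at the top of initialize(), before the
        // embedded server and its connection factory are created.
        boolean changed = registerIfAbsent();
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY)
                + " (set by this call: " + changed + ")");
    }
}
```

Leaving an existing value untouched means a test that wants a different
provider can still override it with a -D flag.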

TBH I haven't diffed the code with 3.5.4-beta, so not sure why it worked
previously and I don't have experience with embedded ZK, but I believe
QuorumPeerConfig class has to be involved somehow.

Regards,
Andor



On Thu, May 16, 2019 at 5:10 PM Gracia, Raul <Raul.Gracia@dell.com> wrote:

> Thanks Andor for your quick reply. Let me answer your questions:
>
> 1) Yes, the problem is related to client/server communication using SSL,
> not related to Quorum SSL (we use a single Zookeeper process in our tests).
> I would like your feedback first to conclude if this is a problem in our
> config/code or a regression/change in the behavior of Zookeeper 3.5.5.
>
> 2) Yes, with the external Zookeeper server running separately (e.g.,
> zkServer.sh start) all the tests are passing (SSL/non-SSL). With the
> Zookeeper server process we instantiate in our tests, the non-SSL tests are
> also passing, but not the SSL ones.
>
> 3) Correct. Just to give more detail here, we are instantiating the
> Zookeeper server process using the ZooKeeperServer class jointly with
> NettyServerCnxnFactory.
>
> 4) I have done 2 types of tests: with Zookeeper started as a separate
> service ("zkServer.sh") and using the Zookeeper server process we
> instantiate in Pravega standalone tests (namely, "zk-pravega-tests"):
> - zkServer.sh: Works well with regular Zookeeper client (zkCli.sh) and the
> Pravega standalone tests pass using it with/without SSL.
> - zk-pravega-tests: Without SSL, the zkCli.sh can connect to that process
> and the non-SSL Pravega tests pass. With SSL configured, neither zkCli.sh
> nor the Pravega tests with SSL are able to connect to the server
> (KeeperErrorCode = ConnectionLoss).
>
> 5) No, I haven't tested this scenario yet. I have tested a standalone
> Zookeeper server (zkServer.sh) and a client (zkCli.sh) with SSL enabled in
> the same machine, and it works well. Apart from that, I have also performed
> distributed tests with a Zookeeper server (3.5.4-beta) and Pravega (using
> Curator 4.0.1 + zookeeper-3.5.5) in Kubernetes and it worked fine.
>
> 6) Yes, in fact I have done a little more than that and I have created a
> repository to investigate this issue in isolation:
> https://github.com/RaulGracia/zookeeper-test
> Apart from providing logs (see logs folder), in this repo I extracted the
> piece of code from the Pravega repository that is used to start the
> Zookeeper standalone process, making it easier to configure the SSL
> properties via executable. I think that this will make it easier for anyone
> to reproduce the problem I'm experiencing. Moreover, I have provided
> instructions in the README file on how to reproduce the issue.
>
> Thanks a lot,
> Raúl.
>
>
> -----Original Message-----
> From: Andor Molnar <andor@cloudera.com.INVALID>
> Sent: Thursday, May 16, 2019 11:18 AM
> To: DevZooKeeper
> Subject: Re: Question about security configuration (was: Re: [VOTE] Apache
> ZooKeeper release 3.5.5 candidate 6)
>
> Hi Raul,
>
> Thanks for the analysis. Let me ask a few questions, because I see some
> things that need to be clarified first.
>
> 1. This issue is only about server-client SSL scenario (not Quorum TLS),
> so it's possibly a regression in 3.5. Is that correct?
> 2. When running all Pravega tests against an external ZooKeeper standalone
> server, all tests passed including SSL/nonSSL. Is that correct?
> 3. SSL tests are failing when ZooKeeper is running inside the test process?
> 4. You verified it by running ZooKeeper in standalone mode, SSL-enabled
> and according to the log snippet, your client has connected successfully,
> but later timed out. Is that right?
> 5. Have you verified client-server SSL config with real (3-node) cluster
> with zkCli.sh?
> 6. Would you please provide the server side logs as well? They may shed
> some light on why the client timed out.
>
> Thanks,
> Andor
>
>
>
>
> On Thu, May 16, 2019 at 10:25 AM Gracia, Raul <Raul.Gracia@dell.com>
> wrote:
>
> > Hi all,
> >
> > My name is Raúl Gracia and I work in the Pravega project (open-source
> > project for data stream storage): http://pravega.io/.
> >
> > I'm currently working on a Pravega branch using "zookeeper-3.5.5-rc6",
> > as we are interested in allowing Curator (4.0.1) to use a Zookeeper
> > version with the bugfix proposed in ZOOKEEPER-2184<
> > https://issues.apache.org/jira/browse/ZOOKEEPER-2184>. The integration
> > has been pretty smooth and 99% of tests are successful in a Pravega
> > build, and the original issue that motivated the upgrade to
> > zookeeper-3.5.5 also seems to be solved.
> >
> > However, there are failures related to a specific type of tests in
> > Pravega in which we instantiate a Zookeeper server process (for
> > testing Pravega in standalone mode). Such failures only occur when
> > running the standalone tests with SSL enabled, which includes
> > configuring the Zookeeper server process with SSL as well.
> >
> > To constrain the scope of the problem, I have built
> > zookeeper-3.5.5-rc6 ("mvn package") and executed the server (e.g.,
> > "./bin/zkServer.sh start") with the appropriate security configuration
> > to enable SSL:
> > export SERVER_JVMFLAGS="
> > -Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
> > -Dzookeeper.ssl.keyStore.location=.../server.keystore.jks
> > -Dzookeeper.ssl.keyStore.password=password
> > -Dzookeeper.ssl.trustStore.location=.../client.truststore.jks
> > -Dzookeeper.ssl.trustStore.password=password"
> > (I have also added secureClientPort=2281 in zoo.cfg as indicated in
> > the admin instructions)
> >
> > With the Zookeeper server running separately, I executed all the
> > Pravega standalone tests (with and without SSL) pointing to that external
> > Zookeeper server (and disabling the Zookeeper server process that was
> > created as part of the test workflow). Regarding configuration, in our
> > tests the clients are configured with the recommended security
> > settings in the administration
> > guide:
> > System.setProperty("zookeeper.client.secure", "true");
> > System.setProperty("zookeeper.clientCnxnSocket",
> > "org.apache.zookeeper.ClientCnxnSocketNetty");
> > System.setProperty("zookeeper.ssl.trustStore.location",
> > ".../client.truststore.jks");
> > System.setProperty("zookeeper.ssl.trustStore.password", "password");
> > System.setProperty("zookeeper.ssl.keyStore.location",
> > ".../server.keystore.jks");
> > System.setProperty("zookeeper.ssl.keyStore.password", "password");
> >
> > In this case, all the Pravega standalone tests succeeded.
> >
> > This leaves the way we are configuring SSL in the Zookeeper server
> > process in Pravega standalone as the most plausible cause of the
> > problem. This is intriguing, as the security settings used are the same
> > in both scenarios (zkServer.sh / Zookeeper server process started in the
> > test code).
> >
> > I have also confirmed this by running the Zookeeper server process
> > used in standalone with/without SSL and connecting to it via the
> > zkCli. Without SSL configured I can connect properly to it, whereas
> > with SSL enabled I get the following error in the client:
> >
> > 2019-05-15 19:59:40,479 [myid:] - INFO  [main:ZooKeeper@868] -
> > Initiating client connection, connectString=localhost:2281
> > sessionTimeout=30000
> > watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@621be5d1
> > 2019-05-15 19:59:40,507 [myid:] - INFO  [main:X509Util@79] - Setting
> > -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable
> > client-initiated TLS renegotiation
> > 2019-05-15 19:59:40,791 [myid:] - INFO  [main:ClientCnxnSocket@237] -
> > jute.maxbuffer value is 4194304 Bytes
> > 2019-05-15 19:59:40,798 [myid:] - INFO  [main:ClientCnxn@1653] -
> > zookeeper.request.timeout value is 0. feature enabled=
> > 2019-05-15 19:59:40,817 [myid:localhost:2281] - INFO
> > [main-SendThread(localhost:2281):ClientCnxn$SendThread@1112] - Opening
> > socket connection to server localhost/127.0.0.1:2281. Will not attempt
> > to authenticate using SASL (unknown error)
> > Welcome to ZooKeeper!
> > JLine support is enabled
> > [zk: localhost:2281(CONNECTING) 0] 2019-05-15 19:59:41,168
> > [myid:localhost:2281] - INFO
> > [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientPipelineFactory@460]
> > - SSL handler added for channel: [id: 0x7bf11dfa]
> > 2019-05-15 19:59:41,176 [myid:localhost:2281] - INFO
> > [epollEventLoopGroup-2-1:ClientCnxn$SendThread@959] - Socket
> > connection established, initiating session, client: /127.0.0.1:52652,
> > server: localhost/127.0.0.1:2281
> > 2019-05-15 19:59:41,178 [myid:localhost:2281] - INFO
> > [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$1@188] - channel is
> > connected: [id: 0x7bf11dfa, L:/127.0.0.1:52652 - R:localhost/127.0.0.1:2281]
> > 2019-05-15 19:59:41,614 [myid:localhost:2281] - INFO
> > [epollEventLoopGroup-2-1:ClientCnxn$SendThread@1394] - Session
> > establishment complete on server localhost/127.0.0.1:2281, sessionid =
> > 0x10002239ae10000, negotiated timeout = 30000
> > WATCHER::
> > WatchedEvent state:SyncConnected type:None path:null
> > [zk: localhost:2281(CONNECTED) 0] ls /
> > 2019-05-15 20:00:01,616 [myid:localhost:2281] - WARN
> > [main-SendThread(localhost:2281):ClientCnxn$SendThread@1190] - Client
> > session timed out, have not heard from server in 20004ms for sessionid
> > 0x10002239ae10000
> > 2019-05-15 20:00:01,618 [myid:localhost:2281] - INFO
> > [main-SendThread(localhost:2281):ClientCnxn$SendThread@1238] - Client
> > session timed out, have not heard from server in 20004ms for sessionid
> > 0x10002239ae10000, closing socket connection and attempting reconnect
> > 2019-05-15 20:00:01,630 [myid:localhost:2281] - INFO
> > [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientHandler@473] -
> > channel is disconnected: [id: 0x7bf11dfa, L:/127.0.0.1:52652 !
> > R:localhost/127.0.0.1:2281]
> > 2019-05-15 20:00:01,631 [myid:localhost:2281] - INFO
> > [epollEventLoopGroup-2-1:ClientCnxnSocketNetty@253] - channel is told
> > closing
> > KeeperErrorCode = ConnectionLoss for /
> > [zk: localhost:2281(CONNECTED) 1]
> >
> > I see some suspicious messages in these logs that I will need to
> > investigate further. But as a general observation, it looks like the
> > way we instantiate the Zookeeper server process for Pravega standalone
> > is not valid in zookeeper-3.5.5-rc6 (to inspect how we create the
> > Zookeeper server process, please see methods initialize() and start()
> > in this file<
> > https://github.com/pravega/pravega/blob/master/segmentstore/storage/impl/src/main/java/io/pravega/segmentstore/storage/impl/bookkeeper/ZooKeeperServiceRunner.java
> > >).
> >
> > In summary, if the error I'm getting is related to changes in the SSL
> > configuration introduced in zookeeper-3.5.5, it would be great to get
> > feedback from you if I'm missing something. On the other hand, if the
> > way we are creating a Zookeeper server process is not the recommended
> > one, I'm also open to suggestions here.
> >
> > Thanks in advance and sorry for the long email,
> > Raúl.
> >
> > PS: I have also tried to run the Zookeeper server process with SSL,
> > forcing it to use only the netty and boringSSL library versions that are
> > used either in Pravega (netty*:4.1.30.Final,
> > netty-tcnative-boringssl-static:2.0.17) or Zookeeper
> > 3.5.5 (netty*:4.1.29.Final, netty-tcnative-boringssl-static:2.0.7), but
> > none of these combinations made any difference in the behavior of the
> > Zookeeper server process.
> >
> > PS2: The JDK version I use is: openjdk version "1.8.0_212".
> >
> >
>
