zookeeper-dev mailing list archives

From Enrico Olivelli <eolive...@gmail.com>
Subject Re: Question about security configuration (was: Re: [VOTE] Apache ZooKeeper release 3.5.5 candidate 6)
Date Fri, 17 May 2019 04:57:31 GMT
On Fri, May 17, 2019, 01:18 Gracia, Raul <Raul.Gracia@dell.com> wrote:

> Hi Andor,
>
> You are totally correct: the server works after adding this auth provider.
> Thanks a lot!
>
> I did a cursory comparison between ZooKeeper versions 3.5.4-beta and 3.5.5
> and I couldn't find a change that justifies this behavior change.
> In any case, the Pravega build has passed with zookeeper-3.5.5, which is
> great news.
>
> I will run some more tests and cast my vote on the release candidate,
> if you feel that this could be useful.
>

Raul,
It's great to see that you solved your problem.
It is also interesting that you are testing boring-ssl, as we have still not
included it in the release tarball.

Yes, please cast your vote.

Enrico



> Thanks a lot,
> Raúl.
>
> -----Original Message-----
> From: Andor Molnar <andor@cloudera.com.INVALID>
> Sent: Thursday, May 16, 2019 6:43 PM
> To: DevZooKeeper
> Subject: Re: Question about security configuration (was: Re: [VOTE] Apache
> ZooKeeper release 3.5.5 candidate 6)
>
>
> [EXTERNAL EMAIL]
>
> Hi Raul,
>
> X509AuthenticationProvider is not registered in the embedded ZK. In server
> logs it says:
> "[epollEventLoopGroup-4-1] ERROR
> org.apache.zookeeper.server.NettyServerCnxnFactory - Auth provider not
> found: x509"
>
> The registration is done by QuorumPeerConfig.java:436 (configureSSLAuth())
> when you run ZooKeeper in standalone mode, but your code doesn't use this
> configuration class at all.
> If you add this:
>
> System.setProperty("zookeeper.authProvider.x509",
> "org.apache.zookeeper.server.auth.X509AuthenticationProvider");
>
> to your initialize() method, client SSL works:
>
> [nioEventLoopGroup-4-2] INFO
> org.apache.zookeeper.server.NettyServerCnxnFactory - SSL handler added for
> channel: [id: 0x698604a3, L:/127.0.0.1:2281 - R:/127.0.0.1:56750]
> [nioEventLoopGroup-4-2] INFO
> org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated
> Id 'CN=server.pravegastack.io' for Scheme 'x509'
>
> TBH I haven't diffed the code with 3.5.4-beta, so not sure why it worked
> previously and I don't have experience with embedded ZK, but I believe
> QuorumPeerConfig class has to be involved somehow.
>
> Regards,
> Andor
>
>
>
> On Thu, May 16, 2019 at 5:10 PM Gracia, Raul <Raul.Gracia@dell.com> wrote:
>
> > Thanks Andor for your quick reply. Let me answer your questions:
> >
> > 1) Yes, the problem is related to client/server communication using
> > SSL, not related to Quorum SSL (we use a single Zookeeper process in our
> > tests).
> > I would like your feedback first to conclude if this is a problem in
> > our config/code or a regression/change in the behavior of Zookeeper
> > 3.5.5.
> >
> > 2) Yes, with the external Zookeeper server running separately (e.g.,
> > zkServer.sh start) all the tests are passing (SSL/non-SSL). With the
> > Zookeeper server process we instantiate in our tests, the non-SSL
> > tests are also passing, but not the SSL ones.
> >
> > 3) Correct. Just to give more detail here, we are instantiating the
> > Zookeeper server process using the ZooKeeperServer class jointly with
> > NettyServerCnxnFactory.
> >
> > 4) I have done 2 types of tests: with Zookeeper started as a separate
> > service ("zkServer.sh") and using the Zookeeper server process we
> > instantiate in Pravega standalone tests (namely, "zk-pravega-tests"):
> > - zkServer.sh: Works well with regular Zookeeper client (zkCli.sh) and
> > the Pravega standalone tests pass using it with/without SSL.
> > - zk-pravega-tests: Without SSL, the zkCli.sh can connect to that
> > process and the non-SSL Pravega tests pass. With SSL configured,
> > neither zkCli.sh nor the Pravega SSL tests are able to connect to
> > the server (KeeperErrorCode = ConnectionLoss).
> >
> > 5) No, I haven't tested this scenario yet. I have tested a standalone
> > Zookeeper server (zkServer.sh) and a client (zkCli.sh) with SSL
> > enabled on the same machine, and it works well. Apart from that, I
> > have also performed distributed tests with a Zookeeper server
> > (3.5.4-beta) and Pravega (using Curator 4.0.1 + zookeeper-3.5.5) in
> > Kubernetes and it worked fine.
> >
> > 6) Yes, in fact I have done a little more than that and I have created
> > a repository to investigate this issue in isolation:
> > https://github.com/RaulGracia/zookeeper-test
> > Apart from providing logs (see logs folder), in this repo I extracted
> > the piece of code from the Pravega repository that is used to start
> > the Zookeeper standalone process, making it easier to configure the
> > SSL properties via executable. I think that this will make it easier
> > for anyone to reproduce the problem I'm experiencing. Moreover, I have
> > provided instructions in the README file on how to reproduce the issue.
> >
> > Thanks a lot,
> > Raúl.
> >
> >
> > -----Original Message-----
> > From: Andor Molnar <andor@cloudera.com.INVALID>
> > Sent: Thursday, May 16, 2019 11:18 AM
> > To: DevZooKeeper
> > Subject: Re: Question about security configuration (was: Re: [VOTE]
> > Apache ZooKeeper release 3.5.5 candidate 6)
> >
> >
> > [EXTERNAL EMAIL]
> >
> > Hi Raul,
> >
> > Thanks for the analysis. Let me ask a few questions, because I see
> > some things that need to be clarified first.
> >
> > 1. This issue is only about server-client SSL scenario (not Quorum
> > TLS), so it's possibly a regression in 3.5. Is that correct?
> > 2. When running all Pravega tests against an external ZooKeeper
> > standalone server, all tests passed including SSL/nonSSL. Is that
> > correct?
> > 3. SSL tests are failing when ZooKeeper is running inside the test
> > process?
> > 4. You verified it by running ZooKeeper in standalone mode,
> > SSL-enabled and according to the log snippet, your client has
> > connected successfully, but later timed out. Is that right?
> > 5. Have you verified client-server SSL config with real (3-node)
> > cluster with zkCli.sh?
> > 6. Would you please provide the server-side logs as well? They may
> > shed some light on why the client timed out.
> >
> > Thanks,
> > Andor
> >
> >
> >
> >
> > On Thu, May 16, 2019 at 10:25 AM Gracia, Raul <Raul.Gracia@dell.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > My name is Raúl Gracia and I work in the Pravega project
> > > (open-source project for data stream storage): http://pravega.io/.
> > >
> > > > I'm currently working on a Pravega branch using
> > > > "zookeeper-3.5.5-rc6", as we are interested in allowing Curator
> > > > (4.0.1) to use a Zookeeper version with the bugfix proposed in
> > > > ZOOKEEPER-2184<
> > > > https://issues.apache.org/jira/browse/ZOOKEEPER-2184>. The
> > > > integration has been pretty smooth and 99% of tests are successful
> > > > in a Pravega build, and the original issue that motivated the
> > > > upgrade to zookeeper-3.5.5 also seems to be solved.
> > >
> > > However, there are failures related to a specific type of tests in
> > > Pravega in which we instantiate a Zookeeper server process (for
> > > testing Pravega in standalone mode). Such failures only occur when
> > > running the standalone tests with SSL enabled, which includes
> > > configuring the Zookeeper server process with SSL as well.
> > >
> > > To constrain the scope of the problem, I have built
> > > zookeeper-3.5.5-rc6 ("mvn package") and executed the server (e.g.,
> > > "./bin/zkServer.sh start") with the appropriate security
> > > > configuration to enable SSL:
> > > > export SERVER_JVMFLAGS="
> > > > -Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
> > > > -Dzookeeper.ssl.keyStore.location=.../server.keystore.jks
> > > > -Dzookeeper.ssl.keyStore.password=password
> > > > -Dzookeeper.ssl.trustStore.location=.../client.truststore.jks
> > > > -Dzookeeper.ssl.trustStore.password=password"
> > > (I have also added secureClientPort=2281 in zoo.cfg as indicated in
> > > the admin instructions)
> > >
> > > > With the Zookeeper server running separately, I executed all the
> > > > Pravega standalone tests (with and without SSL) pointing to that
> > > > external Zookeeper server (and disabling the Zookeeper server
> > > > process that was created as part of the test workflow). Regarding
> > > > configuration, in our tests the clients are configured with the
> > > > recommended security settings in the administration
> > > > guide:
> > > > System.setProperty("zookeeper.client.secure", "true");
> > > > System.setProperty("zookeeper.clientCnxnSocket",
> > > > "org.apache.zookeeper.ClientCnxnSocketNetty");
> > > > System.setProperty("zookeeper.ssl.trustStore.location",
> > > > ".../client.truststore.jks");
> > > > System.setProperty("zookeeper.ssl.trustStore.password", "password");
> > > > System.setProperty("zookeeper.ssl.keyStore.location",
> > > > ".../server.keystore.jks");
> > > > System.setProperty("zookeeper.ssl.keyStore.password", "password");
> > >
> > > In this case, all the Pravega standalone tests succeeded.
> > >
> > > > This leaves the way we are configuring SSL in the Zookeeper
> > > > server process in Pravega standalone as the most plausible cause of
> > > > the problem.
> > > > This is intriguing, as the security settings used are the same in
> > > > both scenarios (zkServer.sh / Zookeeper server process started in
> > > > the test code).
> > >
> > > I have also confirmed this by running the Zookeeper server process
> > > used in standalone with/without SSL and connecting to it via the
> > > zkCli. Without SSL configured I can connect properly to it, whereas
> > > with SSL enabled I get the following error in the client:
> > >
> > > 2019-05-15 19:59:40,479 [myid:] - INFO  [main:ZooKeeper@868] -
> > > Initiating client connection, connectString=localhost:2281
> > > sessionTimeout=30000
> > > > watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@621be5d1
> > > 2019-05-15 19:59:40,507 [myid:] - INFO  [main:X509Util@79] - Setting
> > > -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable
> > > client-initiated TLS renegotiation
> > > 2019-05-15 19:59:40,791 [myid:] - INFO  [main:ClientCnxnSocket@237]
> > > - jute.maxbuffer value is 4194304 Bytes
> > > 2019-05-15 19:59:40,798 [myid:] - INFO  [main:ClientCnxn@1653] -
> > > zookeeper.request.timeout value is 0. feature enabled=
> > > 2019-05-15 19:59:40,817 [myid:localhost:2281] - INFO
> > > [main-SendThread(localhost:2281):ClientCnxn$SendThread@1112] -
> > > Opening socket connection to server localhost/127.0.0.1:2281. Will
> > > not attempt to authenticate using SASL (unknown error) Welcome to
> ZooKeeper!
> > > JLine support is enabled
> > > > [zk: localhost:2281(CONNECTING) 0] 2019-05-15 19:59:41,168
> > > > [myid:localhost:2281] - INFO
> > > > [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientPipelineFactory@460]
> > > > - SSL handler added for channel: [id: 0x7bf11dfa]
> > > 2019-05-15 19:59:41,176 [myid:localhost:2281] - INFO
> > > [epollEventLoopGroup-2-1:ClientCnxn$SendThread@959] - Socket
> > > connection established, initiating session, client:
> > > /127.0.0.1:52652,
> > server:
> > > localhost/127.0.0.1:2281
> > > 2019-05-15 19:59:41,178 [myid:localhost:2281] - INFO
> > > > [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$1@188] - channel is
> > > > connected: [id: 0x7bf11dfa, L:/127.0.0.1:52652 -
> > > > R:localhost/127.0.0.1:2281]
> > > 2019-05-15 19:59:41,614 [myid:localhost:2281] - INFO
> > > [epollEventLoopGroup-2-1:ClientCnxn$SendThread@1394] - Session
> > > establishment complete on server localhost/127.0.0.1:2281, sessionid
> > > = 0x10002239ae10000, negotiated timeout = 30000
> > > WATCHER::
> > > WatchedEvent state:SyncConnected type:None path:null
> > > [zk: localhost:2281(CONNECTED) 0] ls /
> > > 2019-05-15 20:00:01,616 [myid:localhost:2281] - WARN
> > > [main-SendThread(localhost:2281):ClientCnxn$SendThread@1190] -
> > > Client session timed out, have not heard from server in 20004ms for
> > > sessionid
> > > 0x10002239ae10000
> > > 2019-05-15 20:00:01,618 [myid:localhost:2281] - INFO
> > > [main-SendThread(localhost:2281):ClientCnxn$SendThread@1238] -
> > > Client session timed out, have not heard from server in 20004ms for
> > > sessionid 0x10002239ae10000, closing socket connection and
> > > attempting reconnect
> > > 2019-05-15 20:00:01,630 [myid:localhost:2281] - INFO
> > > [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientHandler@473]
> > > - channel is disconnected: [id: 0x7bf11dfa, L:/127.0.0.1:52652 !
> > > R:localhost/127.0.0.1:2281]
> > > 2019-05-15 20:00:01,631 [myid:localhost:2281] - INFO
> > > > [epollEventLoopGroup-2-1:ClientCnxnSocketNetty@253] - channel is
> > > > told closing
> > > > KeeperErrorCode = ConnectionLoss for /
> > > [zk: localhost:2281(CONNECTED) 1]
> > >
> > > I see some suspicious messages in these logs that I will need to
> > > investigate further. But as a general observation, it looks like the
> > > way we instantiate the Zookeeper server process for Pravega
> > > standalone is not valid in zookeeper-3.5.5-rc6 (to inspect how we
> > > create the Zookeeper server process, please see methods initialize()
> > > > and start() in this file<
> > > > https://github.com/pravega/pravega/blob/master/segmentstore/storage/impl/src/main/java/io/pravega/segmentstore/storage/impl/bookkeeper/ZooKeeperServiceRunner.java
> > > > >).
> > >
> > > In summary, if the error I'm getting is related to changes in the
> > > SSL configuration introduced in zookeeper-3.5.5, it would be great
> > > to get feedback from you if I'm missing something. On the other
> > > hand, if the way we are creating a Zookeeper server process is not
> > > the recommended one, I'm also open to suggestions here.
> > >
> > > Thanks in advance and sorry for the long email, Raúl.
> > >
> > > > PS: I have also tried to run the Zookeeper server process with SSL,
> > > > forcing it to use only the netty and boringSSL library versions that
> > > > are used either in Pravega (netty*:4.1.30.Final,
> > > > netty-tcnative-boringssl-static:2.0.17) or Zookeeper
> > > > 3.5.5 (netty*:4.1.29.Final, netty-tcnative-boringssl-static:2.0.7),
> > > > but none of these combinations made any difference in the behavior
> > > > of the Zookeeper server process.
> > >
> > > PS2: The JDK version I use is: openjdk version "1.8.0_212".
> > >
> > >
> >
>
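
For readers who land on this thread with the same "Auth provider not found: x509" error in an embedded server, the workaround Andor describes can be sketched roughly as follows. The property key and provider class name come from his message; the class name EmbeddedZkAuthSetup and the helper registerX509AuthProvider() are hypothetical names chosen for illustration, and the sketch assumes the property is set before the embedded ZooKeeper server and its connection factory are created (for example at the top of the application's own initialize() method).

```java
// A minimal sketch, not the Pravega or ZooKeeper implementation.
// EmbeddedZkAuthSetup and registerX509AuthProvider() are hypothetical names.
final class EmbeddedZkAuthSetup {
    // Both values are quoted from the thread above.
    static final String X509_PROVIDER_KEY = "zookeeper.authProvider.x509";
    static final String X509_PROVIDER_CLASS =
            "org.apache.zookeeper.server.auth.X509AuthenticationProvider";

    private EmbeddedZkAuthSetup() {
    }

    /**
     * Sets the x509 auth provider system property unless it is already set.
     * Returns true if this call set the property, false if it was left as-is.
     * Assumption: this runs before the embedded server is started.
     */
    static boolean registerX509AuthProvider() {
        if (System.getProperty(X509_PROVIDER_KEY) != null) {
            return false; // respect a value supplied via -D or earlier code
        }
        System.setProperty(X509_PROVIDER_KEY, X509_PROVIDER_CLASS);
        return true;
    }

    public static void main(String[] args) {
        registerX509AuthProvider();
        System.out.println(X509_PROVIDER_KEY + "="
                + System.getProperty(X509_PROVIDER_KEY));
    }
}
```

Leaving an existing value untouched is a design choice of this sketch, so that a provider passed on the command line with -D is not overwritten; the one-line System.setProperty call from the thread works just as well when that is not a concern.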
