zookeeper-dev mailing list archives

From "Gracia, Raul" <Raul.Gra...@Dell.com>
Subject RE: Question about security configuration (was: Re: [VOTE] Apache ZooKeeper release 3.5.5 candidate 6)
Date Thu, 16 May 2019 23:17:33 GMT
Hi Andor,

You are totally correct: the server works after adding this auth provider. Thanks a lot!

I did a cursory comparison between ZooKeeper versions 3.5.4-beta and 3.5.5 and couldn't
find a change that explains this difference in behavior.
In any case, the Pravega build has passed with zookeeper-3.5.5, which is great news.

I will run some more tests and cast my vote on the release candidate, if you feel that
this would be useful.

Thanks a lot,
Raúl.

-----Original Message-----
From: Andor Molnar <andor@cloudera.com.INVALID> 
Sent: Thursday, May 16, 2019 6:43 PM
To: DevZooKeeper
Subject: Re: Question about security configuration (was: Re: [VOTE] Apache ZooKeeper release
3.5.5 candidate 6)



Hi Raul,

X509AuthenticationProvider is not registered in the embedded ZK. In server logs it says:
"[epollEventLoopGroup-4-1] ERROR
org.apache.zookeeper.server.NettyServerCnxnFactory - Auth provider not
found: x509"

It's done by QuorumPeerConfig.java:436 (configureSSLAuth()) when you run ZooKeeper in standalone
mode, but your code doesn't use this configuration class at all.
If you add this:

System.setProperty("zookeeper.authProvider.x509",
"org.apache.zookeeper.server.auth.X509AuthenticationProvider");

to your initialize() method, client SSL works:

[nioEventLoopGroup-4-2] INFO
org.apache.zookeeper.server.NettyServerCnxnFactory - SSL handler added for
channel: [id: 0x698604a3, L:/127.0.0.1:2281 - R:/127.0.0.1:56750]
[nioEventLoopGroup-4-2] INFO
org.apache.zookeeper.server.auth.X509AuthenticationProvider - Authenticated
Id 'CN=server.pravegastack.io' for Scheme 'x509'
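
A minimal sketch of the workaround above, for code that starts ZooKeeper
directly and therefore never goes through QuorumPeerConfig. The property key
and provider class are the ones quoted in this thread; the class name
EmbeddedZkAuthSetup and the method registerX509Provider() are hypothetical
names used only for illustration. The sketch assumes the property must be set
before the server connection factory is created, and it leaves an explicitly
configured provider untouched.

```java
final class EmbeddedZkAuthSetup {
    // Property key and provider class exactly as quoted in this thread.
    static final String X509_KEY = "zookeeper.authProvider.x509";
    static final String X509_PROVIDER =
            "org.apache.zookeeper.server.auth.X509AuthenticationProvider";

    private EmbeddedZkAuthSetup() {
    }

    /**
     * Sets the x509 auth provider property unless one is already configured.
     * Returns true if this call set the property, false if it was left as is.
     */
    static boolean registerX509Provider() {
        if (System.getProperty(X509_KEY) != null) {
            return false; // respect an explicit, pre-existing setting
        }
        System.setProperty(X509_KEY, X509_PROVIDER);
        return true;
    }

    public static void main(String[] args) {
        boolean changed = registerX509Provider();
        System.out.println(X509_KEY + "=" + System.getProperty(X509_KEY)
                + " (set by this call: " + changed + ")");
    }
}
```

Calling registerX509Provider() at the top of an initialize() method would have
the same effect as the inline System.setProperty call, with the difference
that a value passed via -D on the command line is not overwritten.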

TBH I haven't diffed the code with 3.5.4-beta, so not sure why it worked previously and I
don't have experience with embedded ZK, but I believe QuorumPeerConfig class has to be involved
somehow.
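
To check whether a given JVM has any provider configured this way, a small
probe can list the relevant system properties. This is only a sketch: it
assumes providers are looked up from system properties sharing the
"zookeeper.authProvider." prefix (as the workaround above suggests), and it
cannot see providers that ZooKeeper registers internally. AuthProviderProbe
and configuredProviders() are hypothetical names.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class AuthProviderProbe {
    // Prefix taken from the property name used in the workaround above.
    static final String PREFIX = "zookeeper.authProvider.";

    private AuthProviderProbe() {
    }

    /** Returns suffix -> class name for every property under PREFIX, sorted by suffix. */
    static Map<String, String> configuredProviders(Properties props) {
        Map<String, String> found = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                found.put(key.substring(PREFIX.length()), props.getProperty(key));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> providers = configuredProviders(System.getProperties());
        if (providers.isEmpty()) {
            System.out.println("No " + PREFIX + "* properties are set");
        }
        providers.forEach((name, cls) -> System.out.println(name + " -> " + cls));
    }
}
```

Running the probe inside the embedded test process before and after the
System.setProperty call should show whether the x509 entry is present at the
time the server starts.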

Regards,
Andor



On Thu, May 16, 2019 at 5:10 PM Gracia, Raul <Raul.Gracia@dell.com> wrote:

> Thanks Andor for your quick reply. Let me answer your questions:
>
> 1) Yes, the problem is related to client/server communication using 
> SSL, not related to Quorum SSL (we use a single Zookeeper process in our tests).
> I would like your feedback first to conclude if this is a problem in 
> our config/code or a regression/change in the behavior of Zookeeper 3.5.5.
>
> 2) Yes, with the external Zookeeper server running separately (e.g., 
> zkServer.sh start) all the tests are passing (SSL/non-SSL). With the 
> Zookeeper server process we instantiate in our tests, the non-SSL 
> tests are also passing, but not the SSL ones.
>
> 3) Correct. Just to give more detail here, we are instantiating the 
> Zookeeper server process using the ZooKeeperServer class jointly with 
> NettyServerCnxnFactory.
>
> 4) I have done 2 types of tests: with Zookeeper started as a separate 
> service ("zkServer.sh") and using the Zookeeper server process we 
> instantiate in Pravega standalone tests (namely, "zk-pravega-tests"):
> - zkServer.sh: Works well with regular Zookeeper client (zkCli.sh) and 
> the Pravega standalone tests pass using it with/without SSL.
> - zk-pravega-tests: Without SSL, the zkCli.sh can connect to that 
> process and the non-SSL Pravega tests pass. With SSL configured, 
> neither zkCli.sh nor Pravega tests with SSL are able to connect to
> the server (KeeperErrorCode = ConnectionLoss).
>
> 5) No, I haven't tested this scenario yet. I have tested a standalone 
> Zookeeper server (zkServer.sh) and a client (zkCli.sh) with SSL
> enabled on the same machine, and it works well. Apart from that, I
> have also performed distributed tests with a Zookeeper server
> (3.5.4-beta) and Pravega (using Curator 4.0.1 + zookeeper-3.5.5) in
> Kubernetes, and they worked fine.
>
> 6) Yes, in fact I have done a little more than that and I have created 
> a repository to investigate this issue in isolation:
> https://github.com/RaulGracia/zookeeper-test
> Apart from providing logs (see logs folder), in this repo I extracted 
> the piece of code from the Pravega repository that is used to start 
> the Zookeeper standalone process, making it easier to configure the 
> SSL properties via executable. I think that this will make it easier 
> for anyone to reproduce the problem I'm experiencing. Moreover, I have 
> provided instructions in the README file on how to reproduce the issue.
>
> Thanks a lot,
> Raúl.
>
>
> -----Original Message-----
> From: Andor Molnar <andor@cloudera.com.INVALID>
> Sent: Thursday, May 16, 2019 11:18 AM
> To: DevZooKeeper
> Subject: Re: Question about security configuration (was: Re: [VOTE] 
> Apache ZooKeeper release 3.5.5 candidate 6)
>
>
>
> Hi Raul,
>
> Thanks for the analysis. Let me ask a few questions, because I see 
> some things that need to be clarified first.
>
> 1. This issue is only about server-client SSL scenario (not Quorum 
> TLS), so it's possibly a regression in 3.5. Is that correct?
> 2. When running all Pravega tests against an external ZooKeeper 
> standalone server, all tests passed including SSL/nonSSL. Is that correct?
> 3. SSL tests are failing when ZooKeeper is running inside the test process?
> 4. You verified it by running ZooKeeper in standalone mode, 
> SSL-enabled and according to the log snippet, your client has 
> connected successfully, but later timed out. Is that right?
> 5. Have you verified client-server SSL config with real (3-node) 
> cluster with zkCli.sh?
> 6. Would you please provide the server-side logs as well? They may
> shed some light on why the client timed out.
>
> Thanks,
> Andor
>
>
>
>
> On Thu, May 16, 2019 at 10:25 AM Gracia, Raul <Raul.Gracia@dell.com>
> wrote:
>
> > Hi all,
> >
> > My name is Raúl Gracia and I work in the Pravega project 
> > (open-source project for data stream storage): http://pravega.io/.
> >
> > I'm currently working on a Pravega branch using
> > "zookeeper-3.5.5-rc6", as we are interested in allowing Curator
> > (4.0.1) to use a Zookeeper version with the bugfix proposed in
> > ZOOKEEPER-2184
> > <https://issues.apache.org/jira/browse/ZOOKEEPER-2184>. The
> > integration has been pretty smooth and 99% of tests are successful
> > in a Pravega build, and the original issue that motivated the
> > upgrade to zookeeper-3.5.5 also seems to be solved.
> >
> > However, there are failures related to a specific type of tests in 
> > Pravega in which we instantiate a Zookeeper server process (for 
> > testing Pravega in standalone mode). Such failures only occur when 
> > running the standalone tests with SSL enabled, which includes 
> > configuring the Zookeeper server process with SSL as well.
> >
> > To constrain the scope of the problem, I have built
> > zookeeper-3.5.5-rc6 ("mvn package") and executed the server (e.g.,
> > "./bin/zkServer.sh start") with the appropriate security
> > configuration to enable SSL:
> > export SERVER_JVMFLAGS="
> > -Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
> > -Dzookeeper.ssl.keyStore.location=.../server.keystore.jks
> > -Dzookeeper.ssl.keyStore.password=password
> > -Dzookeeper.ssl.trustStore.location=.../client.truststore.jks
> > -Dzookeeper.ssl.trustStore.password=password"
> > (I have also added secureClientPort=2281 in zoo.cfg as indicated in 
> > the admin instructions)
> >
> > With the Zookeeper server running separately, I executed all the 
> > Pravega standalone tests (with and without SSL) pointing to that
> > external Zookeeper server (and disabling the Zookeeper server 
> > process that was created as part of the test workflow). Regarding 
> > configuration, in our tests the clients are configured with the 
> > recommended security settings in the administration
> > guide:
> > System.setProperty("zookeeper.client.secure", "true");
> > System.setProperty("zookeeper.clientCnxnSocket",
> > "org.apache.zookeeper.ClientCnxnSocketNetty");
> > System.setProperty("zookeeper.ssl.trustStore.location",
> > ".../client.truststore.jks");
> > System.setProperty("zookeeper.ssl.trustStore.password", "password");
> > System.setProperty("zookeeper.ssl.keyStore.location",
> > ".../server.keystore.jks");
> > System.setProperty("zookeeper.ssl.keyStore.password", "password");
> >
> > In this case, all the Pravega standalone tests succeeded.
> >
> > This leaves the way we configure SSL in the Zookeeper server
> > process in Pravega standalone as the most plausible cause of the
> > problem. This is intriguing, as the security settings used are the
> > same in both scenarios (zkServer.sh / Zookeeper server process
> > started in the test code).
> >
> > I have also confirmed this by running the Zookeeper server process 
> > used in standalone with/without SSL and connecting to it via the 
> > zkCli. Without SSL configured I can connect properly to it, whereas 
> > with SSL enabled I get the following error in the client:
> >
> > 2019-05-15 19:59:40,479 [myid:] - INFO  [main:ZooKeeper@868] - 
> > Initiating client connection, connectString=localhost:2281
> > sessionTimeout=30000
> > watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@621be5d1
> > 2019-05-15 19:59:40,507 [myid:] - INFO  [main:X509Util@79] - Setting 
> > -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable 
> > client-initiated TLS renegotiation
> > 2019-05-15 19:59:40,791 [myid:] - INFO  [main:ClientCnxnSocket@237] 
> > - jute.maxbuffer value is 4194304 Bytes
> > 2019-05-15 19:59:40,798 [myid:] - INFO  [main:ClientCnxn@1653] - 
> > zookeeper.request.timeout value is 0. feature enabled=
> > 2019-05-15 19:59:40,817 [myid:localhost:2281] - INFO 
> > [main-SendThread(localhost:2281):ClientCnxn$SendThread@1112] - 
> > Opening socket connection to server localhost/127.0.0.1:2281. Will 
> > not attempt to authenticate using SASL (unknown error)
> > Welcome to ZooKeeper!
> > JLine support is enabled
> > [zk: localhost:2281(CONNECTING) 0] 2019-05-15 19:59:41,168 
> > [myid:localhost:2281] - INFO 
> > [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientPipelineFactory@460]
> > - SSL handler added for channel: [id: 0x7bf11dfa]
> > 2019-05-15 19:59:41,176 [myid:localhost:2281] - INFO 
> > [epollEventLoopGroup-2-1:ClientCnxn$SendThread@959] - Socket
> > connection established, initiating session, client:
> > /127.0.0.1:52652, server: localhost/127.0.0.1:2281
> > 2019-05-15 19:59:41,178 [myid:localhost:2281] - INFO 
> > [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$1@188] - channel is
> > connected: [id: 0x7bf11dfa, L:/127.0.0.1:52652 -
> > R:localhost/127.0.0.1:2281]
> > 2019-05-15 19:59:41,614 [myid:localhost:2281] - INFO 
> > [epollEventLoopGroup-2-1:ClientCnxn$SendThread@1394] - Session 
> > establishment complete on server localhost/127.0.0.1:2281, sessionid 
> > = 0x10002239ae10000, negotiated timeout = 30000
> > WATCHER::
> > WatchedEvent state:SyncConnected type:None path:null
> > [zk: localhost:2281(CONNECTED) 0] ls /
> > 2019-05-15 20:00:01,616 [myid:localhost:2281] - WARN 
> > [main-SendThread(localhost:2281):ClientCnxn$SendThread@1190] - 
> > Client session timed out, have not heard from server in 20004ms for 
> > sessionid
> > 0x10002239ae10000
> > 2019-05-15 20:00:01,618 [myid:localhost:2281] - INFO 
> > [main-SendThread(localhost:2281):ClientCnxn$SendThread@1238] - 
> > Client session timed out, have not heard from server in 20004ms for 
> > sessionid 0x10002239ae10000, closing socket connection and 
> > attempting reconnect
> > 2019-05-15 20:00:01,630 [myid:localhost:2281] - INFO 
> > [epollEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientHandler@473] 
> > - channel is disconnected: [id: 0x7bf11dfa, L:/127.0.0.1:52652 !
> > R:localhost/127.0.0.1:2281]
> > 2019-05-15 20:00:01,631 [myid:localhost:2281] - INFO 
> > [epollEventLoopGroup-2-1:ClientCnxnSocketNetty@253] - channel is
> > told closing
> > KeeperErrorCode = ConnectionLoss for /
> > [zk: localhost:2281(CONNECTED) 1]
> >
> > I see some suspicious messages in these logs that I will need to 
> > investigate further. But as a general observation, it looks like the 
> > way we instantiate the Zookeeper server process for Pravega 
> > standalone is not valid in zookeeper-3.5.5-rc6 (to inspect how we 
> > create the Zookeeper server process, please see methods initialize()
> > and start() in this file:
> > https://github.com/pravega/pravega/blob/master/segmentstore/storage/impl/src/main/java/io/pravega/segmentstore/storage/impl/bookkeeper/ZooKeeperServiceRunner.java
> > ).
> >
> > In summary, if the error I'm getting is related to changes in the 
> > SSL configuration introduced in zookeeper-3.5.5, it would be great
> > to hear from you whether I'm missing something. On the other
> > hand, if the way we are creating a Zookeeper server process is not 
> > the recommended one, I'm also open to suggestions here.
> >
> > Thanks in advance and sorry for the long email, Raúl.
> >
> > PS: I have also tried to run the Zookeeper server process with SSL,
> > forcing it to use only the netty and boringSSL library versions that
> > are used in either Pravega (netty*:4.1.30.Final,
> > netty-tcnative-boringssl-static:2.0.17) or Zookeeper 3.5.5
> > (netty*:4.1.29.Final, netty-tcnative-boringssl-static:2.0.7), but
> > neither combination made any difference in the behavior of the
> > Zookeeper server process.
> >
> > PS2: The JDK version I use is: openjdk version "1.8.0_212".
> >
> >
>