zookeeper-dev mailing list archives

From "Rakesh R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2712) MiniKdc test case intermittently failing due to principal not found in Kerberos database
Date Sun, 19 Mar 2017 05:00:44 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931573#comment-15931573 ]

Rakesh R commented on ZOOKEEPER-2712:
-------------------------------------

Thanks a lot [~hanm] for the unit test results and comments.

bq. I am curious what exactly causes the races, though; from the description it was not very clear to me. Would you mind elaborating a little on which code in the test case uses which function in the dependency library, and what the race condition is?

I hope the following explanation helps clarify the concurrency flow.

Below is the authentication failure that frequently occurs in our {{KerberosSecurityTestcase}}-based unit tests.
{code}
2017-03-17 15:55:51,397 [myid:] - WARN  [NioProcessor-3:KerberosProtocolHandler@241] - Server not found in Kerberos database (7)
2017-03-17 15:55:51,398 [myid:] - WARN  [NioProcessor-3:KerberosProtocolHandler@242] - Server not found in Kerberos database (7)
		[Krb5LoginModule] authentication failed
Server not found in Kerberos database (7) - Server not found in Kerberos database
2017-03-17 15:55:51,409 [myid:1] - ERROR [Thread-3:QuorumPeerTestBase$MainThread@145] - unexpected exception in run
javax.security.sasl.SaslException: Failed to initialize authentication mechanism using SASL [Caused by javax.security.auth.login.LoginException: Server not found in Kerberos database (7) - Server not found in Kerberos database]
	at org.apache.zookeeper.server.quorum.auth.SaslQuorumAuthServer.<init>(SaslQuorumAuthServer.java:69)
	at org.apache.zookeeper.server.quorum.QuorumPeer.initialize(QuorumPeer.java:570)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:162)
{code}

As we know, the test case creates a ZK cluster of size 3 and uses the [QuorumAuthTestBase.startServer() #L186|https://github.com/apache/zookeeper/blob/branch-3.4/src/java/test/org/apache/zookeeper/server/quorum/auth/QuorumAuthTestBase.java#L186] function to start each server in a separate thread, so three servers start in parallel in three different threads. During startup, each server initializes SaslQuorumAuthServer and SaslQuorumAuthLearner in [QuorumPeer#init|https://github.com/apache/zookeeper/blob/branch-3.4/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java#L570] and performs the auth login. Since this test case is based on {{KerberosSecurityTestcase}}, the Kerberos login internally goes through the ApacheDS library.

I experimented with a scenario that performs multiple Kerberos {{javax.security.auth.login.LoginContext#login()}} calls simultaneously and hit exactly the same error, {{server not found in Kerberos database}}. When I then performed the logins sequentially, I never hit the "server not found" problem. I suspect the ApacheDS login module shares some resources, which results in the concurrency failure. IMHO, fixing ApacheDS is out of our scope, and making the logins sequential makes the test case more consistent. Does this make sense to you?
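The sequential-login idea can be sketched in Java roughly as follows. This is a minimal, hypothetical illustration, not the actual ZOOKEEPER-2712 patch: the class {{SerializedLogin}}, its {{runLogin}} helper and the {{fakeLogin}} stand-in are invented names. A single JVM-wide {{ReentrantLock}} ensures at most one login runs at a time, even when three "server" threads are released together through a {{CountDownLatch}}. The stand-in only counts how many logins overlap; it does not contact a KDC or reproduce the ApacheDS failure itself.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper that serializes logins made by servers started in parallel threads. */
public final class SerializedLogin {
    // One JVM-wide lock shared by every server thread in the test.
    private static final ReentrantLock LOGIN_LOCK = new ReentrantLock();

    private SerializedLogin() {}

    /** Runs the login action while holding the shared lock, so logins never overlap. */
    public static <T> T runLogin(Callable<T> loginAction) throws Exception {
        LOGIN_LOCK.lock();
        try {
            return loginAction.call();
        } finally {
            LOGIN_LOCK.unlock();
        }
    }

    /**
     * Releases {@code servers} threads at the same instant (like the three quorum
     * peers) and returns the peak number of logins that were running at once.
     */
    public static int peakConcurrentLogins(int servers, boolean serialize) {
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);

        // Hypothetical stand-in for LoginContext#login(): records overlap, then "works" for 50 ms.
        Callable<Void> fakeLogin = () -> {
            peak.accumulateAndGet(active.incrementAndGet(), Math::max);
            try {
                Thread.sleep(50);
            } finally {
                active.decrementAndGet();
            }
            return null;
        };

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < servers; i++) {
            Thread t = new Thread(() -> {
                try {
                    start.await(); // every thread waits here, then all race together
                    if (serialize) {
                        runLogin(fakeLogin);
                    } else {
                        fakeLogin.call();
                    }
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            });
            threads.add(t);
            t.start();
        }
        start.countDown(); // release all "servers" at once

        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // With the lock, the three logins run strictly one after another.
        System.out.println("serialized peak = " + peakConcurrentLogins(3, true)); // prints "serialized peak = 1"
    }
}
{code}

In a real test, the action passed to {{runLogin}} would wrap the actual Kerberos call, e.g. {{SerializedLogin.runLogin(() -> { loginContext.login(); return null; })}}, where {{loginContext}} is an already configured {{javax.security.auth.login.LoginContext}}. Serializing only the login step keeps the rest of the server startup concurrent.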

> MiniKdc test case intermittently failing due to principal not found in Kerberos database
> ----------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2712
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2712
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: tests
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>            Priority: Critical
>             Fix For: 3.4.10
>
>         Attachments: TEST-org.apache.zookeeper.server.quorum.auth.QuorumKerberosAuthTest.txt
>
>
> MiniKdc test cases are intermittently failing because the principal is not found. Below is the failure stack trace.
> {code}
> 2017-03-08 13:21:10,843 [myid:] - ERROR [NioProcessor-1:AuthenticationService@187] - Error while searching for client learner@EXAMPLE.COM : Client not found in Kerberos database
> 2017-03-08 13:21:10,843 [myid:] - WARN  [NioProcessor-2:KerberosProtocolHandler@241] - Server not found in Kerberos database (7)
> 2017-03-08 13:21:10,845 [myid:] - WARN  [NioProcessor-2:KerberosProtocolHandler@242] - Server not found in Kerberos database (7)
> 2017-03-08 13:21:10,844 [myid:] - WARN  [NioProcessor-1:KerberosProtocolHandler@241] - Client not found in Kerberos database (6)
> 2017-03-08 13:21:10,845 [myid:] - WARN  [NioProcessor-1:KerberosProtocolHandler@242] - Client not found in Kerberos database (6)
> {code}
> Will attach the detailed log to JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
