hadoop-common-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15487) ConcurrentModificationException resulting in Kerberos authentication error.
Date Mon, 04 Jun 2018 15:06:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500348#comment-16500348
] 

Daryn Sharp commented on HADOOP-15487:
--------------------------------------

The second exception is an unrelated JDK bug fixed in 8u161:  [JDK-8170278: ticket renewal
won't happen with debugging turned on|https://bugs.openjdk.java.net/browse/JDK-8170278]. 
The GSSAPI is smart enough to recognize and handle expired tickets from a keytab.  The problem
is that {{KerberosTicket#toString}} throws an ISE (IllegalStateException) if the ticket is
expired.  The easy workaround is to not enable debug logging.

The original issue is distinct.  If there truly are no custom plugins, it may be related to
curator/zookeeper/AuthenticatedURL.  Which specific Apache release is this?  Did the server
recover?

We may need to consider using a distinct subject/ugi for rpc servers to prevent other code
from munging our JAAS subject, but there are a few possible grues lurking there.
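
To illustrate the "distinct subject" idea only, here is a minimal sketch that copies a shared
Subject into a private one while holding the lock of each synchronized credential set. The
class name SubjectSnapshot and the method snapshot are hypothetical and are not Hadoop or UGI
APIs. A copy like this would not see credentials that are added to the shared Subject later,
which may be one of the complications such a change would have to handle.

```java
import java.security.Principal;
import java.util.HashSet;
import java.util.Set;
import javax.security.auth.Subject;

// Hypothetical helper, not part of Hadoop: builds a private copy of a shared Subject.
class SubjectSnapshot {

    static Subject snapshot(Subject shared) {
        Set<Principal> principals = shared.getPrincipals();
        Set<Object> pubCreds = shared.getPublicCredentials();
        Set<Object> privCreds = shared.getPrivateCredentials();

        Set<Principal> principalsCopy;
        Set<Object> pubCopy;
        Set<Object> privCopy;
        // The sets returned by Subject are synchronized wrappers, so iterating
        // them (which the HashSet copy constructor does) must hold their lock.
        synchronized (principals) {
            principalsCopy = new HashSet<>(principals);
        }
        synchronized (pubCreds) {
            pubCopy = new HashSet<>(pubCreds);
        }
        synchronized (privCreds) {
            privCopy = new HashSet<>(privCreds);
        }
        // readOnly=false: the copy can still be modified by its sole owner.
        return new Subject(false, principalsCopy, pubCopy, privCopy);
    }

    public static void main(String[] args) {
        Subject shared = new Subject();
        shared.getPrivateCredentials().add("credential-1");

        Subject mine = snapshot(shared);
        // A later change to the shared Subject does not touch the snapshot.
        shared.getPrivateCredentials().add("credential-2");

        System.out.println("shared size: " + shared.getPrivateCredentials().size());
        System.out.println("snapshot size: " + mine.getPrivateCredentials().size());
    }
}
```

Code that iterates only the snapshot cannot be disturbed by other components that add or
remove credentials on the shared Subject.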



> ConcurrentModificationException resulting in Kerberos authentication error.
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-15487
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15487
>             Project: Hadoop Common
>          Issue Type: Bug
>         Environment: CDH 5.13.3. Kerberized, Hadoop-HA, jdk1.8.0_152
>            Reporter: Wei-Chiu Chuang
>            Priority: Major
>
> We found the following exception message in a NameNode log. It seems the ConcurrentModificationException
caused a Kerberos authentication error.
> It appears to be a JDK bug, similar to HADOOP-13433 (Race in UGI.reloginFromKeytab), but
the version of Hadoop (CDH 5.13.3) already includes the HADOOP-13433 patch. (The stack trace
also differs.) This cluster runs on JDK 1.8.0_152.
> {noformat}
> 2018-05-19 04:00:00,182 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hdfs/node1@EXAMPLE.COM (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate
failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to
find any Kerberos tgt)]
> 2018-05-19 04:00:00,183 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port
8020: readAndProcess from client 10.16.20.122 threw exception [java.util.ConcurrentModificationException]
> java.util.ConcurrentModificationException
>         at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:966)
>         at java.util.LinkedList$ListItr.next(LinkedList.java:888)
>         at javax.security.auth.Subject$SecureSet$1.next(Subject.java:1070)
>         at javax.security.auth.Subject$ClassSet$1.run(Subject.java:1401)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject$ClassSet.populateSet(Subject.java:1399)
>         at javax.security.auth.Subject$ClassSet.<init>(Subject.java:1372)
>         at javax.security.auth.Subject.getPrivateCredentials(Subject.java:767)
>         at sun.security.jgss.krb5.SubjectComber.findAux(SubjectComber.java:127)
>         at sun.security.jgss.krb5.SubjectComber.findMany(SubjectComber.java:69)
>         at sun.security.jgss.krb5.ServiceCreds.getInstance(ServiceCreds.java:96)
>         at sun.security.jgss.krb5.Krb5Util.getServiceCreds(Krb5Util.java:203)
>         at sun.security.jgss.krb5.Krb5AcceptCredential$1.run(Krb5AcceptCredential.java:74)
>         at sun.security.jgss.krb5.Krb5AcceptCredential$1.run(Krb5AcceptCredential.java:72)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:71)
>         at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127)
>         at sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193)
>         at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427)
>         at sun.security.jgss.GSSCredentialImpl.<init>(GSSCredentialImpl.java:62)
>         at sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154)
>         at com.sun.security.sasl.gsskerb.GssKrb5Server.<init>(GssKrb5Server.java:108)
>         at com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85)
>         at org.apache.hadoop.security.SaslRpcServer$FastSaslServerFactory.createSaslServer(SaslRpcServer.java:398)
>         at org.apache.hadoop.security.SaslRpcServer$1.run(SaslRpcServer.java:164)
>         at org.apache.hadoop.security.SaslRpcServer$1.run(SaslRpcServer.java:161)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
>         at org.apache.hadoop.security.SaslRpcServer.create(SaslRpcServer.java:160)
>         at org.apache.hadoop.ipc.Server$Connection.createSaslServer(Server.java:1742)
>         at org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1522)
>         at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1433)
>         at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1396)
>         at org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:2080)
>         at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1920)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1682)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:896)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:752)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:723)
> {noformat}
> We saw a few GSSExceptions in the NN log, but only one threw the ConcurrentModificationException.
This NN had a failover, which was caused by ZKFC hitting a GSSException too. I suspect it is
a related issue.
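
The top frames of the trace above show a LinkedList iterator inside Subject's credential set
failing its comodification check. As a rough illustration of that mechanism, the sketch below
reproduces the same exception deterministically on a single thread by modifying the private
credential set between two calls to next(). It assumes the JDK backs the set with a LinkedList,
as the trace indicates for 1.8.0_152. In the reported case the modification would come from
another thread; the class and method names here are made up for the example.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import javax.security.auth.Subject;

// Illustrative only: shows the fail-fast iterator behind the reported stack trace.
class SubjectCmeDemo {

    /** Returns true if iteration fails after the credential set is modified. */
    static boolean iterationFailsAfterModification() {
        Subject subject = new Subject();
        subject.getPrivateCredentials().add("credential-1");
        subject.getPrivateCredentials().add("credential-2");

        // Start iterating, as the GSS code does when it looks up credentials.
        Iterator<Object> it = subject.getPrivateCredentials().iterator();
        it.next();

        // Simulate other code adding a credential in the middle of the iteration.
        subject.getPrivateCredentials().add("credential-3");

        try {
            it.next(); // the underlying list iterator detects the modification
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("CME observed: " + iterationFailsAfterModification());
    }
}
```

The synchronized wrapper that Subject returns protects individual calls such as add, but an
iteration is only safe if the caller holds the set's lock for its whole duration.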



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

