hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13433) Race in UGI.reloginFromKeytab
Date Wed, 11 Jan 2017 06:32:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817379#comment-15817379
] 

Hadoop QA commented on HADOOP-13433:
------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 17s{color} | {color:blue}
Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} |
{color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color}
| {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 20s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 13s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 29s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  2s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 18s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 25s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 47s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 36s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m  6s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  9m  6s{color} | {color:green}
the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 30s{color}
| {color:orange} hadoop-common-project/hadoop-common: The patch generated 3 new + 94 unchanged
- 2 fixed = 97 total (was 96) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 58s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 18s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color}
| {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  1s{color} | {color:green}
The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 29s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 48s{color} |
{color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m  5s{color} | {color:red}
hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 32s{color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m 59s{color} | {color:black}
{color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestKDiag |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-13433 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846745/HADOOP-13433-v4.patch
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  xml  findbugs
 checkstyle  |
| uname | Linux be7d50c756ba 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 467f5f1 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/11414/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt
|
| unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/11414/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
|
|  Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/11414/testReport/ |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
|
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11414/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Race in UGI.reloginFromKeytab
> -----------------------------
>
>                 Key: HADOOP-13433
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13433
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2, 2.6.6
>
>         Attachments: HADOOP-13433-v1.patch, HADOOP-13433-v2.patch, HADOOP-13433-v4.patch,
HADOOP-13433.patch, HBASE-13433-testcase-v3.patch
>
>
> This is a problem that has troubled us for several years. For our HBase cluster, sometimes
the RS will be stuck due to
> {noformat}
> 2016-06-20,03:44:12,936 INFO org.apache.hadoop.ipc.SecureClient: Exception encountered
while connecting to the server :
> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid
credentials provided (Mechanism level: The ticket isn't for us (35) - BAD TGS SERVER NAME)]
>         at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194)
>         at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:140)
>         at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.setupSaslConnection(SecureClient.java:187)
>         at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.access$700(SecureClient.java:95)
>         at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection$2.run(SecureClient.java:325)
>         at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection$2.run(SecureClient.java:322)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1781)
>         at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.util.Methods.call(Methods.java:37)
>         at org.apache.hadoop.hbase.security.User.call(User.java:607)
>         at org.apache.hadoop.hbase.security.User.access$700(User.java:51)
>         at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:461)
>         at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.setupIOstreams(SecureClient.java:321)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1164)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1004)
>         at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:107)
>         at $Proxy24.replicateLogEntries(Unknown Source)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:962)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.runLoop(ReplicationSource.java:466)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:515)
> Caused by: GSSException: No valid credentials provided (Mechanism level: The ticket isn't
for us (35) - BAD TGS SERVER NAME)
>         at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:663)
>         at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
>         at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:180)
>         at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175)
>         ... 23 more
> Caused by: KrbException: The ticket isn't for us (35) - BAD TGS SERVER NAME
>         at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:64)
>         at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:185)
>         at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:294)
>         at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:106)
>         at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:557)
>         at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:594)
>         ... 26 more
> Caused by: KrbException: Identifier doesn't match expected value (906)
>         at sun.security.krb5.internal.KDCRep.init(KDCRep.java:133)
>         at sun.security.krb5.internal.TGSRep.init(TGSRep.java:58)
>         at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:53)
>         at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:46)
>         ... 31 more‚Äč
> {noformat}
> It rarely happens, but if it happens, the regionserver will be stuck and can never recover.
> Recently we added a log after a successful re-login which prints the private credentials,
and finally catched the direct reason. After a successful re-login, we have two kerberos tickets
in the credentials, one is the TGT, and the other is a service ticket. The strange thing is
that, the service ticket is placed before TGT. This breaks the assumption of jdk's kerberos
library. See http://hg.openjdk.java.net/jdk8u/jdk8u60/jdk/file/935758609767/src/share/classes/sun/security/jgss/krb5/Krb5InitCredential.java,
the {{getTgt}} Method
> {code:title=Krb5InitCredential}
>             return AccessController.doPrivileged(
>                 new PrivilegedExceptionAction<KerberosTicket>() {
>                 public KerberosTicket run() throws Exception {
>                     // It's OK to use null as serverPrincipal. TGT is almost
>                     // the first ticket for a principal and we use list.
>                     return Krb5Util.getTicket(
>                         realCaller,
>                         clientPrincipal, null, acc);
>                         }});
> {code}
> So here, the library will use the service ticket as TGT to acquire a service ticket,
and KDC will reject the request since the 'TGT' does not start with 'krbtgt'. And it can never
recover because in UGI, the re-login will check if there is a valid TGT first and no doubt,
we have one...
> This usually happens when a secure connection initialization comes along with the re-login,
and the end time indicates that the service ticket is acquired by the previous TGT. Since
UGI does not prevent doAs and re-login happen at the same time, we believe that there is a
race condition.
> After reading the code, we found a possible race condition.
> See http://hg.openjdk.java.net/jdk8u/jdk8u60/jdk/file/935758609767/src/share/classes/sun/security/jgss/krb5/Krb5Context.java,
the {{initSecContext}} method, we will get TGT first, then check if there is already a service
ticket, if not, acquire a service ticket using the TGT, and put it into the credentials.
> And in Krb5LoginModule.logout(the sun version), we will remove the kerberos tickets from
the credentials first, and then destroy them.
> Here comes the race condition. Let T1 be the secure connection set up thread, T2 be the
re-login thread.
> T1: get TGT
> T2: remove all tickets from credentials
> T1: check service ticket, none(since all tickets have been removed)
> T1: acquire a new service ticket using TGT and put it into the credentials
> T2: destroy all tickets
> T2: login, i.e., put a new TGT into the credentials.
> It is hard to write a UT to produce the problem because the racing code is in jdk, which
is not written by us...
> Suggestions are welcomed. Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message