hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13433) Race in UGI.reloginFromKeytab
Date Wed, 08 Feb 2017 12:34:41 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857930#comment-15857930
] 

Hadoop QA commented on HADOOP-13433:
------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 30s{color} | {color:blue}
Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} |
{color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red}
The patch doesn't appear to include any new or modified tests. Please justify why no new tests
are needed for this patch. Also please list what manual steps were performed to verify this
patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 21s{color}
| {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 43s{color} |
{color:green} branch-2.7 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 38s{color} |
{color:green} branch-2.7 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 25s{color}
| {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 51s{color} |
{color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 18s{color}
| {color:green} branch-2.7 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 45s{color} | {color:red}
hadoop-common-project/hadoop-common in branch-2.7 has 3 extant Findbugs warnings. {color}
|
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 48s{color} |
{color:green} branch-2.7 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 54s{color} |
{color:green} branch-2.7 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 41s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m  9s{color} |
{color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m  9s{color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 15s{color} |
{color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 15s{color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 22s{color}
| {color:green} hadoop-common-project/hadoop-common: The patch generated 0 new + 80 unchanged
- 4 fixed = 80 total (was 84) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 46s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 14s{color}
| {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red}
The patch has 5748 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>.
Refer https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  2m 23s{color} | {color:red}
The patch 98 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 45s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 43s{color} |
{color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 45s{color} |
{color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m  5s{color} | {color:red}
hadoop-common in the patch failed with JDK v1.7.0_121. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 35s{color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 81m 24s{color} | {color:black}
{color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_121 Failed junit tests | hadoop.fs.TestS3_LocalFileContextURI |
|   | hadoop.util.bloom.TestBloomFilters |
|   | hadoop.io.compress.TestCodecPool |
|   | hadoop.fs.TestLocal_S3FileContextURI |
|   | hadoop.ipc.TestIPC |
| JDK v1.8.0_121 Timed out junit tests | org.apache.hadoop.conf.TestConfiguration |
| JDK v1.7.0_121 Failed junit tests | hadoop.security.ssl.TestSSLFactory |
|   | hadoop.fs.TestS3_LocalFileContextURI |
|   | hadoop.util.bloom.TestBloomFilters |
|   | hadoop.fs.TestLocal_S3FileContextURI |
|   | hadoop.ipc.TestIPC |
| JDK v1.7.0_121 Timed out junit tests | org.apache.hadoop.conf.TestConfiguration |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:c420dfe |
| JIRA Issue | HADOOP-13433 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12851608/HADOOP-13433-branch-2.7-v2.patch
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs
 checkstyle  |
| uname | Linux 98a82c31d0ae 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2.7 / bc3e188 |
| Default Java | 1.7.0_121 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_121 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121
|
| findbugs | v3.0.0 |
| findbugs | https://builds.apache.org/job/PreCommit-HADOOP-Build/11597/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
|
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/11597/artifact/patchprocess/whitespace-eol.txt
|
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/11597/artifact/patchprocess/whitespace-tabs.txt
|
| unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/11597/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_121.txt
|
| JDK v1.7.0_121  Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/11597/testReport/
|
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
|
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11597/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Race in UGI.reloginFromKeytab
> -----------------------------
>
>                 Key: HADOOP-13433
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13433
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.9.0, 3.0.0-alpha3
>
>         Attachments: HADOOP-13433-branch-2.7.patch, HADOOP-13433-branch-2.7-v1.patch,
HADOOP-13433-branch-2.7-v2.patch, HADOOP-13433-branch-2.8.patch, HADOOP-13433-branch-2.8.patch,
HADOOP-13433-branch-2.8-v1.patch, HADOOP-13433-branch-2.patch, HADOOP-13433.patch, HADOOP-13433-v1.patch,
HADOOP-13433-v2.patch, HADOOP-13433-v4.patch, HADOOP-13433-v5.patch, HADOOP-13433-v6.patch,
HBASE-13433-testcase-v3.patch
>
>
> This is a problem that has troubled us for several years. For our HBase cluster, sometimes
the RS will be stuck due to
> {noformat}
> 2016-06-20,03:44:12,936 INFO org.apache.hadoop.ipc.SecureClient: Exception encountered
while connecting to the server :
> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid
credentials provided (Mechanism level: The ticket isn't for us (35) - BAD TGS SERVER NAME)]
>         at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194)
>         at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:140)
>         at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.setupSaslConnection(SecureClient.java:187)
>         at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.access$700(SecureClient.java:95)
>         at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection$2.run(SecureClient.java:325)
>         at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection$2.run(SecureClient.java:322)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1781)
>         at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.util.Methods.call(Methods.java:37)
>         at org.apache.hadoop.hbase.security.User.call(User.java:607)
>         at org.apache.hadoop.hbase.security.User.access$700(User.java:51)
>         at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:461)
>         at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.setupIOstreams(SecureClient.java:321)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1164)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1004)
>         at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:107)
>         at $Proxy24.replicateLogEntries(Unknown Source)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:962)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.runLoop(ReplicationSource.java:466)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:515)
> Caused by: GSSException: No valid credentials provided (Mechanism level: The ticket isn't
for us (35) - BAD TGS SERVER NAME)
>         at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:663)
>         at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
>         at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:180)
>         at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175)
>         ... 23 more
> Caused by: KrbException: The ticket isn't for us (35) - BAD TGS SERVER NAME
>         at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:64)
>         at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:185)
>         at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:294)
>         at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:106)
>         at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:557)
>         at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:594)
>         ... 26 more
> Caused by: KrbException: Identifier doesn't match expected value (906)
>         at sun.security.krb5.internal.KDCRep.init(KDCRep.java:133)
>         at sun.security.krb5.internal.TGSRep.init(TGSRep.java:58)
>         at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:53)
>         at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:46)
>         ... 31 more‚Äč
> {noformat}
> It rarely happens, but if it happens, the regionserver will be stuck and can never recover.
> Recently we added a log after a successful re-login which prints the private credentials,
and finally catched the direct reason. After a successful re-login, we have two kerberos tickets
in the credentials, one is the TGT, and the other is a service ticket. The strange thing is
that, the service ticket is placed before TGT. This breaks the assumption of jdk's kerberos
library. See http://hg.openjdk.java.net/jdk8u/jdk8u60/jdk/file/935758609767/src/share/classes/sun/security/jgss/krb5/Krb5InitCredential.java,
the {{getTgt}} Method
> {code:title=Krb5InitCredential}
>             return AccessController.doPrivileged(
>                 new PrivilegedExceptionAction<KerberosTicket>() {
>                 public KerberosTicket run() throws Exception {
>                     // It's OK to use null as serverPrincipal. TGT is almost
>                     // the first ticket for a principal and we use list.
>                     return Krb5Util.getTicket(
>                         realCaller,
>                         clientPrincipal, null, acc);
>                         }});
> {code}
> So here, the library will use the service ticket as TGT to acquire a service ticket,
and KDC will reject the request since the 'TGT' does not start with 'krbtgt'. And it can never
recover because in UGI, the re-login will check if there is a valid TGT first and no doubt,
we have one...
> This usually happens when a secure connection initialization comes along with the re-login,
and the end time indicates that the service ticket is acquired by the previous TGT. Since
UGI does not prevent doAs and re-login happen at the same time, we believe that there is a
race condition.
> After reading the code, we found a possible race condition.
> See http://hg.openjdk.java.net/jdk8u/jdk8u60/jdk/file/935758609767/src/share/classes/sun/security/jgss/krb5/Krb5Context.java,
the {{initSecContext}} method, we will get TGT first, then check if there is already a service
ticket, if not, acquire a service ticket using the TGT, and put it into the credentials.
> And in Krb5LoginModule.logout(the sun version), we will remove the kerberos tickets from
the credentials first, and then destroy them.
> Here comes the race condition. Let T1 be the secure connection set up thread, T2 be the
re-login thread.
> T1: get TGT
> T2: remove all tickets from credentials
> T1: check service ticket, none(since all tickets have been removed)
> T1: acquire a new service ticket using TGT and put it into the credentials
> T2: destroy all tickets
> T2: login, i.e., put a new TGT into the credentials.
> It is hard to write a UT to produce the problem because the racing code is in jdk, which
is not written by us...
> Suggestions are welcomed. Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message