hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6588) Investigating removing getTrueCause method in Server.java
Date Sun, 22 Jun 2014 20:56:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040245#comment-14040245
] 

Yongjun Zhang commented on HDFS-6588:
-------------------------------------

There are the following errors (see https://builds.apache.org/job/PreCommit-HDFS-Build/7198//testReport/):

{code} 
 org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage[0]	0.6 sec	1
 org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage[1]	22 ms	1
 org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage[2]	18 ms	1
 org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage[3]	18 ms	1
 org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage[4]	15 ms	1
 org.apache.hadoop.hdfs.web.TestWebHdfsTokens.testLazyTokenFetchForSWebhdfs	5.5 sec	1
 org.apache.hadoop.hdfs.web.TestWebHdfsTokens.testLazyTokenFetchForWebhdfs
{code}

The errors belong to two categories:

1. TestWebHdfsTokens

It failed due to the following exception:
{code}
java.security.PrivilegedActionException: org.apache.hadoop.security.token.SecretManager$InvalidToken:
token (HDFS_DELEGATION_TOKEN token 5 for DummyUser) can't be found in cache.
{code}
thus causing a SecurityException thrown by UserProvider with cause InvalidToken, which is
the InvalidToken listed above. The reason is "token not found in cache" and it has no cause
assigned.

This means UserProvider could throw two types of exception relevant to our discussion:

T1. SecurityException with InvalidToken as cause, which has StandbyException as cause. Reported
in HDFS-6475.
T2. SecurityException with InvalidToken as cause, which has no cause assigned, and the failure
was because "token not found in cache".

I found that currently committeed code (including Daryn's fix HDFS-6222) appear to have handled
T2, and there actually seems to be no real problem with T2, except my earlier patch to HDFS-6475
"touched" it by returning the InvalidToken, which triggerd these test failures.  I am modifying
the patch of HDFS-6475 to just process T1 (and not T2) and submitting new version there. 

Hi [~daryn]  thanks a lot for your review and comments earlier at HDFS-6475. hope this change
will be fine with you. I will invite you to comment there when I have the new version uploaded.
Appreciate very much.

2. TestSaslRPC

{code}
expected:<[Token is invali]d> but was:<[DIGEST-MD5: IO error acquiring passwor]d>
Stacktrace

org.junit.ComparisonFailure: expected:<[Token is invali]d> but was:<[DIGEST-MD5:
IO error acquiring passwor]d>
	at org.junit.Assert.assertEquals(Assert.java:115)
	at org.junit.Assert.assertEquals(Assert.java:144)
	at org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage(TestSaslRPC.java:387)
{code}

This error happened because we no longer pass the exception through getTrueCause. 

The stack when the above error happened is listed below as a reference.  We can see that the
{{retriableRetrievePassword}} method is called, and this test designed it to return InvalidToken
because of class {{BadTokenSecretManager}} in the test code TestSaslRPC.java. 

Hi [~jingzhao], many thanks for your review and advice in HDFS-6475. The {{getTrueCause}}
method is part of the HDFS-5322 solution you worked out. Would you please comment whether
the above failed tests (failed because of  getTrueCause removal) reflect some real use scenarios?
If so, then the logic getgTrueCause need to be reflected in somewhere if this method is to
be removed. Thanks.

{code}
TestSaslRPC$BadTokenSecretManager.retrievePassword(TestSaslRPC$TestTokenIdentifier) line:
268	
TestSaslRPC$BadTokenSecretManager(TestSaslRPC$TestTokenSecretManager).retrievePassword(TokenIdentifier)
line: 1	
TestSaslRPC$BadTokenSecretManager(SecretManager<T>).retriableRetrievePassword(T) line:
91	
SaslRpcServer$SaslDigestCallbackHandler.getPassword(TokenIdentifier) line: 275	
SaslRpcServer$SaslDigestCallbackHandler.handle(Callback[]) line: 302	
DigestMD5Server.validateClientResponse(byte[][]) line: 585	
DigestMD5Server.evaluateResponse(byte[]) line: 244	
Server$Connection.processSaslToken(RpcHeaderProtos$RpcSaslProto) line: 1392	
Server$Connection.processSaslMessage(RpcHeaderProtos$RpcSaslProto) line: 1369	
Server$Connection.saslProcess(RpcHeaderProtos$RpcSaslProto) line: 1270	
Server$Connection.saslReadAndProcess(DataInputStream) line: 1244	
Server$Connection.processRpcOutOfBandRequest(RpcHeaderProtos$RpcRequestHeaderProto, DataInputStream)
line: 1932	
Server$Connection.processOneRpc(byte[]) line: 1805	
Server$Connection.readAndProcess() line: 1548	
Server$Listener.doRead(SelectionKey) line: 753	
Server$Listener$Reader.doRunLoop() line: 627	
Server$Listener$Reader.run() line: 598	
{code}

BTW, I have spent quite some time to analyze this, I do believe removal of {{getTrueCause}}
is a separate issue and deserve a new jira here. I hope you'd agree based on the information
provided above. If you still think it should be taken care of as part of HDFS-6475, please
let me know. I'm certainly open for discussion. Thanks a lot.




> Investigating removing getTrueCause method in Server.java
> ---------------------------------------------------------
>
>                 Key: HDFS-6588
>                 URL: https://issues.apache.org/jira/browse/HDFS-6588
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: security, webhdfs
>    Affects Versions: 2.5.0
>            Reporter: Yongjun Zhang
>
> When addressing Daryn Sharp's comment for HDFS-6475 quoted below:
> {quote}
> What I'm saying is I think the patch adds too much unnecessary code. Filing an improvement
to delete all but a few lines of the code changed in this patch seems a bit odd. I think you
just need to:
> - Delete getTrueCause entirely instead of moving it elsewhere
> - In saslProcess, just throw the exception instead of running it through getTrueCause
since it's not a "InvalidToken wrapping another exception" anymore.
> - Keep your 3-line change to unwrap SecurityException in toResponse
> {quote}
> There are multiple test failures, after making the suggested changes, Filing this jira
to dedicate to the investigation of removing getTrueCause method.
> More detail will be put in the first comment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message