falcon-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Balu Vellanki (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1595) Falcon server loses ability to communicate with HDFS over time
Date Mon, 09 Nov 2015 21:05:11 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997352#comment-14997352
] 

Balu Vellanki commented on FALCON-1595:
---------------------------------------

[~sowmyaramesh] : AuthenticationInitializationService logs in Falcon user periodically in
a secure cluster. In HadoopClientFactory, when creating FileSystem for the authenticated UGI
- falcon has to verify that the TGT is valid and relogin user if the TGT is not valid. 

> Falcon server loses ability to communicate with HDFS over time
> --------------------------------------------------------------
>
>                 Key: FALCON-1595
>                 URL: https://issues.apache.org/jira/browse/FALCON-1595
>             Project: Falcon
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Balu Vellanki
>            Assignee: Balu Vellanki
>         Attachments: FALCON-1595.patch
>
>
> In a kerberos secured cluster where the Kerberos ticket validity is one day, Falcon server
eventually lost the ability to read and write to and from HDFS. In the logs we saw typical
Kerberos-related errors like "GSSException: No valid credentials provided (Mechanism level:
Failed to find any Kerberos tgt)". 
> {code}
> 2015-10-28 00:04:59,517 INFO  - [LaterunHandler:] ~ Creating FS impersonating user testUser
(HadoopClientFactory:197)
> 2015-10-28 00:04:59,519 WARN  - [LaterunHandler:] ~ Exception encountered while connecting
to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException:
No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] (Client:680)
> 2015-10-28 00:04:59,520 WARN  - [LaterunHandler:] ~ Late Re-run failed for instance sample-process:2015-10-28T03:58Z
after 420000 (AbstractRerunConsumer:84)
> java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException:
GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level:
Failed to find any Kerberos tgt)]; Host Details : local host is: "sample.host.com/127.0.0.1";
destination host is: "sample.host.com":8020; 
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1431)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1358)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
> 	at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy23.getFileInfo(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
> 	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
> 	at org.apache.falcon.rerun.handler.LateRerunConsumer.detectLate(LateRerunConsumer.java:108)
> 	at org.apache.falcon.rerun.handler.LateRerunConsumer.handleRerun(LateRerunConsumer.java:67)
> 	at org.apache.falcon.rerun.handler.LateRerunConsumer.handleRerun(LateRerunConsumer.java:47)
> 	at org.apache.falcon.rerun.handler.AbstractRerunConsumer.run(AbstractRerunConsumer.java:73)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed
[Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any
Kerberos tgt)]
> 	at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:685)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:648)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:735)
> 	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1397)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message