falcon-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Venkat Ranganathan" <n....@live.com>
Subject Re: Review Request 40103: FALCON-1595 : In kerberos secured clusters, Falcon loses ability to communicate with HDFS overtime.
Date Tue, 01 Dec 2015 21:32:43 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40103/#review108562
-----------------------------------------------------------

Ship it!


Ship It!

- Venkat Ranganathan


On Nov. 9, 2015, 1 p.m., Balu Vellanki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40103/
> -----------------------------------------------------------
> 
> (Updated Nov. 9, 2015, 1 p.m.)
> 
> 
> Review request for Falcon, Ajay Yadava, Sowmya Ramesh, and Venkat Ranganathan.
> 
> 
> Bugs: FALCON-1595
>     https://issues.apache.org/jira/browse/FALCON-1595
> 
> 
> Repository: falcon-git
> 
> 
> Description
> -------
> 
> In a kerberos secured cluster where the Kerberos ticket validity is one day, Falcon server
eventually lost the ability to read and write to and from HDFS. In the logs we saw typical
Kerberos-related errors like "GSSException: No valid credentials provided (Mechanism level:
Failed to find any Kerberos tgt)". 
> 
> {code}
> 2015-10-28 00:04:59,517 INFO  - [LaterunHandler:] ~ Creating FS impersonating user testUser
(HadoopClientFactory:197)
> 2015-10-28 00:04:59,519 WARN  - [LaterunHandler:] ~ Exception encountered while connecting
to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException:
No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] (Client:680)
> 2015-10-28 00:04:59,520 WARN  - [LaterunHandler:] ~ Late Re-run failed for instance sample-process:2015-10-28T03:58Z
after 420000 (AbstractRerunConsumer:84)
> java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException:
GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level:
Failed to find any Kerberos tgt)]; Host Details : local host is: "sample.host.com/127.0.0.1";
destination host is: "sample.host.com":8020; 
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1431)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1358)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
> 	at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy23.getFileInfo(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
> 	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
> 	at org.apache.falcon.rerun.handler.LateRerunConsumer.detectLate(LateRerunConsumer.java:108)
> 	at org.apache.falcon.rerun.handler.LateRerunConsumer.handleRerun(LateRerunConsumer.java:67)
> 	at org.apache.falcon.rerun.handler.LateRerunConsumer.handleRerun(LateRerunConsumer.java:47)
> 	at org.apache.falcon.rerun.handler.AbstractRerunConsumer.run(AbstractRerunConsumer.java:73)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed
[Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any
Kerberos tgt)]
> 	at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:685)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:648)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:735)
> 	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1397)
> {code}
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/falcon/hadoop/HadoopClientFactory.java 9534ff2 
> 
> Diff: https://reviews.apache.org/r/40103/diff/
> 
> 
> Testing
> -------
> 
> end2end testing done on a two node secure cluster. Updated krb5.conf, ticket_lifetime
set to 1day, renew_lifetime set to 1day. Ran falcon for more than a two days and falcon did
not have issues accessing hdfs.
> 
> 
> Thanks,
> 
> Balu Vellanki
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message