Date: Mon, 9 Nov 2015 21:14:11 +0000 (UTC)
From: "Sowmya Ramesh (JIRA)"
To: dev@falcon.incubator.apache.org
Reply-To: dev@falcon.apache.org
Subject: [jira] [Commented] (FALCON-1595) Falcon server loses ability to communicate with HDFS over time

    [ https://issues.apache.org/jira/browse/FALCON-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997368#comment-14997368 ]

Sowmya Ramesh commented on FALCON-1595:
---------------------------------------

[~bvellanki]: What is the root cause of this issue? Why doesn't the relogin done in AuthenticationInitializationService handle this case? I am trying to understand whether it's a one-off case where the token is just expiring and we try to dole out the FS just before the relogin. In that case, similar to checkTGTAndReloginFromKeytab, shouldn't we relogin when the ticket is close to expiry, rather than waiting until it has expired, which is the current implementation?

> Falcon server loses ability to communicate with HDFS over time
> --------------------------------------------------------------
>
>                 Key: FALCON-1595
>                 URL: https://issues.apache.org/jira/browse/FALCON-1595
>             Project: Falcon
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Balu Vellanki
>            Assignee: Balu Vellanki
>         Attachments: FALCON-1595.patch
>
>
> In a Kerberos-secured cluster where the Kerberos ticket validity is one day, the Falcon server eventually lost the ability to read from and write to HDFS. In the logs we saw typical Kerberos-related errors like "GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)".
> {code}
> 2015-10-28 00:04:59,517 INFO - [LaterunHandler:] ~ Creating FS impersonating user testUser (HadoopClientFactory:197)
> 2015-10-28 00:04:59,519 WARN - [LaterunHandler:] ~ Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] (Client:680)
> 2015-10-28 00:04:59,520 WARN - [LaterunHandler:] ~ Late Re-run failed for instance sample-process:2015-10-28T03:58Z after 420000 (AbstractRerunConsumer:84)
> java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "sample.host.com/127.0.0.1"; destination host is: "sample.host.com":8020;
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1431)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1358)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>     at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
>     at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy23.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
>     at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
>     at org.apache.falcon.rerun.handler.LateRerunConsumer.detectLate(LateRerunConsumer.java:108)
>     at org.apache.falcon.rerun.handler.LateRerunConsumer.handleRerun(LateRerunConsumer.java:67)
>     at org.apache.falcon.rerun.handler.LateRerunConsumer.handleRerun(LateRerunConsumer.java:47)
>     at org.apache.falcon.rerun.handler.AbstractRerunConsumer.run(AbstractRerunConsumer.java:73)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>     at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:685)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:648)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:735)
>     at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1397)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
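
The question raised in the comment (relogin when the ticket is close to expiry instead of only after it has expired) can be illustrated with a small, self-contained sketch. This is not Falcon's or Hadoop's code: the class name ReloginPolicy, its method names, and the 0.80 renew window are all assumptions chosen for illustration, and the sketch deliberately avoids calling any Hadoop API. It only shows the difference between the two checks for a one-day ticket like the one described in the issue.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Falcon or Hadoop.
public class ReloginPolicy {
    // Assumed fraction of the ticket lifetime after which a relogin is due.
    static final double RENEW_WINDOW = 0.80;

    // Instant at which a proactive relogin becomes due:
    // start + window * (end - start).
    static Instant refreshTime(Instant start, Instant end, double window) {
        long lifetimeMs = Duration.between(start, end).toMillis();
        return start.plusMillis((long) (lifetimeMs * window));
    }

    // "Close to expiry" check: true once the renew window has been reached.
    static boolean shouldRelogin(Instant start, Instant end, Instant now) {
        return !now.isBefore(refreshTime(start, end, RENEW_WINDOW));
    }

    // "Already expired" check: true only at or after the ticket end time.
    static boolean isExpired(Instant end, Instant now) {
        return !now.isBefore(end);
    }

    public static void main(String[] args) {
        // One-day ticket validity, as in the issue description.
        Instant start = Instant.parse("2015-10-27T00:00:00Z");
        Instant end = start.plus(Duration.ofDays(1));
        Instant now = Instant.parse("2015-10-27T23:00:00Z");

        // One hour before expiry: not yet expired, but inside the renew window.
        System.out.println("expired:       " + isExpired(end, now));
        System.out.println("shouldRelogin: " + shouldRelogin(start, end, now));
    }
}
```

With an expiry-only check, a caller that obtains a FileSystem at 23:00 would proceed with a ticket that has one hour left; with the window-based check, the same caller would trigger a relogin first. Whether this matches the actual root cause of FALCON-1595 is exactly what the comment asks.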