From issues-return-119845-archive-asf-public=cust-asf.ponee.io@hive.apache.org Wed May 23 13:18:05 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id C7F3F180645 for ; Wed, 23 May 2018 13:18:04 +0200 (CEST) Received: (qmail 97356 invoked by uid 500); 23 May 2018 11:18:04 -0000 Mailing-List: contact issues-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list issues@hive.apache.org Received: (qmail 97346 invoked by uid 99); 23 May 2018 11:18:03 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 23 May 2018 11:18:03 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 70EECC6AF6 for ; Wed, 23 May 2018 11:18:03 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -109.501 X-Spam-Level: X-Spam-Status: No, score=-109.501 tagged_above=-999 required=6.31 tests=[ENV_AND_HDR_SPF_MATCH=-0.5, KAM_ASCII_DIVIDERS=0.8, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, USER_IN_DEF_SPF_WL=-7.5, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id UZ33GiEl_iz1 for ; Wed, 23 May 2018 11:18:02 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id 357775F126 for ; Wed, 23 May 2018 11:18:01 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 676A3E0100 for ; Wed, 23 May 2018 11:18:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 29FD021095 for ; Wed, 23 May 2018 11:18:00 +0000 (UTC) Date: Wed, 23 May 2018 11:18:00 +0000 (UTC) From: "Oleksiy Sayankin (JIRA)" To: issues@hive.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (HIVE-19587) HeartBeat thread uses cancelled delegation token while connecting to meta on KERBEROS cluster MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HIVE-19587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oleksiy Sayankin updated HIVE-19587: ------------------------------------ Attachment: HIVE-19587.1.patch > HeartBeat thread uses cancelled delegation token while connecting to meta on KERBEROS cluster > --------------------------------------------------------------------------------------------- > > Key: HIVE-19587 > URL: https://issues.apache.org/jira/browse/HIVE-19587 > Project: Hive > Issue Type: Bug > Components: Transactions > Reporter: Oleksiy Sayankin > Assignee: Oleksiy Sayankin > Priority: Blocker > Attachments: HIVE-19587.1.patch > > > *STEP 1. Create test data* > {code} > create table t1 (id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ","; > create table t2 (id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ","; > {code} > Generate 10 000 000 lines of random data > {code} > package com.test.app; > import java.io.FileNotFoundException; > import java.io.PrintWriter; > import java.util.concurrent.ThreadLocalRandom; > public class App { > public static void main(String[] args) throws FileNotFoundException { > try (PrintWriter out = new PrintWriter("table.data");) { > int min = 0; > int max = 10_000; > int numRows = 10_000_000; > for (int i = 0; i <= numRows - 1; i++){ > int randomNum = ThreadLocalRandom.current().nextInt(min, max + 1); > out.println(randomNum); > } > } > } > } > {code} > Upload data to Hive tables > {code} > load data local inpath '/home/myuser/table.data' into table t1; > load data local inpath '/home/myuser/table.data' into table t2; > {code} > *STEP 2. Configure transactions in hive-site.xml* > {code} > > > hive.exec.dynamic.partition.mode > nonstrict > > > hive.support.concurrency > true > > > hive.enforce.bucketing > true > > > hive.txn.manager > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager > > > hive.compactor.initiator.on > true > > > hive.compactor.worker.threads > 1 > > {code} > *STEP 3. Configure hive.txn.timeout in hive-site.xml* > {code} > > > hive.txn.timeout > 10s > > {code} > *STEP 4. Connect via beeline to HS2 with KERBEROS* > {code} > !connect jdbc:hive2://node8.cluster:10000/default;principal=myuser/node8.cluster@NODE8;ssl=true;sslTrustStore=/opt/myuser/conf/ssl_truststore > {code} > {code} > select count(*) from t1; > {code} > *STEP 5. Close connection and reconnect* > {code} > !close > {code} > {code} > !connect jdbc:hive2://node8.cluster:10000/default;principal=myuser/node8.cluster@NODE8;ssl=true;sslTrustStore=/opt/myuser/conf/ssl_truststore > {code} > *STEP 6. Perform long playing query* > This query lasts about 600s > {code} > select count(*) from t1 join t2 on t1.id = t2.id; > {code} > *EXPECTED RESULT* > Query finishes successfully > *ACTUAL RESULT* > {code} > 2018-05-17T13:54:54,921 ERROR [pool-7-thread-10] transport.TSaslTransport: SASL negotiation failure > javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password > at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598) > at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) > at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:539) > at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) > at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) > at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) > at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:663) > at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:660) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1613) > at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:660) > at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: token expired or does not exist: owner=myuser, renewer=myuser, realUser=, issueDate=1526565229297, maxDate=1527170029297, sequenceNumber=1, masterKeyId=1 > at org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:104) > at org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:56) > at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.getPassword(HadoopThriftAuthBridge.java:472) > at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.handle(HadoopThriftAuthBridge.java:503) > at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589) > ... 15 more > 2018-05-17T13:54:54,966 ERROR [pool-7-thread-10] server.TThreadPoolServer: Error occurred during processing of message. > java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: DIGEST-MD5: IO error acquiring password > at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219) > at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:663) > at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:660) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1613) > at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:660) > at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.thrift.transport.TTransportException: DIGEST-MD5: IO error acquiring password > at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232) > at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:316) > at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) > at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) > ... 10 more > {code} > *ROOT-CAUSE* > HS2 uses cancelled delegation token to connect to metastore. From logs > 1. Metastore creates delegation token (issueDate=1526565229297) > {code} > 2018-05-17T13:53:49,297 INFO [pool-7-thread-3] delegation.AbstractDelegationTokenSecretManager: Creating password for identifier: owner=myuser, renewer=myuser, realUser=, issueDate=1526565229297, m$ > 2018-05-17T13:53:50,789 INFO [Thread-18] txn.TxnHandler: 'Cleaner' locked by 'node8.cluster' > 2018-05-17T13:53:50,793 INFO [Thread-18] txn.TxnHandler: 'Cleaner' unlocked by 'node8.cluster' > 2018-05-17T13:53:54,840 INFO [pool-7-thread-4] metastore.HiveMetaStore: 3: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore > {code} > 2. Metastore cancels delegation token (issueDate=1526565229297) > {code} > 2018-05-17T13:54:32,380 INFO [pool-7-thread-4] thrift.TokenStoreDelegationTokenSecretManager: Token cancelation requested for identifier: owner=mapr, renewer=mapr, realUser=, issueDate=1526565229297, maxDate=1527170029297,$ > 2018-05-17T13:54:32,416 INFO [pool-7-thread-4] metastore.HiveMetaStore: 3: Cleaning up thread local RawStore... > {code} > 3. HS2 uses canceled delegation token and crashes (issueDate=1526565229297): > {code} > 2018-05-17T13:54:54,921 ERROR [pool-7-thread-10] transport.TSaslTransport: SASL negotiation failure > javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password > at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598) > at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) > at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:539) > at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) > at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) > at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) > at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:663) > at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:660) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1613) > at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:660) > at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: token expired or does not exist: owner=myuser, renewer=myuser, realUser=, issueDate=1526565229297, maxDate=1527170029297, sequenceNumber=1, masterKeyId=1 > at org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:104) > {code} > *WORKAROUND* > From Hive sources: > {code} > heartBeatInterval = HIVE_TXN_TIMEOUT / 2. > {code} > So to skip HB during job one needs to set > {code} > HIVE_TXN_TIMEOUT = (max_job_duration * 2) * 1.25. > {code} > Here I added 1.25 factor to be sure there is no HB. In my case max_job_duration = ~600s, then > {code} > HIVE_TXN_TIMEOUT = 600s * 2 * 1.25 = 1500s > {code} > When I {{set hive.txn.timeout}} = 1500s issue was gone on my local env. > FYI [~isnogood], [~ashutoshc] -- This message was sent by Atlassian JIRA (v7.6.3#76005)