Date: Tue, 2 May 2017 03:57:04 +0000 (UTC)
From: "Rui Li (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Updated] (HIVE-15859) HoS: Write RPC messages in event loop

     [ https://issues.apache.org/jira/browse/HIVE-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-15859:
--------------------------
    Fix Version/s: 1.3.0

> HoS: Write RPC messages in event loop
> -------------------------------------
>
> Key: HIVE-15859
> URL: https://issues.apache.org/jira/browse/HIVE-15859
> Project: Hive
> Issue Type: Bug
> Components: Hive, Spark
> Affects Versions: 2.1.1
> Environment: hadoop2.7.1
> spark1.6.2
> hive2.2
> Reporter: KaiXu
> Assignee: Rui Li
> Fix For: 1.3.0, 2.2.0
>
> Attachments: HIVE-15859.1.patch, HIVE-15859.2.patch, HIVE-15859.3.patch
>
>
> Hive on Spark failed with the following error:
> {noformat}
> 2017-02-08 09:50:59,331 Stage-2_0: 1039(+2)/1041 Stage-3_0: 796(+456)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:00,335 Stage-2_0: 1040(+1)/1041 Stage-3_0: 914(+398)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:01,338 Stage-2_0: 1041/1041 Finished Stage-3_0: 961(+383)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}
> The application log shows the driver commanded a shutdown for some unknown reason, but Hive's log shows the driver could not get the RPC header (Expected RPC header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage instead).
> {noformat}
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1169.0 in stage 3.0 (TID 2519)
> 17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
> 17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
> 17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown (hsx-node1:42777) driver disconnected.
> 17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 192.168.1.1:42777 disassociated! Shutting down.
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1105.0 in stage 3.0 (TID 2511)
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk6/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-71da1dfc-99bd-4687-bc2f-33452db8de3d
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk2/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-7f134d81-e77e-4b92-bd99-0a51d0962c14
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk5/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-77a90d63-fb05-4bc6-8d5e-1562cc502e6c
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk4/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-91f8b91a-114d-4340-8560-d3cd085c1cd4
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a3c24f9e-8609-48f0-9d37-0de7ae06682a
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk7/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-f6120a43-2158-4780-927c-c5786b78f53e
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk3/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-e17931ad-9e8a-45da-86f8-9a0fdca0fad1
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-4de34175-f871-4c28-8ec0-d2fc0020c5c3
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1137.0 in stage 3.0 (TID 2515)
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 897.0 in stage 3.0 (TID 2417)
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1225.0 in stage 3.0 (TID 2526)
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 905.0 in stage 3.0 (TID 2423)
> {noformat}
> In Hive's log:
> {noformat}
> 2017-02-08T09:51:04,327 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO scheduler.TaskSetManager: Finished task 971.0 in stage 3.0 (TID 2218) in 5948 ms on hsx-node8 (1338/1520)
> 2017-02-08T09:51:04,346 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO rpc.RpcDispatcher: [DriverProtocol] Closing channel due to exception in pipeline (org.apache.hive.spark.client.RemoteDriver$DriverProtocol.handle(io.netty.channel.ChannelHandlerContext, org.apache.hive.spark.client.rpc.Rpc$MessageHeader)).
> 2017-02-08T09:51:04,346 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN rpc.RpcDispatcher: [DriverProtocol] Expected RPC header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage instead.
> 2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO rpc.RpcDispatcher: [DriverProtocol] Closing channel due to exception in pipeline (null).
> 2017-02-08T09:51:04,347 INFO [RPC-Handler-3] rpc.RpcDispatcher: [ClientProtocol] Closing channel due to exception in pipeline (Connection reset by peer).
> 2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
> 2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException: RPC channel is closed.
> 2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN rpc.Rpc: Failed to send RPC, closing connection.
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: java.nio.channels.ClosedChannelException
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN client.RemoteDriver: Shutting down driver because RPC channel was closed.
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO client.RemoteDriver: Shutting down remote driver.
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException: RPC channel is closed.
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO scheduler.DAGScheduler: Asked to cancel job 2
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.InterruptedException
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.lang.Object.wait(Native Method)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.lang.Object.wait(Object.java:502)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.SimpleFutureAction.org$apache$spark$SimpleFutureAction$$awaitResult(FutureAction.scala:165)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:120)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:108)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at scala.concurrent.Await$.ready(package.scala:86)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:303)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:316)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.lang.Thread.run(Thread.java:745)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException: RPC channel is closed.
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
> 2017-02-08T09:51:04,654 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-65f40590-d87f-4701-b374-6b3b2a11538c
> 2017-02-08T09:52:04,346 WARN [b723c85d-2a7b-469e-bab1-9c165b25e656 main] impl.RemoteSparkJobStatus: Error getting stage info
> java.util.concurrent.TimeoutException
> at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkStageInfo(RemoteSparkJobStatus.java:161) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkStageProgress(RemoteSparkJobStatus.java:96) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.spark.status.RemoteSparkJobMonitor.startMonitor(RemoteSparkJobMonitor.java:82) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobRef.monitorJob(RemoteSparkJobRef.java:60) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:101) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1997) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1688) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1419) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1143) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:430) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:446) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_60]
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_60]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_60]
> at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60]
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221) ~[spark-assembly-1.6.2-hadoop2.6.0.jar:1.6.2]
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136) ~[spark-assembly-1.6.2-hadoop2.6.0.jar:1.6.2]
> 2017-02-08T09:52:04,346 ERROR [b723c85d-2a7b-469e-bab1-9c165b25e656 main] status.SparkJobMonitor: Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
> java.lang.IllegalStateException: RPC channel is closed.
> at com.google.common.base.Preconditions.checkState(Preconditions.java:149) ~[guava-14.0.1.jar:?]
> at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> {noformat}
> Also, in the container's log, I find the driver still requesting executors:
> {noformat}
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 77 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 2 executor containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 76 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 75 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 74 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 73 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 71 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 2 executor containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 70 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
> 17/02/08 09:51:04 INFO yarn.YarnAllocator: Driver requested a total number of 50 executor(s).
> 17/02/08 09:51:04 INFO yarn.YarnAllocator: Canceling requests for 0 executor containers
> 17/02/08 09:51:04 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
> 17/02/08 09:51:04 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
> 17/02/08 09:51:04 INFO yarn.YarnAllocator: Canceling requests for 0 executor containers
> 17/02/08 09:51:04 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
> 17/02/08 09:51:04 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. 192.168.1.1:42777
> 17/02/08 09:51:04 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. hsx-node1:42777
> 17/02/08 09:51:04 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
> 17/02/08 09:51:04 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
> 17/02/08 09:51:04 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
> 17/02/08 09:51:04 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1486453422616_0150
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
> {noformat}
> I found only one ERROR in the YARN application log:
> {noformat}
> 17/02/08 09:51:00 INFO executor.Executor: Finished task 1492.0 in stage 3.0 (TID 2168). 3294 bytes result sent to driver
> 17/02/08 09:51:00 INFO executor.Executor: Finished task 556.0 in stage 3.0 (TID 1587). 3312 bytes result sent to driver
> 17/02/08 09:51:00 INFO executor.Executor: Finished task 1412.0 in stage 3.0 (TID 2136). 3294 bytes result sent to driver
> 17/02/08 09:51:00 INFO executor.Executor: Finished task 1236.0 in stage 3.0 (TID 2007). 3294 bytes result sent to driver
> 17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
> 17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
> 17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
> 17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown (hsx-node1:42777) driver disconnected.
> 17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 192.168.1.1:42777 disassociated! Shutting down.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a8167f0b-f3c3-458f-ad51-8a0f4bcda4f3
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-26cba445-66d2-4b78-a428-17881c92f0f6
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
> {noformat}
> This error occurs when several queries run at the same time at large data scale. It does not occur when the query is run separately, but it frequently occurs when the queries run together.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
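
For context on the approach named in the issue title, the sketch below is a minimal, hypothetical Netty example (not the actual HIVE-15859 patch; the class name and method names are invented for illustration) of what "write RPC messages in the event loop" means: instead of calling writeAndFlush() directly from arbitrary caller threads (such as the Spark listener thread visible in the stack traces above), the write is scheduled onto the channel's event loop, so writes and channel teardown are serialized on a single thread and partial or interleaved messages cannot reach the peer.

{code:java}
import io.netty.channel.Channel;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelPromise;

/**
 * Hypothetical illustration of the "write RPC messages in the event loop"
 * pattern; not the HIVE-15859 patch itself.
 */
final class EventLoopWriter {

  private final Channel channel;

  EventLoopWriter(Channel channel) {
    this.channel = channel;
  }

  /**
   * Schedules the write on the channel's event loop instead of writing
   * directly from the calling thread, so writes from different callers
   * cannot interleave with each other or with the channel shutdown.
   */
  ChannelFuture write(Object rpcMessage) {
    ChannelPromise promise = channel.newPromise();
    if (channel.eventLoop().inEventLoop()) {
      doWrite(rpcMessage, promise);
    } else {
      channel.eventLoop().execute(() -> doWrite(rpcMessage, promise));
    }
    return promise;
  }

  private void doWrite(Object rpcMessage, ChannelPromise promise) {
    if (!channel.isActive()) {
      // Channel already closed: fail fast instead of writing a partial message.
      promise.setFailure(new IllegalStateException("RPC channel is closed."));
      return;
    }
    channel.writeAndFlush(rpcMessage, promise);
  }
}
{code}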