hive-issues mailing list archives

From "Jason Dere (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-17908) LLAP External client not correctly handling killTask for pending requests
Date Tue, 31 Oct 2017 22:44:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere updated HIVE-17908:
------------------------------
    Status: Patch Available  (was: Open)

> LLAP External client not correctly handling killTask for pending requests
> -------------------------------------------------------------------------
>
>                 Key: HIVE-17908
>                 URL: https://issues.apache.org/jira/browse/HIVE-17908
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>         Attachments: HIVE-17908.1.patch, HIVE-17908.2.patch, HIVE-17908.3.patch
>
>
> Hitting "Timed out waiting for heartbeat for task ID" errors with the LLAP external client.
> HIVE-17393 fixed some of these errors; however, the error also occurs because the client
> does not correctly handle the killTask notification when the request has been accepted but
> is still waiting for the first task heartbeat. In this situation the client should retry the
> request, similar to what the LLAP AM does (a sketch of this retry logic follows the stack
> trace below). The current logic ignores the killTask in this situation, which results in a
> heartbeat timeout: no heartbeats are sent by LLAP because of the killTask notification.
> {noformat}
> 17/08/09 05:36:02 WARN TaskSetManager: Lost task 10.0 in stage 4.0 (TID 14, cn114-10.l42scl.hortonworks.com, executor 5): java.io.IOException: Received reader event error: Timed out waiting for heartbeat for task ID attempt_7739111832518812959_0005_0_00_000010_0
>         at org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:178)
>         at org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:50)
>         at org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:121)
>         at org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:68)
>         at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:266)
>         at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211)
>         at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>         at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>         at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>         at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>         at org.apache.spark.scheduler.Task.run(Task.scala:99)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: LlapTaskUmbilicalExternalClient(attempt_7739111832518812959_0005_0_00_000010_0): Error while attempting to read chunk length
>         at org.apache.hadoop.hive.llap.io.ChunkedInputStream.read(ChunkedInputStream.java:82)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>         at java.io.FilterInputStream.read(FilterInputStream.java:83)
>         at org.apache.hadoop.hive.llap.LlapBaseRecordReader.hasInput(LlapBaseRecordReader.java:267)
>         at org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:142)
>         ... 22 more
> Caused by: java.net.SocketException: Socket closed
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> {noformat}
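
A minimal sketch of the retry behavior described above, assuming a simplified client-side state machine. The names here (PendingRequest, resubmit, failReader, MAX_RETRIES) are hypothetical illustrations, not the actual LlapTaskUmbilicalExternalClient API; the point is only that a killTask received while a request is still awaiting its first heartbeat should trigger a resubmission rather than being dropped.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; class and method names are illustrative only.
public class KillTaskRetrySketch {

  enum State { PENDING_HEARTBEAT, RUNNING }

  static class PendingRequest {
    final String taskAttemptId;
    volatile State state = State.PENDING_HEARTBEAT;
    int retries = 0;
    PendingRequest(String taskAttemptId) { this.taskAttemptId = taskAttemptId; }
  }

  private final Map<String, PendingRequest> requests = new ConcurrentHashMap<>();
  private static final int MAX_RETRIES = 3;

  // Called when the first task heartbeat arrives; the request is now live.
  void onFirstHeartbeat(String taskAttemptId) {
    PendingRequest req = requests.get(taskAttemptId);
    if (req != null) {
      req.state = State.RUNNING;
    }
  }

  // Called on a killTask notification from the daemon. The bug described
  // above is equivalent to ignoring the PENDING_HEARTBEAT case: no heartbeats
  // will ever be sent for the killed attempt, so the client times out.
  void onKillTask(String taskAttemptId) {
    PendingRequest req = requests.get(taskAttemptId);
    if (req == null) {
      return; // unknown task, nothing to do
    }
    if (req.state == State.PENDING_HEARTBEAT && req.retries < MAX_RETRIES) {
      // Accepted but not yet heartbeating: resubmit, as the LLAP AM does.
      req.retries++;
      resubmit(req);
    } else {
      // Task was already running (or out of retries): surface the failure.
      requests.remove(taskAttemptId);
      failReader(taskAttemptId);
    }
  }

  private void resubmit(PendingRequest req) { /* re-send the work request */ }

  private void failReader(String taskAttemptId) { /* report error to reader */ }
}
{code}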



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
