hive-issues mailing list archives

From "Prasanth Jayachandran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-13858) LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt
Date Fri, 27 May 2016 07:02:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303670#comment-15303670 ]

Prasanth Jayachandran commented on HIVE-13858:
----------------------------------------------

Looks like RB went down while I was commenting, so I am adding the comments here.
1) CancellationException does not seem to be caught. Is it not expected?
2) I think we can remove the TODO about throwing HiveException and throw InterruptedException
instead. IIRC throwing InterruptedException will also clear the interrupt status flag, so the
Thread.interrupted() call is not required either. TezProcessor catches Throwable anyway, so it
should be safe to throw InterruptedException.
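
One detail worth checking for point 2 is where the InterruptedException originates. The
following is a minimal standalone sketch, not taken from the patch; the class and method names
are hypothetical. It assumes only standard JDK behaviour: a blocking call such as Thread.sleep
clears the interrupt flag before it throws, whereas an InterruptedException constructed and
thrown by hand leaves the flag as it was.

```java
// Hypothetical demo class; not part of Hive or of the HIVE-13858 patch.
class InterruptStatusDemo {

    /** Flag state after a JDK blocking call (Thread.sleep) throws InterruptedException. */
    static boolean flagAfterBlockingCallThrows() {
        Thread.currentThread().interrupt();   // set the interrupt flag on this thread
        try {
            Thread.sleep(10);                 // throws at once because the flag is set
        } catch (InterruptedException e) {
            // Thread.sleep cleared the flag before throwing.
        }
        return Thread.interrupted();          // reads and clears the flag
    }

    /** Flag state after an InterruptedException is created and thrown by hand. */
    static boolean flagAfterManualThrow() {
        Thread.currentThread().interrupt();   // set the interrupt flag on this thread
        try {
            throw new InterruptedException("thrown by hand");
        } catch (InterruptedException e) {
            // Nothing touched the flag; it is still set here.
        }
        return Thread.interrupted();          // reads and clears the flag
    }

    public static void main(String[] args) {
        System.out.println(flagAfterBlockingCallThrows()); // false
        System.out.println(flagAfterManualThrow());        // true
    }
}
```

If the code in question throws the exception itself rather than propagating one from a
blocking call, the Thread.interrupted() call may still be what clears the flag.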

> LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13858
>                 URL: https://issues.apache.org/jira/browse/HIVE-13858
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>            Priority: Critical
>              Labels: llap
>         Attachments: HIVE-13858.01.patch, HIVE-13858.02.patch
>
>
> An interrupt along with a HiveProcessor.abort call is made when attempting to preempt
> a task.
> In this specific case, the task was in the middle of HDFS IO - which 'handled' the interrupt
> by retrying. As a result the interrupt status on the thread was reset - so instead of skipping
> the future.get in completeInitialization - the task ended up blocking there.
> End result - a single executor slot permanently blocked in LLAP. Depending on what else
> is running - this can cause a cluster level deadlock.
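
The failure mode described above can be sketched in a few lines of Java. This is an
illustration under stated assumptions, not the Hive code and not the fix in the attached
patches: the class, the method names, and the explicit abort flag are all hypothetical, and a
timed future.get stands in for the real call so that the example terminates.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; names do not correspond to Hive or LLAP classes.
class SuppressedInterruptDemo {

    // Assumed stand-in for state recorded by an abort() call.
    static final AtomicBoolean aborted = new AtomicBoolean(false);

    /** Stand-in for IO code that "handles" an interrupt by retrying and drops the flag. */
    static void ioThatSuppressesInterrupt() {
        try {
            Thread.sleep(50);   // throws at once if the flag is set, clearing it
        } catch (InterruptedException e) {
            // Retry path: the interrupt is swallowed and the flag is not restored.
        }
    }

    /** Returns true if the wait was skipped, false if the thread blocked until the timeout. */
    static boolean waitSkipped(Future<?> future, boolean checkAbortFlag) {
        if (Thread.currentThread().isInterrupted() || (checkAbortFlag && aborted.get())) {
            return true;        // preemption noticed, do not block
        }
        try {
            future.get(200, TimeUnit.MILLISECONDS);   // an untimed get would block forever
        } catch (TimeoutException e) {
            return false;       // this is the "stuck executor slot" case
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return false;
    }

    /** Simulates preemption: abort flag set, interrupt delivered, interrupt swallowed by IO. */
    static boolean simulatePreemption(boolean checkAbortFlag) {
        aborted.set(true);
        Thread.currentThread().interrupt();
        ioThatSuppressesInterrupt();
        // A future that never completes, like an initialization that will not finish.
        return waitSkipped(new CompletableFuture<Void>(), checkAbortFlag);
    }

    public static void main(String[] args) {
        System.out.println(simulatePreemption(false)); // false: blocked until the timeout
        System.out.println(simulatePreemption(true));  // true: the abort flag caught it
    }
}
```

Relying only on the thread's interrupt status is fragile because any code on the call path can
clear it; a separate flag set by the abort call is one way to avoid that dependency.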



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
