hive-issues mailing list archives

From "Rui Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15860) RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally
Date Fri, 10 Feb 2017 10:07:42 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861036#comment-15861036
] 

Rui Li commented on HIVE-15860:
-------------------------------

A more targeted fix is to add the check only when the job has started and {{sparkJobStatus.getState()}}
returns null. The SENT and QUEUED branches are already covered by the monitor timeout. The SUCCEEDED
and FAILED branches break the loop themselves. So we only need to worry about the STARTED
branch.
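
To illustrate the branch analysis above, here is a minimal sketch of such a monitor loop. It is not the actual RemoteSparkJobMonitor code: the class {{MonitorSketch}}, the return codes, the {{driverAlive}} supplier and the poll-count timeout are all hypothetical stand-ins, and it assumes "the check" is a liveness check on the remote driver. The real monitor would also sleep between polls.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical, simplified model of the monitor loop; not the Hive implementation.
class MonitorSketch {
    // Handle states named in the comment above.
    enum HandleState { SENT, QUEUED, STARTED, SUCCEEDED, FAILED }

    // Return codes invented for this sketch.
    static final int OK = 0;
    static final int JOB_FAILED = 1;
    static final int TIMED_OUT = 2;
    static final int DRIVER_LOST = 3;

    static int monitor(Supplier<HandleState> handleState,
                       Supplier<Object> sparkJobState,  // yields null until the JobId arrives
                       BooleanSupplier driverAlive,     // assumed liveness check
                       int maxPollsBeforeStart) {       // stands in for the monitor timeout
        int polls = 0;
        while (true) {
            switch (handleState.get()) {
                case SENT:
                case QUEUED:
                    // Covered by the monitor timeout, so no extra check is needed here.
                    if (++polls > maxPollsBeforeStart) {
                        return TIMED_OUT;
                    }
                    break;
                case STARTED:
                    // The only branch that can spin forever: the job is marked started,
                    // but no job info is available. Check the driver only in this case.
                    if (sparkJobState.get() == null && !driverAlive.getAsBoolean()) {
                        return DRIVER_LOST;
                    }
                    break;
                case SUCCEEDED:
                    return OK;        // breaks the loop by itself
                case FAILED:
                    return JOB_FAILED; // breaks the loop by itself
            }
        }
    }
}
```

With a handle stuck in STARTED, a null job state and a dead driver, {{monitor}} returns {{DRIVER_LOST}} instead of looping; the other branches behave as they would without the check.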

> RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally
> -----------------------------------------------------------------
>
>                 Key: HIVE-15860
>                 URL: https://issues.apache.org/jira/browse/HIVE-15860
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-15860.1.patch
>
>
> This happens when RemoteDriver crashes between {{JobStarted}} and {{JobSubmitted}}, e.g.
> when it is killed by {{kill -9}}. RemoteSparkJobMonitor considers the job started, but it
> can't get the job info because it hasn't received the JobId. The monitor then loops forever.


