hive-dev mailing list archives

From "Sahil Takiar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-17837) Explicitly check if the HoS Remote Driver has been lost in the RemoteSparkJobMonitor
Date Thu, 19 Oct 2017 01:06:00 GMT
Sahil Takiar created HIVE-17837:
-----------------------------------

             Summary: Explicitly check if the HoS Remote Driver has been lost in the RemoteSparkJobMonitor

                 Key: HIVE-17837
                 URL: https://issues.apache.org/jira/browse/HIVE-17837
             Project: Hive
          Issue Type: Sub-task
          Components: Hive
            Reporter: Sahil Takiar
            Assignee: Sahil Takiar


Right now the {{RemoteSparkJobMonitor}} only implicitly checks whether the connection to the
Spark remote driver is active. It does this every time it triggers an invocation of the
{{Rpc#call}} method (so on any call to {{SparkClient#run}}).

There are scenarios where we have seen the {{RemoteSparkJobMonitor}} hang when the connection
to the driver dies, because the implicit check is never invoked (see HIVE-15860).

It would be ideal if we made this call explicit, so we fail as soon as we know that the connection
to the driver has died.

The fix has the added benefit that it allows us to fail faster in the case where the {{RemoteSparkJobMonitor}}
is in the QUEUED / SENT state. If it's stuck in that state, it won't fail until it hits the
monitor timeout (by default 1 minute), even though we already know the connection has died.
The error message that is thrown is also a little imprecise: it says there could be queue
contention, even though we know the real reason is that the connection was lost.
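
As a rough illustration of the proposed behavior, here is a minimal, self-contained sketch of a
monitor loop that checks driver liveness explicitly on every poll, before it looks at the job
state. It is not the actual Hive code: the class, the enums, and the {{monitor}} method are
hypothetical names made up for this example, and the liveness probe is passed in as a plain
{{BooleanSupplier}} rather than going through the real RPC layer. Only the QUEUED / SENT states
and the 1 minute default timeout come from the description above.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names are taken from the Hive code base.
final class ExplicitDriverCheckSketch {

    // Illustrative job states; only QUEUED and SENT are named in the issue.
    enum JobState { QUEUED, SENT, STARTED, SUCCEEDED }

    // Illustrative outcomes, so callers can report the precise failure reason.
    enum Outcome { SUCCEEDED, CONNECTION_LOST, SUBMIT_TIMEOUT }

    static Outcome monitor(BooleanSupplier driverAlive,
                           Supplier<JobState> jobState,
                           long timeoutMs,
                           long pollMs) {
        long startNanos = System.nanoTime();
        while (true) {
            // Explicit check on every iteration: fail as soon as the
            // connection is known to be dead, whatever state the job is in.
            if (!driverAlive.getAsBoolean()) {
                return Outcome.CONNECTION_LOST;
            }

            JobState state = jobState.get();
            if (state == JobState.SUCCEEDED) {
                return Outcome.SUCCEEDED;
            }

            // The timeout only applies while the job has not started, and it
            // is reached only if the driver connection is still alive.
            boolean notStarted = state == JobState.QUEUED || state == JobState.SENT;
            long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000L;
            if (notStarted && elapsedMs > timeoutMs) {
                return Outcome.SUBMIT_TIMEOUT;
            }

            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while monitoring", e);
            }
        }
    }

    public static void main(String[] args) {
        // Dead connection while the job is still QUEUED: the loop returns
        // immediately instead of waiting out the 60 second timeout.
        Outcome outcome = monitor(() -> false, () -> JobState.QUEUED, 60_000L, 1_000L);
        System.out.println(outcome);
    }
}
```

Keeping CONNECTION_LOST separate from SUBMIT_TIMEOUT is what would let the caller report a lost
connection instead of the generic queue contention message.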



