hive-dev mailing list archives

From "Rui Li (JIRA)" <>
Subject [jira] [Updated] (HIVE-8972) Implement more fine-grained remote client-level events [Spark Branch]
Date Wed, 17 Dec 2014 07:57:13 GMT


Rui Li updated HIVE-8972:
    Attachment: HIVE-8972.4-spark.patch

The latest patch consists only of a minor fix and some cleanup.
I talked about this with [~chengxiang li]. Here are our thoughts on this task:
Currently we set a timeout for the {{JobSubmitted}} event, and assume the job is always submitted
via the async API and will send back its Spark job ID (i.e. by calling monitorJob). If we add,
say, {{JobStarted}} and set a timeout for that, we assume all failures after that point can be
properly captured and sent back to the client. So one way or another, we have to make assumptions.
Since the timeout for {{JobSubmitted}} serves us well at the moment, maybe we should leave it as is.
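To make the event-timeout idea concrete, here is a minimal sketch (class and method names are hypothetical, not the actual Hive remote client API): a scheduled task fails the job's promise if no {{JobSubmitted}}-style signal completes it in time.

```java
import java.util.concurrent.*;

public class EventTimeout {
    private static final ScheduledExecutorService SCHEDULER =
            Executors.newSingleThreadScheduledExecutor();

    // Attach a timeout to a promise that is normally completed when the
    // JobSubmitted event (carrying the Spark job ID) arrives. If nothing
    // arrives within the deadline, the promise fails with TimeoutException.
    static <T> CompletableFuture<T> withTimeout(CompletableFuture<T> promise, long millis) {
        SCHEDULER.schedule(
                () -> promise.completeExceptionally(
                        new TimeoutException("no JobSubmitted event received")),
                millis, TimeUnit.MILLISECONDS);
        return promise;
    }
}
```

If the event arrives first, the scheduled failure is a no-op, since a CompletableFuture can only be completed once.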
A possible improvement would be to differentiate the two kinds of jobs we have: Hive query jobs
and other jobs (e.g. addFile, getJobInfo). The former are guaranteed to send back a Spark
job ID for monitoring, so we can set a timeout for that, while the latter should finish within
constant time, so we can set a timeout when calling Future.get.
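For the second kind of job, the bounded wait can be sketched as below (the helper name is hypothetical, this is not the actual client code): instead of an event timeout, the caller simply passes a deadline to Future.get and cancels the call if it fires.

```java
import java.util.concurrent.*;

public class ConstantTimeCall {
    // Run a short client-level call (addFile/getJobInfo-style) and bound the
    // wait with Future.get(timeout, unit) rather than an event-based timeout.
    static String runBounded(Callable<String> call, long timeoutMs) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = executor.submit(call);
            try {
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true);  // interrupt the stuck call
                return "timed out";
            }
        } finally {
            executor.shutdown();
        }
    }
}
```

Future.get with a timeout throws TimeoutException without consuming the result, so the caller stays in control of how long a "constant-time" call is allowed to take.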
cc [~xuefuz] [~vanzin]

> Implement more fine-grained remote client-level events [Spark Branch]
> ---------------------------------------------------------------------
>                 Key: HIVE-8972
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-8972.1-spark.patch, HIVE-8972.2-spark.patch, HIVE-8972.3-spark.patch,
> HIVE-8972.3-spark.patch, HIVE-8972.4-spark.patch
> Follow up task of HIVE-8956.
> Fine-grained events are useful for better job monitor and failure handling.

This message was sent by Atlassian JIRA
