hive-issues mailing list archives

From "Hive QA (JIRA)" <>
Subject [jira] [Commented] (HIVE-16984) HoS: avoid waiting for RemoteSparkJobStatus::getAppID() when remote driver died
Date Wed, 28 Jun 2017 23:15:02 GMT


Hive QA commented on HIVE-16984:

Here are the results of testing the latest attachment:

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10829 tests executed
*Failed tests:*
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main] (batchId=150)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=233)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=178)

Test results:
Console output:
Test logs:

Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed

This message is automatically generated.

ATTACHMENT ID: 12874946 - PreCommit-HIVE-Build

> HoS: avoid waiting for RemoteSparkJobStatus::getAppID() when remote driver died
> -------------------------------------------------------------------------------
>                 Key: HIVE-16984
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>         Attachments: HIVE-16984.1.patch
> In HoS, after a RemoteDriver is launched, it may fail to initialize a Spark context and
thus the ApplicationMaster will die eventually. In this case, there are two issues related
to RemoteSparkJobStatus::getAppID():
> 1. Currently we call {{getAppID()}} before starting the monitoring job. For the former,
it will wait for {{hive.spark.client.future.timeout}}, and for the latter, for
{{hive.spark.job.monitor.timeout}}. The error message for the latter treats {{hive.spark.job.monitor.timeout}}
as the time spent waiting for job submission. However, this is inaccurate, as it doesn't include
the time already spent waiting inside {{getAppID()}}.
> 2. If the RemoteDriver dies suddenly, we may still wait in vain for these timeouts. This
could be avoided if we detect that the channel between the client and the remote driver has
closed.
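The fail-fast idea in point 2 can be illustrated with a minimal, self-contained sketch. This is not Hive's actual {{RemoteSparkJobStatus}} code; {{channelOpen}} and {{getAppId}} are hypothetical stand-ins for the client/driver channel state and the app-ID lookup:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class FailFastDemo {
    // Hypothetical stand-in for the state of the channel to the remote driver.
    static volatile boolean channelOpen = false; // simulate a dead driver

    // Return the app ID, but fail fast if the channel is already closed
    // instead of blocking for the full future timeout.
    static String getAppId(Future<String> appIdFuture, long timeoutMs) throws Exception {
        if (!channelOpen) {
            // The remote driver is gone: waiting out the timeout is pointless.
            throw new IllegalStateException("Remote driver died; not waiting for app ID");
        }
        return appIdFuture.get(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        // A future that never completes, like getAppID() against a dead driver.
        CompletableFuture<String> never = new CompletableFuture<>();
        long start = System.nanoTime();
        try {
            getAppId(never, 5_000);
        } catch (Exception e) {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("failed after " + elapsedMs + " ms: " + e.getMessage());
        }
    }
}
```

With the channel-closed check, the call returns immediately rather than burning the full {{hive.spark.client.future.timeout}} (and then {{hive.spark.job.monitor.timeout}}) waiting on a driver that can never answer.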

This message was sent by Atlassian JIRA
