hive-dev mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7958) SparkWork generated by SparkCompiler may require multiple Spark jobs to run
Date Thu, 04 Sep 2014 23:51:23 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122192#comment-14122192 ]

Hive QA commented on HIVE-7958:
-------------------------------



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12666605/HIVE-7958-spark.patch

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 6291 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_19
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_21
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_24
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_25
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_9
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/112/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/112/console
Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-112/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12666605

> SparkWork generated by SparkCompiler may require multiple Spark jobs to run
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-7958
>                 URL: https://issues.apache.org/jira/browse/HIVE-7958
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>            Priority: Critical
>              Labels: Spark-M1
>         Attachments: HIVE-7958-spark.patch
>
>
> A SparkWork instance may currently contain disjoint work graphs. For instance, union_remove_1.q may generate a plan like this:
> {code}
> Reduce2 <- Map 1
> Reduce4 <- Map 3
> {code}
> The SparkPlan instance generated from this work graph contains two result RDDs. When such a plan is executed, we call .foreach() on the two RDDs sequentially, which results in two Spark jobs, one after the other.
> While this works functionally, performance suffers because the Spark jobs run sequentially rather than concurrently.
> Another side effect of this is that the corresponding SparkPlan instance is over-complicated.
> There are two potential approaches:
> 1. Let SparkCompiler generate a work that can be executed in ONE Spark job only. In the above example, two Spark tasks should be generated.
> 2. Let SparkPlanGenerate generate multiple Spark plans and then have SparkClient execute them concurrently.
> Approach #1 seems more reasonable and fits our architecture more naturally. Also, Hive's task execution framework already takes care of task concurrency.
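
The idea behind approach #1 can be illustrated with a small, self-contained sketch: treat the work graph as undirected, find its connected components, and emit one task per component. This is only an illustration under stated assumptions and is not Hive code; the function name `split_into_components` and the `(child, parent)` edge representation are hypothetical and do not correspond to any SparkCompiler API.

```python
from collections import defaultdict


def split_into_components(edges, nodes=()):
    """Group work nodes into connected components, ignoring edge direction.

    `edges` is an iterable of (child, parent) pairs; `nodes` may list extra
    works that have no edges. Both are hypothetical inputs for illustration.
    """
    neighbors = defaultdict(set)
    for child, parent in edges:
        neighbors[child].add(parent)
        neighbors[parent].add(child)
    for node in nodes:
        neighbors[node]  # touching the key registers an isolated work

    seen = set()
    components = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk over everything reachable from `start`.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components


# The plan quoted in the description: "Reduce2 <- Map 1" and "Reduce4 <- Map 3".
plan_edges = [("Reduce2", "Map 1"), ("Reduce4", "Map 3")]
tasks = split_into_components(plan_edges)
for i, works in enumerate(tasks, start=1):
    print(f"Spark task {i}: {works}")
```

For the example plan this yields two groups, one containing Map 1 and Reduce2 and the other containing Map 3 and Reduce4, matching the "two Spark tasks" outcome the description expects. Each group could then be handed to the task execution framework as an independent unit.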



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
