hive-dev mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7527) Support order by and sort by on Spark
Date Mon, 04 Aug 2014 15:22:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084763#comment-14084763 ]

Hive QA commented on HIVE-7527:
-------------------------------



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12659646/HIVE-7527-spark.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5826 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/10/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/10/console
Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-10/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12659646

> Support order by and sort by on Spark
> -------------------------------------
>
>                 Key: HIVE-7527
>                 URL: https://issues.apache.org/jira/browse/HIVE-7527
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Rui Li
>         Attachments: HIVE-7527-spark.patch
>
>
> Currently Hive depends completely on MapReduce's sorting as part of shuffling to achieve
order by (global sort, one reducer) and sort by (local sort).
> Spark has a sort by transformation in several variations that can be used to support Hive's
order by and sort by. However, we still need to evaluate whether Spark's sortBy can achieve
the same functionality inherited from MapReduce's shuffle sort.
> Currently Hive on Spark should be able to run a simple sort by or order by by changing
the current partitionBy to sortBy. This is a way to verify the theory. A complete solution
will not be available until we have a complete SparkPlanGenerator.
> There is also the question of how we determine that there is an order by or sort by just
by looking at the operator tree, from which the Spark task is created. This is the responsibility
of SparkPlanGenerator, but we need to have an idea.
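
For readers unfamiliar with the distinction drawn in the description, the difference between order by (global sort, one reducer) and sort by (local sort per reducer) can be sketched in plain Python. This is only an illustration of the semantics, not Hive or Spark code: the function names and the modulo-based row distribution are assumptions made for the example, and a real shuffle may distribute rows differently.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (sort key, value); a hypothetical row shape


def partition_rows(rows: List[Row], num_reducers: int) -> Dict[int, List[Row]]:
    """Distribute rows across reducers, standing in for a shuffle.

    Assumption: rows are assigned by key modulo the reducer count purely
    so the example is deterministic.
    """
    parts: Dict[int, List[Row]] = {r: [] for r in range(num_reducers)}
    for row in rows:
        parts[row[0] % num_reducers].append(row)
    return parts


def sort_by(rows: List[Row], num_reducers: int) -> List[List[Row]]:
    """'sort by': each reducer sorts only its own rows (local sort)."""
    parts = partition_rows(rows, num_reducers)
    return [sorted(parts[r]) for r in range(num_reducers)]


def order_by(rows: List[Row]) -> List[Row]:
    """'order by': a single reducer sees every row, giving a total order."""
    return sort_by(rows, 1)[0]


rows = [(5, "e"), (2, "b"), (4, "d"), (1, "a"), (3, "c")]
local = sort_by(rows, 2)   # two reducers, each output sorted on its own
total = order_by(rows)     # one reducer, globally sorted output
```

With two reducers, each list in `local` is sorted, but concatenating them does not yield a sorted result; only `total` is ordered across all rows. This is the property that any Spark-based replacement for MapReduce's shuffle sort would need to preserve.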



--
This message was sent by Atlassian JIRA
(v6.2#6252)
