hive-issues mailing list archives

From "Martin Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7292) Hive on Spark
Date Wed, 01 Jul 2015 06:54:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609644#comment-14609644 ]

Martin Wang commented on HIVE-7292:
-----------------------------------

Dear experts,
   I'm trying Hive on Spark. I ran into a problem when running a map-only query like
   create table xxx as
          select a,b from table1 union all
          select a,b from table2 union all
          select a,b from table3 union all
          ...

   When the number of tables is small, it works fine.
   When the number of tables is large enough, it fails with:
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

   Also, when the failure occurs, the job does not appear in the Spark Web UI.

   Can anyone help me solve this problem?
   Thank you!

Martin
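
A minimal sketch for generating the query pattern above with a varying number of tables, which may help narrow down the table count at which the failure starts. The function names build_union_all_ctas and numbered_tables, and the table names table1..tableN, are assumptions for illustration; they follow the placeholders in the report and are not part of Hive.

```python
def build_union_all_ctas(target, sources, columns=("a", "b")):
    """Build a 'create table ... as select ... union all ...' statement.

    target  -- name of the table to create (placeholder 'xxx' in the report)
    sources -- list of source table names, one select per table
    columns -- columns selected from every source table
    """
    if not sources:
        raise ValueError("at least one source table is required")
    cols = ", ".join(columns)
    selects = [f"select {cols} from {name}" for name in sources]
    # Join the per-table selects with 'union all', one select per line.
    body = " union all\n".join(selects)
    return f"create table {target} as\n{body}"


def numbered_tables(n, prefix="table"):
    """Return hypothetical table names prefix1..prefixN."""
    return [f"{prefix}{i}" for i in range(1, n + 1)]


if __name__ == "__main__":
    # Print the statement for three tables; raise the count to probe the limit.
    print(build_union_all_ctas("xxx", numbered_tables(3)))
```

The generated statement can be saved to a file and run through the Hive client while increasing the table count step by step.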

> Hive on Spark
> -------------
>
>                 Key: HIVE-7292
>                 URL: https://issues.apache.org/jira/browse/HIVE-7292
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>              Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5
>         Attachments: Hive-on-Spark.pdf
>
>
> Spark, an open-source cluster computing framework for data analytics, has gained
> significant momentum recently. Many Hive users already have Spark installed as their
> computing backbone. To take advantage of Hive, they still need to have either MapReduce
> or Tez on their cluster. This initiative will provide users a new alternative so that
> they can consolidate their backend.
> Secondly, providing such an alternative further increases Hive's adoption, as it exposes
> Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop.
> Finally, allowing Hive to run on Spark also has performance benefits. Hive queries,
> especially those involving multiple reducer stages, will run faster, thus improving
> user experience as Tez does.
> This is an umbrella JIRA which will cover many upcoming subtasks. The design doc will be
> attached here shortly, and will be on the wiki as well. Feedback from the community is
> greatly appreciated!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
