spark-issues mailing list archives

From "Dongjoon Hyun (Jira)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-30643) Add support for embedding Hive 3
Date Mon, 27 Jan 2020 02:37:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-30643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024062#comment-17024062 ]

Dongjoon Hyun edited comment on SPARK-30643 at 1/27/20 2:36 AM:
----------------------------------------------------------------

It sounds like a misunderstanding of the role of the embedded Hive. It is used only to talk
to the Hive metastore.
{quote}But if I chose to run Hive 3 and Spark with embedded Hive 2.3, then SparkSQL and Hive
queries behavior could differ in some cases.
{quote}
Everything (the SQL parser/analyzer/optimizer and the execution engine) is Spark's own code. So,
in general, the embedded Hive 1.2/2.3 doesn't make a difference. The exceptional cases might
be Hive bugs. For example, Spark 3.0.0 will ship with both Hive 1.2 and Hive 2.3 (the default),
and all UTs pass in both environments with the same results.
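To illustrate the point above, the embedded Hive client and the metastore it talks to are decoupled in Spark. A minimal sketch, using Spark's documented `spark.sql.hive.metastore.*` options (the version number and `maven` jar-resolution mode below are illustrative, not taken from this thread):

```shell
# Sketch: run Spark with its built-in Hive client code while talking to an
# external Hive metastore of a different version. Query planning and execution
# remain Spark's own; only metastore communication goes through the Hive client.
spark-shell \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.sql.hive.metastore.version=2.3.7 \
  --conf spark.sql.hive.metastore.jars=maven
```

With `spark.sql.hive.metastore.jars=maven`, Spark downloads Hive client jars matching the configured metastore version, so the metastore version need not match the Hive version Spark was built against.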

As for the following, I don't think Apache Spark needs to embed Hive 1.2/2.3/3.1 in the Apache
Spark 3.x era. Adding 2.3 took so much effort from the Apache Spark community that it couldn't
happen in Apache Spark 2.x. Maybe we can revisit this issue for Apache Spark 4.0 if many users
are running Hive 3.x stably in production (not a beta).
{quote}I think that majority of reasons that went into support of embedding Hive 2.3 will
apply to support of embedding Hive 3.
{quote}


> Add support for embedding Hive 3
> --------------------------------
>
>                 Key: SPARK-30643
>                 URL: https://issues.apache.org/jira/browse/SPARK-30643
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Igor Dvorzhak
>            Priority: Major
>
> Currently Spark can be compiled only against Hive 1.2.1 and Hive 2.3; compilation fails
> against Hive 3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

