spark-issues mailing list archives

From "Xiao Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-23443) Spark with Glue as external catalog
Date Wed, 18 Apr 2018 17:56:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442948#comment-16442948 ]

Xiao Li commented on SPARK-23443:
---------------------------------

We need to clean up the ExternalCatalog interface first, before working on catalog federation.

> Spark with Glue as external catalog
> -----------------------------------
>
>                 Key: SPARK-23443
>                 URL: https://issues.apache.org/jira/browse/SPARK-23443
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Ameen Tayyebi
>            Priority: Major
>
> AWS Glue Catalog is an external Hive metastore backed by a web service. It allows permanent
> storage of catalog data for big data use cases.
> To find out more information about AWS Glue, please consult:
>  * AWS Glue - [https://aws.amazon.com/glue/]
>  * Using Glue as a Metastore catalog for Spark - [https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html]
> Today, the integration of Glue and Spark is through the Hive layer. Glue implements the
> IMetaStore interface of Hive, and for installations of Spark that contain Hive, Glue can be
> used as the metastore.
> The feature set that Glue supports does not align one-to-one with the set of features that the
> latest version of Spark supports. For example, the Glue interface supports more advanced partition
> pruning than the latest version of Hive embedded in Spark.
> To enable a more natural integration with Spark and to allow leveraging the latest features
> of Glue without being coupled to Hive, a direct integration through Spark's own Catalog API
> is proposed. This Jira tracks this work.
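
For context, the existing Hive-layer integration described above is typically switched on through configuration rather than code. The following is a minimal sketch, not part of the issue itself. It assumes the Glue metastore client jars are already on the classpath (as on EMR) and uses the client factory class name given in the EMR documentation linked above. The helper names glue_metastore_conf and build_session are hypothetical.

```python
# Sketch only: routes Spark's Hive metastore calls to Glue via the Hive layer.
# Assumption: the AWS Glue metastore client jars are on the classpath (e.g. on EMR).
GLUE_CLIENT_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_metastore_conf():
    """Return the Spark settings that select Glue as the Hive metastore (hypothetical helper)."""
    return {
        # Use the Hive-backed catalog instead of the in-memory one.
        "spark.sql.catalogImplementation": "hive",
        # The spark.hadoop. prefix forwards the property to the Hadoop/Hive configuration.
        "spark.hadoop.hive.metastore.client.factory.class": GLUE_CLIENT_FACTORY,
    }


def build_session(app_name="glue-catalog-demo"):
    """Build a SparkSession with the settings above (hypothetical helper)."""
    # Imported lazily so the configuration helper works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).enableHiveSupport()
    for key, value in glue_metastore_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With this setup every catalog call still passes through Spark's embedded Hive client, which is the coupling the proposal aims to remove.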



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
