spark-issues mailing list archives

From "Reynold Xin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-15691) Refactor and improve Hive support
Date Wed, 01 Jun 2016 06:01:12 GMT

     [ https://issues.apache.org/jira/browse/SPARK-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-15691:
--------------------------------
    Description: 
Hive support is important to Spark SQL, as many Spark users use it to read from Hive. The
current architecture is very difficult to maintain, and this ticket tracks progress towards
getting us to a sane state.

A number of things we want to accomplish are:

- Remove HiveSessionCatalog. All Hive-related stuff should go into HiveExternalCatalog. This
would require moving caching either into HiveExternalCatalog, or just into SessionCatalog.
- Move the Hive-specific catalog logic (e.g. using properties to store data source options)
into HiveExternalCatalog.
- Remove the Hive-specific ScriptTransform implementation and make it more general so we can
put it in sql/core.
- Implement HiveTableScan (and the write path) as a data source, so we don't need a special
planner rule for HiveTableScan.
- Remove HiveSharedState and HiveSessionState.
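
As a rough illustration of the second item (keeping the "properties store data source options" encoding inside the external catalog), here is a minimal, language-neutral sketch in Python. It is not Spark code: the class names, the in-memory metastore, and the property keys are all hypothetical and chosen only to show the shape of the idea, namely that callers see a plain table definition while the Hive-facing catalog alone knows how options are flattened into string properties.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical property keys for illustration; not the keys Spark actually uses.
PROVIDER_KEY = "example.datasource.provider"
OPTION_PREFIX = "example.datasource.option."


@dataclass
class TableDefinition:
    """Engine-side view of a table, free of any metastore-specific encoding."""
    name: str
    provider: Optional[str] = None
    options: Dict[str, str] = field(default_factory=dict)


class InMemoryMetastore:
    """Stand-in for a metastore that only stores flat string properties."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, str]] = {}


class ToyHiveExternalCatalog:
    """Hypothetical catalog that owns the options <-> properties encoding."""

    def __init__(self, metastore: InMemoryMetastore) -> None:
        self._metastore = metastore

    def create_table(self, table: TableDefinition) -> None:
        # Flatten the provider and each option into prefixed string properties.
        props: Dict[str, str] = {}
        if table.provider is not None:
            props[PROVIDER_KEY] = table.provider
        for key, value in table.options.items():
            props[OPTION_PREFIX + key] = value
        self._metastore.tables[table.name] = props

    def get_table(self, name: str) -> TableDefinition:
        # Reverse the encoding so callers never see the raw property keys.
        props = self._metastore.tables[name]
        options = {
            key[len(OPTION_PREFIX):]: value
            for key, value in props.items()
            if key.startswith(OPTION_PREFIX)
        }
        return TableDefinition(name, props.get(PROVIDER_KEY), options)


metastore = InMemoryMetastore()
catalog = ToyHiveExternalCatalog(metastore)
catalog.create_table(TableDefinition("events", "parquet", {"path": "/data/events"}))
restored = catalog.get_table("events")
```

In this sketch a session-level catalog would only ever handle `TableDefinition` values, so the encoding could change without touching anything outside the external catalog.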



  was:
Hive support is important to Spark SQL, as many Spark users use it to read from Hive. The
current architecture is very difficult to maintain, and this ticket tracks progress towards
getting us to a sane state.

A number of things we want to 




> Refactor and improve Hive support
> ---------------------------------
>
>                 Key: SPARK-15691
>                 URL: https://issues.apache.org/jira/browse/SPARK-15691
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Reynold Xin
>
> Hive support is important to Spark SQL, as many Spark users use it to read from Hive.
> The current architecture is very difficult to maintain, and this ticket tracks progress towards
> getting us to a sane state.
> A number of things we want to accomplish are:
> - Remove HiveSessionCatalog. All Hive-related stuff should go into HiveExternalCatalog.
> This would require moving caching either into HiveExternalCatalog, or just into SessionCatalog.
> - Move the Hive-specific catalog logic (e.g. using properties to store data source options)
> into HiveExternalCatalog.
> - Remove the Hive-specific ScriptTransform implementation and make it more general so we
> can put it in sql/core.
> - Implement HiveTableScan (and the write path) as a data source, so we don't need a special
> planner rule for HiveTableScan.
> - Remove HiveSharedState and HiveSessionState.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

