spark-issues mailing list archives

From "Yin Huai (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-5501) Write support for the data source API
Date Tue, 03 Feb 2015 18:44:34 GMT

    [ https://issues.apache.org/jira/browse/SPARK-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303722#comment-14303722 ]

Yin Huai edited comment on SPARK-5501 at 2/3/15 6:43 PM:
---------------------------------------------------------

h3. Interfaces introduced to the data source API
The PR for this JIRA introduces one new *RelationProvider* (CreateableRelationProvider) and one new *BaseRelation* (InsertableRelation).
{code}
trait CreateableRelationProvider {
  def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      data: DataFrame): BaseRelation
}
{code}
CreateableRelationProvider creates a BaseRelation from a DataFrame in two steps: it first stores
the data of the DataFrame in the data source, then instantiates a BaseRelation for the data it
just stored. By contrast, a RelationProvider and a SchemaRelationProvider only instantiate a
BaseRelation for data that already exists in the data source. CreateableRelationProvider is what
supports saving a DataFrame and CTAS (CREATE TABLE AS SELECT) queries. You can mix it in with
RelationProvider, SchemaRelationProvider, or both to make your data source support them.
Please note that createRelation should throw an exception when the specified location for
creation already exists (e.g. when we try to create a table and save it to an HDFS path, but
that HDFS path already exists).
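
To make the store-then-instantiate contract concrete, here is a minimal sketch of a provider that follows it. It is only an illustration under stated assumptions: SQLContext, DataFrame and BaseRelation are replaced by simplified local stand-ins so the snippet compiles without Spark, and InMemoryStore, InMemoryRelation, DefaultSource and the "path" parameter are hypothetical names, not part of the PR.

```scala
import scala.collection.mutable

// Simplified stand-ins for Spark's types (assumptions, not the real classes),
// so that this sketch compiles without Spark on the classpath.
final class SQLContext
final case class DataFrame(rows: Seq[Seq[Any]])
trait BaseRelation { def sqlContext: SQLContext }

// The trait as quoted in the comment above.
trait CreateableRelationProvider {
  def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      data: DataFrame): BaseRelation
}

// Hypothetical in-memory "data source", keyed by a location string.
object InMemoryStore {
  val tables: mutable.Map[String, Seq[Seq[Any]]] = mutable.Map.empty
}

// Hypothetical relation over the rows stored at `path`.
final class InMemoryRelation(val sqlContext: SQLContext, val path: String)
    extends BaseRelation {
  def rows: Seq[Seq[Any]] = InMemoryStore.tables(path)
}

// Hypothetical provider implementing the contract described above.
class DefaultSource extends CreateableRelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      data: DataFrame): BaseRelation = {
    // "path" is an assumed parameter name for the target location.
    val path = parameters.getOrElse("path", sys.error("'path' must be specified"))
    // Throw if the target location already exists.
    if (InMemoryStore.tables.contains(path)) {
      sys.error(s"location '$path' already exists")
    }
    // Step 1: store the data of the DataFrame in the data source.
    InMemoryStore.tables(path) = data.rows
    // Step 2: instantiate a relation for the data just stored.
    new InMemoryRelation(sqlContext, path)
  }
}
```

In this sketch, calling createRelation a second time with the same "path" fails instead of overwriting, which mirrors the note about existing locations.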

{code}
trait InsertableRelation extends BaseRelation {
  def insert(data: DataFrame, overwrite: Boolean): Unit
}
{code}
InsertableRelation is a kind of BaseRelation that supports insert operations. You can mix it
in with another BaseRelation (e.g. TableScan) to make your relation support INSERT INTO/OVERWRITE,
both as statements (in SQL) and as operations (in the programmatic API).
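
As an illustration of mixing InsertableRelation with a scan interface, here is a minimal sketch. It rests on several assumptions: DataFrame, BaseRelation and TableScan are simplified local stand-ins (the stand-in buildScan returns plain rows rather than Spark's actual result type), BufferRelation is a hypothetical name, and overwrite = true is assumed to mean "replace the existing rows" while overwrite = false means "append".

```scala
// Simplified stand-ins for Spark's types (assumptions, not the real classes),
// so that this sketch compiles without Spark on the classpath.
final case class DataFrame(rows: Seq[Seq[Any]])
trait BaseRelation
trait TableScan extends BaseRelation {
  def buildScan(): Seq[Seq[Any]]
}

// The trait as quoted in the comment above.
trait InsertableRelation extends BaseRelation {
  def insert(data: DataFrame, overwrite: Boolean): Unit
}

// Hypothetical relation that can be both scanned and inserted into.
final class BufferRelation extends TableScan with InsertableRelation {
  private var stored: Vector[Seq[Any]] = Vector.empty

  // Return every row currently held by the relation.
  override def buildScan(): Seq[Seq[Any]] = stored

  // Assumed semantics: overwrite = true replaces the existing rows
  // (INSERT OVERWRITE), overwrite = false appends to them (INSERT INTO).
  override def insert(data: DataFrame, overwrite: Boolean): Unit = {
    stored = if (overwrite) data.rows.toVector else stored ++ data.rows
  }
}
```

With this sketch, two successive calls with overwrite = false leave both batches of rows visible to buildScan(), and a later call with overwrite = true leaves only the newest batch.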


> Write support for the data source API
> -------------------------------------
>
>                 Key: SPARK-5501
>                 URL: https://issues.apache.org/jira/browse/SPARK-5501
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>            Priority: Blocker
>             Fix For: 1.3.0
>
>



