spark-issues mailing list archives

From "Ryan Blue (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-23521) SPIP: Standardize SQL logical plans with DataSourceV2
Date Tue, 27 Feb 2018 01:53:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-23521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Blue updated SPARK-23521:
------------------------------
    Description: 
Executive Summary: This SPIP is based on [discussion about the DataSourceV2 implementation|https://lists.apache.org/thread.html/55676ec1f5039d3deaf347d391cf82fe8574b8fa4eeab70110ed5b2b@%3Cdev.spark.apache.org%3E]
on the dev list. The proposal is to standardize the logical plans used for write operations to
make the planner more maintainable and to make Spark's write behavior predictable and reliable.
It proposes the following principles:
 # Use well-defined logical plan nodes for all high-level operations: insert, create, CTAS,
overwrite table, etc.
 # Use planner rules that match on these high-level nodes, so that it isn’t necessary to
create rules to match each eventual code path individually.
 # Clearly define Spark’s behavior for these logical plan nodes. Physical nodes should implement
that behavior so that all code paths eventually make the same guarantees.
 # Specialize the implementation when creating the physical plan, not in the logical plan. This
will avoid behavior drift and ensure that planner code is shared across physical implementations.

The SPIP doc presents a small but complete set of those high-level logical operations, most
of which are already defined in SQL or implemented by some write path in Spark.
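
A minimal sketch of how the four principles fit together, written in Python purely for illustration. Every name here (Append, OverwriteTable, CreateTableAsSelect, validate, to_physical, execute, and the dict-based catalog) is hypothetical and is not taken from Spark or from the SPIP doc. It only shows the shape of the idea: a few high-level logical nodes, one rule that matches on them, and specialization deferred to physical planning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

# Principle 1: a small set of well-defined, high-level logical nodes.
# These classes are illustrative stand-ins, not Spark's plan nodes.
@dataclass(frozen=True)
class Append:
    table: str
    rows: Tuple

@dataclass(frozen=True)
class OverwriteTable:
    table: str
    rows: Tuple

@dataclass(frozen=True)
class CreateTableAsSelect:
    table: str
    rows: Tuple

LogicalWrite = Union[Append, OverwriteTable, CreateTableAsSelect]
Catalog = Dict[str, List]  # assumed toy catalog: table name -> rows


def validate(plan: LogicalWrite, catalog: Catalog) -> LogicalWrite:
    """Principles 2 and 3: one rule matches on the high-level node and
    enforces the behavior defined for it, whatever physical path runs later."""
    exists = plan.table in catalog
    if isinstance(plan, CreateTableAsSelect) and exists:
        raise ValueError(f"table already exists: {plan.table}")
    if isinstance(plan, (Append, OverwriteTable)) and not exists:
        raise ValueError(f"table not found: {plan.table}")
    return plan


def to_physical(plan: LogicalWrite) -> Callable[[Catalog], None]:
    """Principle 4: specialization happens only when the physical plan is built."""
    if isinstance(plan, Append):
        def run_append(catalog: Catalog) -> None:
            catalog[plan.table].extend(plan.rows)
        return run_append
    if isinstance(plan, (OverwriteTable, CreateTableAsSelect)):
        def run_replace(catalog: Catalog) -> None:
            catalog[plan.table] = list(plan.rows)
        return run_replace
    raise TypeError(f"unsupported logical plan: {plan!r}")


def execute(plan: LogicalWrite, catalog: Catalog) -> None:
    # Validation always runs on the logical node before any physical code.
    to_physical(validate(plan, catalog))(catalog)
```

In this sketch, a new physical write path would only add a branch in to_physical; the checks in validate stay shared, which is the behavior-drift argument the principles make.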

  was:
Executive Summary: This SPIP is based on [discussion about the DataSourceV2 implementation|https://lists.apache.org/thread.html/55676ec1f5039d3deaf347d391cf82fe8574b8fa4eeab70110ed5b2b@%3Cdev.spark.apache.org%3E]
on the dev list. The proposal is to standardize the logical plans used for write operations to
make the planner more maintainable and to make Spark's write behavior predictable and reliable.

The SPIP doc presents a small set of operations, most of which are already defined in SQL
or implemented by some write path in Spark. This set is complete enough to handle all the
write cases.


> SPIP: Standardize SQL logical plans with DataSourceV2
> -----------------------------------------------------
>
>                 Key: SPARK-23521
>                 URL: https://issues.apache.org/jira/browse/SPARK-23521
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Ryan Blue
>            Priority: Major
>              Labels: SPIP
>
> Executive Summary: This SPIP is based on [discussion about the DataSourceV2 implementation|https://lists.apache.org/thread.html/55676ec1f5039d3deaf347d391cf82fe8574b8fa4eeab70110ed5b2b@%3Cdev.spark.apache.org%3E]
> on the dev list. The proposal is to standardize the logical plans used for write operations to
> make the planner more maintainable and to make Spark's write behavior predictable and reliable.
> It proposes the following principles:
>  # Use well-defined logical plan nodes for all high-level operations: insert, create,
> CTAS, overwrite table, etc.
>  # Use planner rules that match on these high-level nodes, so that it isn’t necessary
> to create rules to match each eventual code path individually.
>  # Clearly define Spark’s behavior for these logical plan nodes. Physical nodes should
> implement that behavior so that all code paths eventually make the same guarantees.
>  # Specialize the implementation when creating the physical plan, not in the logical plan. This
> will avoid behavior drift and ensure that planner code is shared across physical implementations.
>
> The SPIP doc presents a small but complete set of those high-level logical operations,
> most of which are already defined in SQL or implemented by some write path in Spark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

