spark-issues mailing list archives

From "Stephan Kessler (JIRA)" <>
Subject [jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources
Date Mon, 21 Dec 2015 12:46:46 GMT


Stephan Kessler commented on SPARK-12449:

Added the design document. Looking forward to the discussion; if there is agreement, I
am happy to create sub-tasks and implement them.

> Pushing down arbitrary logical plans to data sources
> ----------------------------------------------------
>                 Key: SPARK-12449
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Stephan Kessler
>         Attachments: pushingDownLogicalPlans.pdf
> With the help of the DataSource API we can pull data from external sources for processing.
Implementing interfaces such as {{PrunedFilteredScan}} allows pushing down filters and projections,
so that unnecessary rows and fields are pruned directly in the data source.
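A minimal Python sketch of the pruning contract described above, assuming a toy in-memory source. Spark's real interface is the Scala trait {{PrunedFilteredScan}} with {{buildScan(requiredColumns, filters)}}; the class {{ToyPrunedFilteredSource}} and the filter type {{EqualTo}} below are hypothetical stand-ins (the latter loosely modeled on Spark's {{EqualTo}} source filter) that only show how row and column pruning happen inside the source before any data reaches Spark.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]

@dataclass(frozen=True)
class EqualTo:
    """Hypothetical stand-in for a pushed-down equality filter (column == value)."""
    attribute: str
    value: Any

class ToyPrunedFilteredSource:
    """Toy source mimicking the shape of buildScan(requiredColumns, filters)."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def build_scan(self, required_columns: Sequence[str],
                   filters: Sequence[EqualTo]) -> List[tuple]:
        result = []
        for row in self.rows:
            # Row pruning: skip rows that fail any pushed-down filter.
            if all(row[f.attribute] == f.value for f in filters):
                # Column pruning: return only the columns that were requested.
                result.append(tuple(row[c] for c in required_columns))
        return result

source = ToyPrunedFilteredSource([
    {"id": 1, "country": "DE", "amount": 10},
    {"id": 2, "country": "FR", "amount": 20},
    {"id": 3, "country": "DE", "amount": 30},
])
scanned = source.build_scan(["id", "amount"], [EqualTo("country", "DE")])
print(scanned)  # [(1, 10), (3, 30)]
```

Only two of the three rows, and two of the three columns, leave the source.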
> However, data sources such as SQL engines are capable of even more preprocessing,
e.g., evaluating aggregates. This is beneficial because it reduces the amount of data
transferred from the source to Spark. The existing interfaces do not allow this kind of processing
in the source.
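To make the data-transfer argument concrete, here is a toy comparison under stated assumptions: a hypothetical five-row table held by an external SQL engine and a SUM grouped by key. Without pushdown every row crosses over to Spark; with the aggregate evaluated in the source, only one row per group does. The table and function names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical table held by an external SQL engine: (country, amount) rows.
TABLE: List[Tuple[str, int]] = [("DE", 10), ("FR", 20), ("DE", 30), ("FR", 5), ("IT", 7)]

def sum_by_key(rows: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """SUM(amount) GROUP BY country."""
    totals: Dict[str, int] = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)

def without_pushdown() -> Tuple[Dict[str, int], int]:
    # The source ships every row; the aggregate is evaluated on the Spark side.
    transferred = list(TABLE)
    return sum_by_key(transferred), len(transferred)

def with_pushdown() -> Tuple[Dict[str, int], int]:
    # The source evaluates the aggregate and ships one row per group.
    transferred = list(sum_by_key(TABLE).items())
    return dict(transferred), len(transferred)

result_a, rows_a = without_pushdown()
result_b, rows_b = with_pushdown()
assert result_a == result_b  # same answer either way
print(rows_a, rows_b)  # 5 3
```

The saving grows with the ratio of input rows to groups, which is why aggregates are a natural candidate for evaluation in the source.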
> We propose adding a new interface {{CatalystSource}} that allows deferring the processing
of arbitrary logical plans to the data source. We already presented the details at Spark
Summit Europe 2015.
> I will add a design document explaining the details.
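The idea of deferring a whole logical plan can be sketched as follows. This is not Catalyst: the plan node classes, {{ToyCatalystSource}}, {{supports_logical_plan}} and {{split_plan}} are hypothetical names, and the actual {{CatalystSource}} methods are the ones defined in the attached design document. The sketch assumes a linear plan (each operator has one child) and defers the largest subtree the source can evaluate entirely, keeping the remaining operators, such as a Spark-side UDF, in Spark.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical logical plan nodes; each operator has exactly one child.
@dataclass(frozen=True)
class Relation:
    name: str

@dataclass(frozen=True)
class Filter:
    condition: str
    child: object

@dataclass(frozen=True)
class Aggregate:
    grouping: str
    aggregate: str
    child: object

@dataclass(frozen=True)
class UdfProject:
    """Projection calling a Spark-side UDF that the source cannot evaluate."""
    udf: str
    child: object

class ToyCatalystSource:
    """Hypothetical source that declares which operator types it can run."""

    def __init__(self, supported: FrozenSet[type]) -> None:
        self.supported = supported

    def supports_logical_plan(self, plan: object) -> bool:
        # A subtree is pushable only if every operator in it is supported.
        if isinstance(plan, Relation):
            return True
        return type(plan) in self.supported and self.supports_logical_plan(plan.child)

def split_plan(plan: object, source: ToyCatalystSource) -> Tuple[List[str], object]:
    """Return (operators kept in Spark, top-down; subtree deferred to the source)."""
    kept: List[str] = []
    node = plan
    # Walk down from the root until the remaining subtree is fully pushable;
    # this terminates because a bare Relation is always pushable.
    while not source.supports_logical_plan(node):
        kept.append(type(node).__name__)
        node = node.child
    return kept, node

plan = UdfProject("my_udf", Aggregate("country", "SUM(amount)",
                                      Filter("amount > 5", Relation("sales"))))
source = ToyCatalystSource(frozenset({Filter, Aggregate}))
kept, pushed = split_plan(plan, source)
print(kept, type(pushed).__name__)  # ['UdfProject'] Aggregate
```

With {{PrunedFilteredScan}}, at most the filter and the required columns would reach the source; in this sketch the aggregate is deferred as well, and only the UDF stays in Spark.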

This message was sent by Atlassian JIRA
