Date: Wed, 23 Dec 2015 19:15:46 +0000 (UTC)
From: "Santiago M. Mola (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources

    [ https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070062#comment-15070062 ]

Santiago M. Mola commented on SPARK-12449:
------------------------------------------

The physical plan would not be consumed by data sources, only the logical plan.

An alternative approach would be to use a different representation to pass the logical plan to the data source. If the relational algebra from Apache Calcite is stable enough, it could be used as the logical plan representation for this interface.
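To make the idea of handing a logical plan (rather than a physical one) to a data source more concrete, here is a minimal, self-contained sketch in plain Python. It is not Spark's or Calcite's API: the node types `Scan`, `Filter`, and `SumBy`, the `ToySource` class, and the `run` function are all hypothetical names invented for illustration. The engine asks the source whether it supports a plan subtree, pushes down the largest supported subtree, and evaluates the remaining operators itself.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Set, Union

Row = Dict[str, Any]


# A tiny, hypothetical logical plan representation (not Catalyst, not Calcite).
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    child: "Plan"
    column: str
    min_value: int  # keeps rows where row[column] >= min_value


@dataclass(frozen=True)
class SumBy:
    child: "Plan"
    group_by: str
    value: str


Plan = Union[Scan, Filter, SumBy]


def apply_node(plan: Plan, rows: List[Row]) -> List[Row]:
    """Evaluate one non-leaf operator over already materialised rows."""
    if isinstance(plan, Filter):
        return [r for r in rows if r[plan.column] >= plan.min_value]
    if isinstance(plan, SumBy):
        totals: Dict[Any, int] = {}
        for r in rows:
            key = r[plan.group_by]
            totals[key] = totals.get(key, 0) + r[plan.value]
        return [{plan.group_by: k, plan.value: v} for k, v in totals.items()]
    raise TypeError(f"not a non-leaf operator: {plan!r}")


class ToySource:
    """Hypothetical data source that can evaluate some operators itself."""

    def __init__(self, tables: Dict[str, List[Row]], supported: Set[type]):
        self.tables = tables
        self.supported = supported  # operator types besides Scan
        self.rows_sent = 0          # rows transferred to the engine

    def supports(self, plan: Plan) -> bool:
        # A subtree is supported only if every operator in it is supported.
        if isinstance(plan, Scan):
            return True
        return type(plan) in self.supported and self.supports(plan.child)

    def execute(self, plan: Plan) -> List[Row]:
        rows = self._eval(plan)
        self.rows_sent += len(rows)
        return rows

    def _eval(self, plan: Plan) -> List[Row]:
        if isinstance(plan, Scan):
            return list(self.tables[plan.table])
        return apply_node(plan, self._eval(plan.child))


def run(plan: Plan, source: ToySource) -> List[Row]:
    """Engine side: push down the largest supported subtree, finish locally."""
    if source.supports(plan):
        return source.execute(plan)
    return apply_node(plan, run(plan.child, source))


sales = [
    {"region": "eu", "amount": 5},
    {"region": "eu", "amount": 20},
    {"region": "us", "amount": 30},
    {"region": "us", "amount": 1},
]
plan = SumBy(Filter(Scan("sales"), "amount", 5), "region", "amount")

scan_only = ToySource({"sales": sales}, supported=set())
pushdown = ToySource({"sales": sales}, supported={Filter, SumBy})
print(run(plan, scan_only), scan_only.rows_sent)
print(run(plan, pushdown), pushdown.rows_sent)
```

Both sources produce the same aggregated result, but the scan-only source transfers all 4 rows to the engine while the source that accepts the whole plan transfers only the 2 aggregated rows. In this toy the plan representation is shared by engine and source; the comment above is about which representation would be stable enough to play that role in a real interface.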

> Pushing down arbitrary logical plans to data sources
> ----------------------------------------------------
>
>                 Key: SPARK-12449
>                 URL: https://issues.apache.org/jira/browse/SPARK-12449
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Stephan Kessler
>         Attachments: pushingDownLogicalPlans.pdf
>
>
> With the help of the DataSource API we can pull data from external sources for processing. Implementing interfaces such as {{PrunedFilteredScan}} allows pushing down filters and projections, pruning unnecessary fields and rows directly in the data source.
> However, data sources such as SQL engines are capable of doing even more preprocessing, e.g., evaluating aggregates. This is beneficial because it reduces the amount of data transferred from the source to Spark. The existing interfaces do not allow this kind of processing in the source.
> We propose adding a new interface, {{CatalystSource}}, that allows deferring the processing of arbitrary logical plans to the data source. We have already presented the details at Spark Summit 2015 Europe [https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/]
> I will add a design document explaining the details.