Date: Tue, 1 Nov 2016 22:01:00 +0000 (UTC)
From: "Reynold Xin (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Updated] (SPARK-18209) More robust view canonicalization without full SQL expansion

     [ https://issues.apache.org/jira/browse/SPARK-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-18209:
--------------------------------
    Priority: Critical  (was: Major)

> More robust view canonicalization without full SQL expansion
> ------------------------------------------------------------
>
>                 Key: SPARK-18209
>                 URL: https://issues.apache.org/jira/browse/SPARK-18209
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Reynold Xin
>            Priority: Critical
>
> Spark SQL currently stores views by analyzing the provided SQL and then generating fully expanded SQL from the analyzed logical plan. This is a very error-prone way of doing it, because:
> 1. It is non-trivial to guarantee that the generated SQL is correct without being extremely verbose, given the current set of operators.
> 2. We need extensive testing for all combinations of operators.
> 3. Whenever we introduce a new logical plan operator, we need to be super careful because it might break SQL generation. This is the main reason the broadcast join hint has taken so long to be merged: it is very difficult to guarantee correctness.
> Given that the two primary reasons for view canonicalization are to provide the database context and to perform star expansion, I think we can do this through a simpler approach: take the user-given SQL, analyze it, wrap the original SQL in an outer SELECT clause, and store the database as a hint.
> For example, given the following view creation SQL:
> {code}
> USE DATABASE my_db;
> CREATE TABLE my_table (id int, name string);
> CREATE VIEW my_view AS SELECT * FROM my_table WHERE id > 10;
> {code}
> We store the following SQL instead:
> {code}
> SELECT /*+ current_db: `my_db` */ id, name FROM (SELECT * FROM my_table WHERE id > 10);
> {code}
> At parsing time, we expand the view using the provided database context.
> (We don't need to follow exactly the same hint, as I'm merely illustrating the high-level approach here.)
> Note that the schema of the underlying base table(s) may change, in which case the stored schema of the view might differ from the schema the SQL actually produces. In that case, I think we should throw an exception at runtime to warn users. This exception can be controlled by a flag.
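
The wrapping approach and the runtime schema check described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Spark's implementation: `canonicalize_view`, `check_view_schema`, `ViewSchemaMismatchError`, and the `fail_on_mismatch` flag are hypothetical names, and the analyzed output columns are assumed to be supplied by the caller instead of coming from a real analyzer.

```python
from typing import Sequence


class ViewSchemaMismatchError(Exception):
    """Hypothetical error: the view's stored columns differ from the analyzed SQL."""


def quote_identifier(name: str) -> str:
    # Backtick-quote an identifier, doubling any embedded backticks.
    return "`" + name.replace("`", "``") + "`"


def canonicalize_view(
    original_sql: str, current_db: str, output_columns: Sequence[str]
) -> str:
    """Wrap the user's SQL in an outer SELECT that lists the analyzed columns
    explicitly (star expansion) and records the database as a hint comment."""
    # Drop surrounding whitespace and a trailing semicolon so the text can be nested.
    body = original_sql.strip().rstrip(";").strip()
    # Column names are emitted unquoted to mirror the example in the issue;
    # a real implementation would likely need to quote them as well.
    columns = ", ".join(output_columns)
    hint = f"/*+ current_db: {quote_identifier(current_db)} */"
    return f"SELECT {hint} {columns} FROM ({body});"


def check_view_schema(
    stored_columns: Sequence[str],
    actual_columns: Sequence[str],
    fail_on_mismatch: bool = True,
) -> bool:
    """Compare the columns stored with the view against those the SQL produces now.

    Returns True when they match. On a mismatch, raises if fail_on_mismatch
    (the assumed controlling flag) is set, otherwise returns False."""
    matches = list(stored_columns) == list(actual_columns)
    if not matches and fail_on_mismatch:
        raise ViewSchemaMismatchError(
            f"view columns {list(stored_columns)} no longer match "
            f"underlying columns {list(actual_columns)}"
        )
    return matches


# Usage: the view definition from the example in the issue.
stored_sql = canonicalize_view(
    "SELECT * FROM my_table WHERE id > 10;", "my_db", ["id", "name"]
)
print(stored_sql)
```

Because the stored text keeps the user's original query verbatim, nothing in this sketch has to regenerate SQL from a logical plan; only the column list and the database name come from analysis.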