spark-issues mailing list archives

From "Sameer Agarwal (JIRA)" <>
Subject [jira] [Updated] (SPARK-16217) Support SELECT INTO statement
Date Mon, 08 Jan 2018 20:47:00 GMT


Sameer Agarwal updated SPARK-16217:
    Target Version/s: 2.4.0  (was: 2.3.0)

> Support SELECT INTO statement
> -----------------------------
>                 Key: SPARK-16217
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: GuangFancui(ISCAS)
> The *SELECT INTO* statement selects data from one table and inserts it into a new table
as follows.
> {code:sql}
> SELECT column_name(s)
> INTO newtable
> FROM table1;
> {code}
> This statement is commonly used in SQL but not currently supported in SparkSQL.
> We investigated Catalyst and found that this statement can be implemented by extending
the grammar and reusing the logical plan of *CREATE TABLE AS SELECT* as follows.
> # Improve grammar: Add _intoClause_ to _SELECT ... FROM_ in _querySpecification_ grammar
in SqlBase.g4 file.
> !!
> For example
>  {code:sql}
> {code}
> Then the grammar tree will be: 
> !!
> Furthermore, we can discuss whether it's necessary to add _intoClause_ to _TRANSFORM_ in
> # Identify _SELECT INTO_ in _Parser_: Modify the _visitSingleInsertQuery_ function. Extract
_IntoClauseContext_ with the _existIntoClause_ function. _IntoClauseContext_ is then passed as
an argument to the _withSelectInto_ function. (_intoClause_ and queryOrganization are not at the
same level, so we need to extract _IntoClauseContext_ when visiting _singleInsertQuery_.)
> # Conversion in _Parser_: Convert the current logical plan to _CTAS_ (strictly speaking, as
a child of CTAS) using the _withSelectInto_ function.
> *Hive support* should be enabled since _CreateHiveTableAsSelectCommand_ relies on it.
> The _withSelectInto_ function copies code from _visitCreateTable_ to do the conversion, so it
requires further discussion and optimization.
> The implementation is based on the following _assumptions_:
> # _intoClause_ must appear together with _fromClause_.{code:sql}(intoClause? fromClause)?{code}This
structure ensures that this modification won’t affect the existing _multiInsertQuery_.
> # A _SELECT INTO_ statement will be translated to the following tree structure:
> !!
> As shown, if there is an _intoClause_, the actual subclass of _queryTerm_ is _queryTermDefault_,
and the actual subclass of _queryPrimary_ is _queryPrimaryDefault_. We use the _existIntoClause_
function to match these designated subclasses. The function returns the _IntoClauseContext_ only
when all conditions are satisfied; otherwise it returns null.
> We’ve implemented and tested the above approach. Please refer to PR:
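
The rewrite described in the issue (detect an INTO clause that sits directly before FROM, then turn the query into a CREATE TABLE AS SELECT) can be illustrated with a small text-level sketch. This is not the Spark implementation: the issue changes the ANTLR grammar and the Catalyst parser, whereas the sketch below uses a regular expression as a stand-in. The Python names exist_into_clause and with_select_into are hypothetical and only mirror the function names mentioned in the issue; nested queries and quoted identifiers are not handled.

```python
import re
from typing import Optional

# Assumption: a regex stands in for the grammar rule "(intoClause? fromClause)?".
# It only recognises a top-level "SELECT ... INTO <table> FROM ..." query.
_SELECT_INTO = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+INTO\s+(?P<table>\w+)\s+(?P<rest>FROM\s+.+?)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def exist_into_clause(sql: str) -> Optional[str]:
    """Return the target table name if every condition matches, else None.

    Hypothetical analogue of the issue's existIntoClause, which returns an
    IntoClauseContext or null.
    """
    match = _SELECT_INTO.match(sql)
    return match.group("table") if match else None


def with_select_into(sql: str) -> str:
    """Rewrite SELECT ... INTO t FROM ... as CREATE TABLE t AS SELECT ... FROM ...

    Hypothetical analogue of the issue's withSelectInto; queries without an
    INTO clause (including INSERT INTO ... SELECT) are returned unchanged.
    """
    match = _SELECT_INTO.match(sql)
    if match is None:
        return sql
    return (
        f"CREATE TABLE {match.group('table')} "
        f"AS SELECT {match.group('cols')} {match.group('rest')}"
    )


if __name__ == "__main__":
    print(with_select_into("SELECT a, b\nINTO newtable\nFROM table1;"))
    # prints: CREATE TABLE newtable AS SELECT a, b FROM table1
```

Because the pattern requires the statement to start with SELECT and to have INTO immediately followed by a table name and FROM, an INSERT INTO ... SELECT statement does not match, which loosely mirrors the assumption that existing multi-insert queries are unaffected.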

This message was sent by Atlassian JIRA
