spark-issues mailing list archives

From "Ravindra Nath Kakarla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-24260) Support for multi-statement SQL in SparkSession.sql API
Date Tue, 15 May 2018 11:35:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-24260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475687#comment-16475687 ]

Ravindra Nath Kakarla commented on SPARK-24260:
-----------------------------------------------

Can we make the return value the result of the last query? This feature is available in some
databases, such as MySQL.
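As a rough sketch of the behaviour being asked about (run every statement in order and hand back only the last result), here is a small generic helper. `RunAllReturnLast` and `runAllReturnLast` are hypothetical names, not part of Spark. The executor is passed in as a `java.util.function.Function`, so the sketch runs without Spark. With Spark, one could pass `sparkSession::sql`, which takes a `String` and returns a `Dataset<Row>`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class RunAllReturnLast {

    // Hypothetical helper, not part of Spark: executes each statement in order and
    // returns only the result of the last one; earlier results are discarded.
    public static <R> R runAllReturnLast(List<String> statements, Function<String, R> execute) {
        if (statements.isEmpty()) {
            throw new IllegalArgumentException("no statements to execute");
        }
        R last = null;
        for (String statement : statements) {
            last = execute.apply(statement);
        }
        return last;
    }

    public static void main(String[] args) {
        List<String> statements = Arrays.asList(
            "DROP TABLE IF EXISTS count_employees",
            "CACHE TABLE employees",
            "CREATE TEMPORARY VIEW count_employees AS SELECT count(*) AS cnt FROM employees",
            "SELECT * FROM count_employees");
        // Stand-in executor that only echoes its input; with Spark this could be
        // sparkSession::sql, making the return type Dataset<Row>.
        String result = runAllReturnLast(statements, sql -> "ran: " + sql);
        System.out.println(result); // prints "ran: SELECT * FROM count_employees"
    }
}
```

Taking the executor as a parameter keeps the "return the last result" rule separate from how statements are actually run. It also keeps it separate from how a script is split into statements.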

> Support for multi-statement SQL in SparkSession.sql API
> -------------------------------------------------------
>
>                 Key: SPARK-24260
>                 URL: https://issues.apache.org/jira/browse/SPARK-24260
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Ravindra Nath Kakarla
>            Priority: Minor
>
> The sparkSession.sql API executes only one SQL statement per call, so a multi-statement SQL string cannot be run in a single call. For example,
> {code:java}
> import org.apache.spark.sql.SparkSession;
>
> SparkSession sparkSession = SparkSession.builder().appName("MultiStatementSQL")
>     .master("local").getOrCreate();
> // Fails: sql() accepts only a single statement per call.
> sparkSession.sql("DROP TABLE IF EXISTS count_employees; CACHE TABLE employees; "
>     + "CREATE TEMPORARY VIEW count_employees AS SELECT count(*) AS cnt FROM employees; "
>     + "SELECT * FROM count_employees");
> {code}
> The above code fails with the following error:
> {code:java}
> org.apache.spark.sql.catalyst.parser.ParseException: mismatched input ';' expecting <EOF>{code}
> The workaround is to call the .sql API once per statement, in the required order:
> {code:java}
> sparkSession.sql("DROP TABLE IF EXISTS count_employees");
> sparkSession.sql("CACHE TABLE employees");
> sparkSession.sql("CREATE TEMPORARY VIEW count_employees AS SELECT count(*) AS cnt FROM employees");
> sparkSession.sql("SELECT * FROM count_employees");
> {code}
> If these SQL statements come from a string or a file, users have to implement their own parsing to split and execute them, for example:
> {code:scala}
> val sqlFromFile = """DROP TABLE IF EXISTS count_employees;
>   |CACHE TABLE employees;
>   |CREATE TEMPORARY VIEW count_employees AS SELECT count(*) AS cnt FROM employees;
>   |SELECT * FROM count_employees""".stripMargin
>
> // Naive split on every ';', then run each non-empty piece in order.
> sqlFromFile.split(";")
>   .map(_.trim)
>   .filter(_.nonEmpty)
>   .foreach(statement => sparkSession.sql(statement))
> {code}
> This naive approach fails on many edge cases, such as a ";" inside a string literal or a comment. Even if users reuse Spark's own grammar to implement their own parsing, it can drift out of sync with the way Spark parses statements.
> Can support for multiple SQL statements be built into the SparkSession.sql API itself?
>  
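The issue notes that splitting on every ";" breaks when a semicolon appears inside a string literal. Here is a minimal sketch of a less naive splitter. It assumes only three kinds of syntax can hide a ";": quotes (`'…'`, `"…"`, `` `…` ``) with backslash escapes, `--` line comments, and `/* … */` block comments. `SqlScriptSplitter` and `split` are hypothetical names. This is not Spark's parser, so it can still diverge from Spark's grammar, which is the drift the issue warns about.

```java
import java.util.ArrayList;
import java.util.List;

public class SqlScriptSplitter {

    // Splits a SQL script on top-level ';' only. Semicolons inside '...', "..." or
    // `...` quotes are kept; a backslash escapes the next character inside quotes.
    // "-- ..." and "/* ... */" comments are dropped. A sketch, not Spark's grammar.
    public static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0;               // active quote character, or 0 outside quotes
        boolean lineComment = false;  // inside "-- ..." until the end of the line
        boolean blockComment = false; // inside "/* ... */"
        int n = script.length();
        for (int i = 0; i < n; i++) {
            char c = script.charAt(i);
            char next = (i + 1 < n) ? script.charAt(i + 1) : 0;
            if (lineComment) {
                if (c == '\n') {
                    lineComment = false;
                    current.append(c); // keep the line break as whitespace
                }
            } else if (blockComment) {
                if (c == '*' && next == '/') {
                    blockComment = false;
                    current.append(' '); // replace the whole comment with one space
                    i++;
                }
            } else if (quote != 0) {
                current.append(c);
                if (c == '\\' && next != 0) {
                    current.append(next); // escaped character, e.g. \' or \;
                    i++;
                } else if (c == quote) {
                    quote = 0; // closing quote; a doubled '' simply re-opens it
                }
            } else if (c == '\'' || c == '"' || c == '`') {
                quote = c;
                current.append(c);
            } else if (c == '-' && next == '-') {
                lineComment = true;
                i++;
            } else if (c == '/' && next == '*') {
                blockComment = true;
                i++;
            } else if (c == ';') {
                addIfNotBlank(statements, current);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(statements, current); // the last statement may lack a ';'
        return statements;
    }

    private static void addIfNotBlank(List<String> statements, StringBuilder current) {
        String statement = current.toString().trim();
        if (!statement.isEmpty()) {
            statements.add(statement);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        String script = "DROP TABLE IF EXISTS count_employees;\n"
            + "-- a comment; with a semicolon\n"
            + "SELECT 'a;b' AS s;\n"
            + "SELECT * FROM count_employees";
        for (String statement : split(script)) {
            System.out.println(statement); // with Spark: sparkSession.sql(statement)
        }
    }
}
```

On the sample script this yields three statements: `DROP TABLE IF EXISTS count_employees`, `SELECT 'a;b' AS s` and `SELECT * FROM count_employees`. Each would then be passed to `sparkSession.sql` in order.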



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


