spark-issues mailing list archives

From "mathieu longtin (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-24753) bad backslash parsing in SQL statements
Date Tue, 10 Jul 2018 18:00:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-24753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539023#comment-16539023
] 

mathieu longtin edited comment on SPARK-24753 at 7/10/18 5:59 PM:
------------------------------------------------------------------

Thanks for the response. Yes, it does work with escapedStringLiterals.

However, this is inconsistent behavior. In the doc example:
 ([https://spark.apache.org/docs/2.3.0/api/sql/index.html#rlike])

 
{code:java}
SELECT '%SystemDrive%\Users\John' rlike '%SystemDrive%\\Users.*'
{code}
The example is wrong; as written, it produces an error. To reproduce it using the
*spark-sql* command with _escapedStringLiterals=False_, I need this:

 
{code:java}
When spark.sql.parser.escapedStringLiterals is disabled (default):
> SELECT '%SystemDrive%\\Users\\John' rlike '%SystemDrive%\\\\Users.*'
true
{code}
Notice the double and quadruple backslashes. Somehow, the right-hand side of rlike gets
decoded once, and is then passed to the rlike function, which decodes it again.
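This two-step decoding can be illustrated in plain Python (a sketch only; the parser's unescaping is approximated here with a naive replace, not Spark's actual code path):

```python
import re

# As typed in the SQL text with escapedStringLiterals disabled: 4 backslashes.
sql_literal = r"%SystemDrive%\\\\Users.*"

# First decode (modeled): the SQL parser turns each '\\' into '\',
# leaving 2 backslashes.
after_parser = sql_literal.replace("\\" * 2, "\\")

# Second decode: the regex engine reads the remaining '\\' as one
# literal backslash, so the pattern matches the path.
subject = r"%SystemDrive%\Users\John"
print(bool(re.match(after_parser, subject)))  # True
```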

BTW, from spark-sql, single backslashes are no good:
{code:java}
> SELECT '%SystemDrive%\Users\John' ;
%SystemDrive%UsersJohn
{code}
Oops, the backslashes get swallowed.
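A rough model of where they go (hypothetical `unescape` helper, not Spark's parser): a single-pass unescaper keeps only the character following each backslash, so unrecognized escapes like `\U` and `\J` simply lose their backslash:

```python
def unescape(s):
    """Naive model of one pass of string-literal unescaping."""
    out, i = [], 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            out.append(s[i + 1])  # keep only the escaped character
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

print(unescape(r"%SystemDrive%\Users\John"))    # %SystemDrive%UsersJohn
print(unescape(r"%SystemDrive%\\Users\\John"))  # %SystemDrive%\Users\John
```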

 

 


> bad backslash parsing in SQL statements
> ---------------------------------------
>
>                 Key: SPARK-24753
>                 URL: https://issues.apache.org/jira/browse/SPARK-24753
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>         Environment:       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 2.3.0
>       /_/
> Using Python version 2.7.12 (default, Jul 15 2016 11:23:12)
>            Reporter: mathieu longtin
>            Priority: Minor
>
> When putting backslashes in SQL code, you need to double them (or rather, double them
twice).
> The code below is Python, but I verified the problem is the same in Scala.
> Line [3] should return the row, and line [4] shouldn't.
>  
> {code:java}
> In [1]: df = spark.createDataFrame([("abc def ghi",)], schema=["s"])
> In [2]: df.filter(df.s.rlike('\\bdef\\b')).show()
> +-----------+
> |          s|
> +-----------+
> |abc def ghi|
> +-----------+
> In [3]: df.filter("s rlike '\\bdef\\b'").show()
> +---+
> |  s|
> +---+
> +---+
> In [4]: df.filter("s rlike '\\\\bdef\\\\b'").show()
> +-----------+
> |          s|
> +-----------+
> |abc def ghi|
> +-----------+
>  
> {code}
>  
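The two layers of escaping in the reporter's example can be sketched in plain Python (the parser step is modeled with a simple replace; this is not Spark's real decoding): the Python literal in line [4] yields the text \\bdef\\b, the SQL parser collapses that to \bdef\b, and only then does the regex engine see the intended word boundaries.

```python
import re

# Line [4]: the Python literal "\\\\bdef\\\\b" is the text \\bdef\\b.
sql_text = "\\\\bdef\\\\b"

# Modeled parser step: each \\ collapses to \, giving \bdef\b.
after_parser = sql_text.replace("\\" * 2, "\\")
print(bool(re.search(after_parser, "abc def ghi")))  # True: \b is a word boundary

# Line [3] sends \bdef\b instead; the parser decodes \b to a backspace
# character, so the regex can never match ordinary text.
broken = "\bdef\b"  # backspace + "def" + backspace
print(re.search(broken, "abc def ghi"))  # None
```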



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

