flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-10257) Incorrect CHAR type support in Flink SQL and Table API
Date Wed, 12 Sep 2018 11:02:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611913#comment-16611913

Fabian Hueske commented on FLINK-10257:

Thanks for looking into this and driving it forward [~pnowojski]!

> Incorrect CHAR type support in Flink SQL and Table API
> ------------------------------------------------------
>                 Key: FLINK-10257
>                 URL: https://issues.apache.org/jira/browse/FLINK-10257
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API &amp; SQL
>            Reporter: Piotr Nowojski
>            Assignee: Hequn Cheng
>            Priority: Critical
> Despite that we officially do not support CHAR type, this type is visible and accessible
for the users. First of all, string literals have default type of CHAR in SQL. Secondly users
can always cast expressions/columns to CHAR.
> Problem is that we do not support CHAR correctly. We mishandle it in:
>  # comparisons and functions
>  # writing values to sinks
> According to SQL standard (and as almost all of the other databases do), CHAR comparisons
should ignore white spaces. On the other hand, functions like {{CONCAT}} or {{LENGTH}} shouldn't:
[http://troels.arvin.dk/db/rdbms/#data_types-char] .
> Currently in In Flink we completely ignore those rules. Sometimes we store internally
CHAR with padded spaces sometimes without. This results with semi random behaviour with respect
to comparisons/functions/writing to sinks. For example following query:
> {code:java}
> tEnv.sqlQuery("SELECT CAST(s AS CHAR(10)) FROM sourceTable").insertInto("targetTable")
> env.execute()
> {code}
> Where `sourceTable` has single {{VARCHAR(10)}} column with values: "Hi", "Hello", "Hello
world", writes to sink not padded strings (correctly), but following query:
> {code:java}
> tEnv.sqlQuery("SELECT * FROM (SELECT CAST(s AS CHAR(10)) c FROM sourceTable) WHERE c
= 'Hi'")
>   .insertInto("targetTable")
> env.execute(){code}
> Incorrectly filters out all of the results, because {{CAST(s AS CHAR(10))}} is a NOOP
in Flink, while 'Hi' constant handed by Calcite to us will be padded with 8 spaces.
> On the other hand following query produces strings padded with spaces:
> {code:java}
> tEnv.sqlQuery("SELECT CASE l WHEN 1 THEN 'GERMANY' WHEN 2 THEN 'POLAND' ELSE 'this should
not happen' END FROM sourceTable")
>   .insertInto("targetTable")
> env.execute()
> val expected = Seq(
>   "GERMANY",
>   "POLAND",
>   "POLAND").mkString("\n")
> org.junit.ComparisonFailure: Different elements in arrays: expected 3 elements and received
> received: [GERMANY , POLAND , POLAND ]
> {code}
> To make matter even worse, Calcite's constant folding correctly performs comparisons,
while if same comparisons are performed by Flink, they yield different results. In other words
in SQL:
> {code:java}
> {code}
> return true, but same expression performed on columns
> {code:java}
> SELECT CAST(country as CHAR(10)) = CAST(country_padded as CHAR(10)) FROM countries{code}
> returns false.
> To further complicated things, in SQL our string literals have {{CHAR}} type, while
in Table API our literals have String type (effectively {{VARCHAR}}) making results inconsistent
between those two APIs.
> CC [~twalthr] [~fhueske] [~hequn8128]

This message was sent by Atlassian JIRA

View raw message