drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Khurram Faraaz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3214) Config option to cast empty string to null does not cast empty string to null
Date Thu, 10 Dec 2015 18:35:10 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051422#comment-15051422
] 

Khurram Faraaz commented on DRILL-3214:
---------------------------------------


{code}
0: jdbc:drill:schema=dfs.tmp> SELECT columns[0], columns[1], columns[2] from `threeColsDouble.csv`;
+----------+---------+---------+
|  EXPR$0  | EXPR$1  | EXPR$2  |
+----------+---------+---------+
| 156      | 132     | 12222   |
| 156      | 234     | 12222   |
| 2653543  | 434     | 0       |
| 367345   | 567567  | 23      |
| 34554    | 1234    | 45      |
| 4345     | 567678  | 19876   |
| 34556    | 0       | 1109    |
| 5456     | -1      | 1098    |
| 6567     |         | 34534   |
| 7678     | 1       | 6       |
| 8798     | 456     | 243     |
| 265354   | 234     | 123     |
| 367345   |         | 234     |
| 34554    | 1       | 2       |
| 4345     | 0       | 10      |
| 34556    | -1      | 19      |
| 5456     | 23423   | 345     |
| 6567     | 0       | 2348    |
| 7678     | 1       | 2       |
| 8798     |         | 45      |
| 099      | 19      | 17      |
+----------+---------+---------+
21 rows selected (0.351 seconds)

# Set the system wide config parameter.

0: jdbc:drill:schema=dfs.tmp> alter system set `drill.exec.functions.cast_empty_string_to_null`
= true;
+-------+----------------------------------------------------------+
|  ok   |                         summary                          |
+-------+----------------------------------------------------------+
| true  | drill.exec.functions.cast_empty_string_to_null updated.  |
+-------+----------------------------------------------------------+
1 row selected (0.226 seconds)

# Empty strings are not cast to null, in the output of below query.

0: jdbc:drill:schema=dfs.tmp> SELECT columns[0], columns[1], columns[2] from `threeColsDouble.csv`;
+----------+---------+---------+
|  EXPR$0  | EXPR$1  | EXPR$2  |
+----------+---------+---------+
| 156      | 132     | 12222   |
| 156      | 234     | 12222   |
| 2653543  | 434     | 0       |
| 367345   | 567567  | 23      |
| 34554    | 1234    | 45      |
| 4345     | 567678  | 19876   |
| 34556    | 0       | 1109    |
| 5456     | -1      | 1098    |
| 6567     |         | 34534   |
| 7678     | 1       | 6       |
| 8798     | 456     | 243     |
| 265354   | 234     | 123     |
| 367345   |         | 234     |
| 34554    | 1       | 2       |
| 4345     | 0       | 10      |
| 34556    | -1      | 19      |
| 5456     | 23423   | 345     |
| 6567     | 0       | 2348    |
| 7678     | 1       | 2       |
| 8798     |         | 45      |
| 099      | 19      | 17      |
+----------+---------+---------+
21 rows selected (0.274 seconds)

# From the below query it is confirmed that empty strings are not cast to null. This seems
incorrect.

0: jdbc:drill:schema=dfs.tmp> select columns[1] from `threeColsDouble.csv` where columns[1]
is null;
+---------+
| EXPR$0  |
+---------+
+---------+
No rows selected (0.402 seconds)

# To confirm, there are three empty strings in columns[1] in the input CSV file. These were
not cast to null, empty strings should be cast to null, that is because we had set the system
wide config parameter.

0: jdbc:drill:schema=dfs.tmp> select columns[1] from `threeColsDouble.csv` where columns[1]='';
+---------+
| EXPR$0  |
+---------+
|         |
|         |
|         |
+---------+
3 rows selected (0.293 seconds)

{code}

> Config option to cast empty string to null does not cast empty string to null
> -----------------------------------------------------------------------------
>
>                 Key: DRILL-3214
>                 URL: https://issues.apache.org/jira/browse/DRILL-3214
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.0.0
>         Environment: faec150598840c40827e6493992d81209aa936da
>            Reporter: Khurram Faraaz
>            Assignee: Sean Hsuan-Yi Chu
>             Fix For: 1.1.0
>
>
> Config option drill.exec.functions.cast_empty_string_to_null does not seem to be working
as designed.
> Disable casting of empty strings to null. 
> {code}
> 0: jdbc:drill:schema=dfs.tmp> alter session set `drill.exec.functions.cast_empty_string_to_null`
= false;
> +-------+----------------------------------------------------------+
> |  ok   |                         summary                          |
> +-------+----------------------------------------------------------+
> | true  | drill.exec.functions.cast_empty_string_to_null updated.  |
> +-------+----------------------------------------------------------+
> 1 row selected (0.078 seconds)
> {code}
> In this query we see empty strings are retained in query output in columns[1].
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT columns[0], columns[1], columns[2] from `threeColsDouble.csv`;
> +----------+---------+---------+
> |  EXPR$0  | EXPR$1  | EXPR$2  |
> +----------+---------+---------+
> | 156      | 234     | 12222   |
> | 2653543  | 434     | 0       |
> | 367345   | 567567  | 23      |
> | 34554    | 1234    | 45      |
> | 4345     | 567678  | 19876   |
> | 34556    | 0       | 1109    |
> | 5456     | -1      | 1098    |
> | 6567     |         | 34534   |
> | 7678     | 1       | 6       |
> | 8798     | 456     | 243     |
> | 265354   | 234     | 123     |
> | 367345   |         | 234     |
> | 34554    | 1       | 2       |
> | 4345     | 0       | 10      |
> | 34556    | -1      | 19      |
> | 5456     | 23423   | 345     |
> | 6567     | 0       | 2348    |
> | 7678     | 1       | 2       |
> | 8798     |         | 45      |
> | 099      | 19      | 17      |
> +----------+---------+---------+
> 20 rows selected (0.13 seconds)
> {code}
> Casting empty strings to integer leads to NumberFormatException
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT columns[0], cast(columns[1] as int), columns[2]
from `threeColsDouble.csv`;
> Error: SYSTEM ERROR: java.lang.NumberFormatException: 
> Fragment 0:0
> [Error Id: b08f4247-263a-460d-b37b-91a70375f7ba on centos-03.qa.lab:31010] (state=,code=0)
> {code}
> Enable casting empty string to null.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> alter session set `drill.exec.functions.cast_empty_string_to_null`
= true;
> +-------+----------------------------------------------------------+
> |  ok   |                         summary                          |
> +-------+----------------------------------------------------------+
> | true  | drill.exec.functions.cast_empty_string_to_null updated.  |
> +-------+----------------------------------------------------------+
> 1 row selected (0.077 seconds)
> {code}
> Run query
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT columns[0], cast(columns[1] as int), columns[2]
from `threeColsDouble.csv`;
> Error: SYSTEM ERROR: java.lang.NumberFormatException: 
> Fragment 0:0
> [Error Id: de633399-15f9-4a79-a21f-262bd5551207 on centos-03.qa.lab:31010] (state=,code=0)
> {code}
> Note from the output of below query that the empty strings are not casted to null, although
drill.exec.functions.cast_empty_string_to_null was set to true.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT columns[0], columns[1], columns[2] from `threeColsDouble.csv`;
> +----------+---------+---------+
> |  EXPR$0  | EXPR$1  | EXPR$2  |
> +----------+---------+---------+
> | 156      | 234     | 12222   |
> | 2653543  | 434     | 0       |
> | 367345   | 567567  | 23      |
> | 34554    | 1234    | 45      |
> | 4345     | 567678  | 19876   |
> | 34556    | 0       | 1109    |
> | 5456     | -1      | 1098    |
> | 6567     |         | 34534   |
> | 7678     | 1       | 6       |
> | 8798     | 456     | 243     |
> | 265354   | 234     | 123     |
> | 367345   |         | 234     |
> | 34554    | 1       | 2       |
> | 4345     | 0       | 10      |
> | 34556    | -1      | 19      |
> | 5456     | 23423   | 345     |
> | 6567     | 0       | 2348    |
> | 7678     | 1       | 2       |
> | 8798     |         | 45      |
> | 099      | 19      | 17      |
> +----------+---------+---------+
> 20 rows selected (0.125 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message