drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-2790) Implicit cast to numbers fails when Text data contains empty strings
Date Thu, 02 Jul 2015 21:04:05 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612517#comment-14612517
] 

Jinfeng Ni commented on DRILL-2790:
-----------------------------------

The feature to ignore illegal formated number string seems to be non-SQL standard. In any
SQL system, if you have empty string, or any string which could not converted into a number,
yet you try to cast it into a number, you will hit NumberFormatException, or similar error.

For instance, Postgres:

{code}
create table t (col varchar(30));

insert into t values ('10');

insert into t values ('');

select * from t;
 col
-----
 10

(2 rows)
{code}

First of all, if you try to compare the string with a number in a filter, Postgres will not
allow you to do that.
{code}
select * from t where col = 10;
ERROR:  operator does not exist: character varying = integer
{code}

Even if you explicitly do a cast from string to integer, it also will fail:
{code}
 select * from t where cast(col as int) = 10;
ERROR:  invalid input syntax for integer: ""
{code}

Given that, I change this JIRA to "New Improvement", and target for "future" for now.


> Implicit cast to numbers fails when Text data contains empty strings 
> ---------------------------------------------------------------------
>
>                 Key: DRILL-2790
>                 URL: https://issues.apache.org/jira/browse/DRILL-2790
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 0.9.0
>            Reporter: Abhishek Girish
>            Assignee: Jinfeng Ni
>            Priority: Minor
>             Fix For: Future
>
>
> When a column in a Text file (CSV) contains empty strings, any query which uses valid
implicit casts fail to execute with NumberFormatException. Drill should ignore empty strings
in such scenarios. If not, the error message must be improved to clearly indicate the same.

> *Text:*
> {code:sql}
> > select columns[4] from `store.dat` limit 10;
> +------------+
> |   EXPR$0   |
> +------------+
> | 2451189    |
> |            |
> |            |
> | 2451044    |
> | 2450910    |
> |            |
> |            |
> |            |
> |            |
> |            |
> +------------+
> 10 rows selected (0.154 seconds)
> > select * from `store.dat` where columns[4] = 2451189 limit 1;
> Query failed: RemoteRpcException: Failure while running fragment.,  [ e5348e11-ec53-4332-981a-ff070253c19a
on abhi6.qa.lab:31010 ]
> [ e5348e11-ec53-4332-981a-ff070253c19a on abhi6.qa.lab:31010 ]
> Error: exception while executing query: Failure while executing query. (state=,code=0)
> {code}
> Log snippet:
> {code}
> 2015-04-14 14:16:50,642 [2ad27c3c-8b36-a116-4d38-18410eff96d2:frag:0:0] ERROR o.a.drill.exec.ops.FragmentContext
- Fragment Context received failure -- Fragment: 0:0
> java.lang.NumberFormatException:
>         at org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI(StringFunctionHelpers.java:97)
~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
>         at org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varCharToInt(StringFunctionHelpers.java:103)
~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
>         at org.apache.drill.exec.test.generated.FiltererGen97.doEval(FilterTemplate2.java:37)
~[na:na]
>         at org.apache.drill.exec.test.generated.FiltererGen97.filterBatchNoSV(FilterTemplate2.java:98)
~[na:na]
>         at org.apache.drill.exec.test.generated.FiltererGen97.filterBatch(FilterTemplate2.java:71)
~[na:na]
>         at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.doWork(FilterRecordBatch.java:82)
~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93)
~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> {code}
> The following query works:
> {code:sql}
> > select columns[4] from `store.dat` where columns[4] <>'' AND columns[4]= 2451189
limit 1;
> +------------+
> |   EXPR$0   |
> +------------+
> | 2451189    |
> +------------+
> 1 row selected (0.174 seconds)
> {code}
> *Parquet:*
> {code:sql}
> > select s_closed_date_sk from store where s_closed_date_sk = 2451189 limit 1;
> +------------------+
> | s_closed_date_sk |
> +------------------+
> | 2451189          |
> +------------------+
> 1 row selected (0.122 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message