drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4842) SELECT * on JSON data results in NumberFormatException
Date Sat, 11 Nov 2017 00:08:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248214#comment-16248214
] 

Paul Rogers commented on DRILL-4842:
------------------------------------

The JSON reader has been updated in the "Batch Size Control" project. A new feature was added
to handle runs of nulls.

If the first value seen for a field is null, then we enter "deferred null mode."

While in that mode, we do not create a value vector; instead we bide our time waiting for
an actual value.

When we see an actual value, we create a vector of that type. The vector automatically performs
"fill empties" logic for the missing null values.

If we reach the end of a batch, and we've seen no value, then we assume "text mode" for this
field. That is, this field (only) acts as if "all text mode" were set. The field becomes a
nullable Varchar.

Once we see a scalar value, that value is saved as text. So, if we have 10K nulls, followed
by the value 10, the value will be saved as the string "10".

If the value ends up being an array or object, then the parse fails with a type mismatch error.

The above is only a work-around. The proper solution is to allow the user to specify a schema
so that Drill need not try to predict the future.

> SELECT * on JSON data results in NumberFormatException
> ------------------------------------------------------
>
>                 Key: DRILL-4842
>                 URL: https://issues.apache.org/jira/browse/DRILL-4842
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>            Reporter: Khurram Faraaz
>            Assignee: Serhii Harnyk
>         Attachments: tooManyNulls.json
>
>
> Note that doing SELECT c1 returns correct results, the failure is seen when we do SELECT
star. json.all_text_mode was set to true.
> JSON file tooManyNulls.json has one key c1 with 4096 nulls as its value and the 4097th
key c1 has the value "Hello World"
> git commit ID : aaf220ff
> MapR Drill 1.8.0 RPM
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> alter session set `store.json.all_text_mode`=true;
> +-------+------------------------------------+
> |  ok   |              summary               |
> +-------+------------------------------------+
> | true  | store.json.all_text_mode updated.  |
> +-------+------------------------------------+
> 1 row selected (0.27 seconds)
> 0: jdbc:drill:schema=dfs.tmp> SELECT c1 FROM `tooManyNulls.json` WHERE c1 IN ('Hello
World');
> +--------------+
> |      c1      |
> +--------------+
> | Hello World  |
> +--------------+
> 1 row selected (0.243 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select * FROM `tooManyNulls.json` WHERE c1 IN ('Hello
World');
> Error: SYSTEM ERROR: NumberFormatException: Hello World
> Fragment 0:0
> [Error Id: 9cafb3f9-3d5c-478a-b55c-900602b8765e on centos-01.qa.lab:31010]
>  (java.lang.NumberFormatException) Hello World
>     org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI():95
>     org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varTypesToInt():120
>     org.apache.drill.exec.test.generated.FiltererGen1169.doSetup():45
>     org.apache.drill.exec.test.generated.FiltererGen1169.setup():54
>     org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():195
>     org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():107
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():94
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745 (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp>
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> Caused by: java.lang.NumberFormatException: Hello World
>         at org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI(StringFunctionHelpers.java:95)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varTypesToInt(StringFunctionHelpers.java:120)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.test.generated.FiltererGen1169.doSetup(FilterTemplate2.java:45)
~[na:na]
>         at org.apache.drill.exec.test.generated.FiltererGen1169.setup(FilterTemplate2.java:54)
~[na:na]
>         at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer(FilterRecordBatch.java:195)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema(FilterRecordBatch.java:107)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:94)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:257)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:251)
~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message