drill-issues mailing list archives

From "David Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5769) IndexOutOfBoundsException when querying JSON files
Date Fri, 08 Sep 2017 00:44:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157935#comment-16157935 ]

David Lee commented on DRILL-5769:
----------------------------------

I originally hit this problem when running a select * on a single 5 GB JSON file, which
contains a mix of nested keys and arrays, to convert it to Parquet. Splitting that file into
10 smaller files worked and produced 10 Parquet files, but the same technique then failed on
a different 6 GB JSON file, since I have no control over when an empty array may show up.
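
For reference, a conversion of this kind is usually written in Drill as a CREATE TABLE AS
statement. The sketch below is only an illustration, not the exact statement that was run:
the input path `/tmp/big.json` and the table name `big_parquet` are hypothetical, and it
assumes the writable `dfs.tmp` workspace from a default Drill installation.

```sql
-- Write CTAS output as Parquet (store.format is a standard Drill session option).
ALTER SESSION SET `store.format` = 'parquet';

-- One of the options listed as TRUE in the issue's environment.
ALTER SESSION SET `store.json.all_text_mode` = true;

-- Hypothetical input path and table name; dfs.tmp is assumed to be writable.
CREATE TABLE dfs.tmp.`big_parquet` AS
SELECT *
FROM dfs.`/tmp/big.json`;
```

Pointing the FROM clause at a directory or wildcard path instead of a single file would
cover the case where the input has been split into several smaller files.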

> IndexOutOfBoundsException when querying JSON files
> --------------------------------------------------
>
>                 Key: DRILL-5769
>                 URL: https://issues.apache.org/jira/browse/DRILL-5769
>             Project: Apache Drill
>          Issue Type: Bug
>          Components:  Server, Storage - JSON
>    Affects Versions: 1.10.0
>         Environment: *jdk_8u45_x64*
> *single drillbit running on zookeeper*
> *Following options set to TRUE:*
> drill.exec.functions.cast_empty_string_to_null
> store.json.all_text_mode
> store.parquet.enable_dictionary_encoding
> store.parquet.use_new_reader
>            Reporter: David Lee
>            Assignee: Jinfeng Ni
>             Fix For: 1.10.0, 1.11.0, 1.12.0
>
>         Attachments: 001.json, 100.json, 111.json
>
>
> *Running the following SQL on these three JSON files fails:*
> 001.json 100.json 111.json
> select t.id
> from dfs.`/tmp/???.json` t
> where t.assetData.debt.couponPaymentFeature.interestBasis = '5'
> *Error:*
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IndexOutOfBoundsException: index: 1024, length: 1 (expected: range(0, 1024)) Fragment 0:0 [Error Id: xxxx.xxxx...
> *However, running the same SQL on two out of the three files works:*
> select t.id
> from dfs.`/tmp/1??.json` t
> where t.assetData.debt.couponPaymentFeature.interestBasis = '5'
> select t.id
> from dfs.`/tmp/?1?.json` t
> where t.assetData.debt.couponPaymentFeature.interestBasis = '5'
> select t.id
> from dfs.`/tmp/??1.json` t
> where t.assetData.debt.couponPaymentFeature.interestBasis = '5'
> *Changing the selected column from t.id to t.* also works: *
> select *
> from dfs.`/tmp/???.json` t
> where t.assetData.debt.couponPaymentFeature.interestBasis = '5'



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
