drill-issues mailing list archives

From "Andy Pernsteiner (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-1239) java.lang.AssertionError When performing select against nested JSON > 60,000 records
Date Fri, 01 Aug 2014 18:57:39 GMT

     [ https://issues.apache.org/jira/browse/DRILL-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Pernsteiner updated DRILL-1239:
------------------------------------

    Description: 
Using a JSON file in which each record looks like the following:
{quote}

{"trans_id":999999,"date":"11/03/2012","time":"09:07:05","user_info":{"cust_id":2,"device":"AOS4.3","state":"tx"},"marketing_info":{"camp_id":14,"keywords":["it","i","wants","yes","things","few","like"]},"trans_info":{"prod_id":[167,145,5,487,290],"purch_flag":"false"}}

{quote}
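
For anyone trying to reproduce this, a file of the same shape can be generated with a small throwaway program along these lines (this is only an illustrative sketch, not the script that produced the original dataset; the class name, output file name, random seed, and field values are placeholders):

{quote}
// Illustrative data generator (not the script used for the original dataset):
// writes N single-line JSON records shaped like the sample above, so the
// >60,000-record case can be reproduced. Class name, output file name and
// field values are placeholders.
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Random;

public class MobileJsonGenerator {
    public static void main(String[] args) throws IOException {
        int records = args.length > 0 ? Integer.parseInt(args[0]) : 70000;
        Random rnd = new Random(42);
        try (PrintWriter out = new PrintWriter("mobile.json", "UTF-8")) {
            for (int i = 0; i < records; i++) {
                // keywords (possibly empty) and prod_id arrays vary in length,
                // mirroring the nested lists in the sample record
                StringBuilder keywords = new StringBuilder();
                for (int k = 0, n = rnd.nextInt(8); k < n; k++) {
                    if (k > 0) keywords.append(',');
                    keywords.append("\"kw").append(rnd.nextInt(100)).append('"');
                }
                StringBuilder prodIds = new StringBuilder();
                for (int p = 0, n = 1 + rnd.nextInt(5); p < n; p++) {
                    if (p > 0) prodIds.append(',');
                    prodIds.append(rnd.nextInt(500));
                }
                out.printf(
                    "{\"trans_id\":%d,\"date\":\"11/03/2012\",\"time\":\"09:07:05\","
                    + "\"user_info\":{\"cust_id\":%d,\"device\":\"AOS4.3\",\"state\":\"tx\"},"
                    + "\"marketing_info\":{\"camp_id\":%d,\"keywords\":[%s]},"
                    + "\"trans_info\":{\"prod_id\":[%s],\"purch_flag\":\"false\"}}%n",
                    i, rnd.nextInt(100000), rnd.nextInt(20), keywords, prodIds);
            }
        }
    }
}
{quote}

Running it with an argument above 60,000 yields a one-record-per-line JSON file comparable in shape to mobile.json.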
First I set the following to get more verbose output:

{quote}
0: jdbc:drill:> alter session set `exec.errors.verbose`=true;
{quote}

Then I performed a simple select via sqlline:

{quote}
select * from dfs.`/mapr/drillram/JSON/large/mobile.json`;
<50,000+ rows of output>
| 56184      | 03/11/2013 | 14:19:10   | {"cust_id":4,"device":"IOS5","state":"va"} | {"camp_id":15,"keywords":["young"]} | {"prod_id |
| 56185      | 07/03/2013 | 14:30:38   | {"cust_id":1518,"device":"AOS4.4","state":"wi"} | {"camp_id":11,"keywords":["so","way","okay |
| 56186      | 07/07/2013 | 10:41:04   | {"cust_id":97279,"device":"IOS5","state":"ga"} | {"camp_id":7,"keywords":[]} | {"prod_id":[9 |
Query failed: Failure while running fragment. null [4407eef7-06aa-4cf9-9962-a2f187ce8f17]
Node details: ip-172-16-1-111:31011/31012
java.lang.AssertionError
	at org.apache.drill.exec.vector.complex.WriteState.fail(WriteState.java:37)
	at org.apache.drill.exec.vector.complex.impl.AbstractBaseWriter.inform(AbstractBaseWriter.java:62)
	at org.apache.drill.exec.vector.complex.impl.RepeatedBigIntWriterImpl.inform(RepeatedBigIntWriterImpl.java:108)
	at org.apache.drill.exec.vector.complex.impl.RepeatedBigIntWriterImpl.setPosition(RepeatedBigIntWriterImpl.java:130)
	at org.apache.drill.exec.vector.complex.impl.SingleListWriter.setPosition(SingleListWriter.java:700)
	at org.apache.drill.exec.vector.complex.impl.SingleMapWriter.setPosition(SingleMapWriter.java:153)
	at org.apache.drill.exec.vector.complex.impl.SingleMapWriter.setPosition(SingleMapWriter.java:153)
	at org.apache.drill.exec.vector.complex.impl.VectorContainerWriter.setPosition(VectorContainerWriter.java:66)
	at org.apache.drill.exec.store.easy.json.JSONRecordReader2.next(JSONRecordReader2.java:80)
	at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:148)
	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:116)
	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:59)
	at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:98)
	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:49)
	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:116)
	at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:250)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)


java.lang.RuntimeException: java.sql.SQLException: Failure while trying to get next result batch.
	at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
	at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
	at sqlline.SqlLine.print(SqlLine.java:1809)
	at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
	at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
	at sqlline.SqlLine.dispatch(SqlLine.java:889)
	at sqlline.SqlLine.begin(SqlLine.java:763)
	at sqlline.SqlLine.start(SqlLine.java:498)
	at sqlline.SqlLine.main(SqlLine.java:460)

{quote}
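
For completeness, the same two steps can also be driven through the Drill JDBC driver instead of sqlline; a minimal sketch, assuming an embedded (zk=local) connection string and the same file path (both are placeholders for whatever the actual cluster uses):

{quote}
// Illustrative JDBC reproduction of the sqlline session above. The zk=local
// (embedded) connection string and the file path are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ReproDrill1239 {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.drill.jdbc.Driver");  // explicit load for older JDBC setups
        try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
             Statement stmt = conn.createStatement()) {
            stmt.execute("alter session set `exec.errors.verbose` = true");
            int rows = 0;
            try (ResultSet rs = stmt.executeQuery(
                    "select * from dfs.`/mapr/drillram/JSON/large/mobile.json`")) {
                while (rs.next()) {
                    rows++;  // the SQLException above surfaces here once the failing batch is read
                }
            }
            System.out.println("rows fetched: " + rows);
        }
    }
}
{quote}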
If I re-run the same query against a smaller version of the same dataset (<50,000 records),
I don't hit the issue. So far I've tried adjusting the DRILL_MAX_DIRECT_MEMORY and
DRILL_MAX_HEAP variables to see if I could find something that works, but neither seems to
make a difference. Note: the error appears the same if I run in standalone mode.
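
Since the truncated dataset succeeds, one way to narrow down the failing record count is to query progressively longer prefixes of the file. A minimal sketch of a helper to produce those prefixes (illustrative only; it assumes one JSON record per line, and the class name and cut-off values are placeholders):

{quote}
// Illustrative bisection helper (not part of the original report): writes truncated
// copies of the JSON file (one record per line assumed) so the failing record count
// can be narrowed down, given that ~50,000 records succeed and ~60,000 fail.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class TruncateJson {
    public static void main(String[] args) throws IOException {
        Path src = Paths.get(args[0]);  // e.g. /mapr/drillram/JSON/large/mobile.json
        List<String> lines = Files.readAllLines(src, StandardCharsets.UTF_8);
        for (int n : new int[] {50000, 55000, 60000, 65000}) {
            if (n > lines.size()) {
                break;  // source file has fewer records than this cut-off
            }
            Path dst = Paths.get(src + "." + n);
            Files.write(dst, lines.subList(0, n), StandardCharsets.UTF_8);
        }
    }
}
{quote}

Querying each truncated copy in turn should show roughly where between 50,000 and 60,000 records the assertion starts to fire.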



> java.lang.AssertionError When performing select against nested JSON > 60,000 records
> ------------------------------------------------------------------------------------
>
>                 Key: DRILL-1239
>                 URL: https://issues.apache.org/jira/browse/DRILL-1239
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 0.4.0
>         Environment: Seen both on a standalone node (OS X host, 16 GB RAM) and on a cluster in AWS:
> 5 nodes, CentOS 6.5, 64 GB RAM, 2 SSDs per node for mfs/dfs. Running MapR 3.1.1.
>            Reporter: Andy Pernsteiner
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
