drill-dev mailing list archives

From "Alexander Reshetov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-2677) Query does not go beyond 4096 lines in small JSON files
Date Fri, 03 Apr 2015 20:35:52 GMT
Alexander Reshetov created DRILL-2677:
-----------------------------------------

             Summary: Query does not go beyond 4096 lines in small JSON files
                 Key: DRILL-2677
                 URL: https://issues.apache.org/jira/browse/DRILL-2677
             Project: Apache Drill
          Issue Type: Bug
         Environment: drill 0.8 official build
            Reporter: Alexander Reshetov


Hello,

I'm trying to execute the following query:
{code}
select * from (select source.pck, source.`timestamp`, flatten(source.HostUpdateTypeNW.Transfers)
as entry from dfs.`/mnt/data/dataset_4095_and_1.json` as source) as parsed;
{code}

It works as expected and returns this result:
{code}
+------------+------------+------------+
|    pck     | timestamp  |   entry    |
+------------+------------+------------+
| 3547       | 1419807470286356 | {"TransferingPurpose":"8","TransferingImpact":"88","TransferingKind":"8","TransferingTime":"888888888","PackageOrigSenderID":"8","TransferingID":"88888","TransitCN":"888","PackageChkPnt":"8888","PackageFullSize":"8","TransferingSessionID":"8","SubpackagesCounter":"8"}
|
+------------+------------+------------+
1 row selected (0.188 seconds)
{code}

This file contains 4095 identical lines of one JSON string, followed by one different JSON line
at the end (see attached file dataset_4095_and_1.json).

The problem is that when the first string repeats more than 4095 times, the query throws an exception.
Here is the query for a file with 4096 strings of the first type + 1 string of another type (see attached file dataset_4096_and_1.json):

{code}
select * from (select source.pck, source.`timestamp`, flatten(source.HostUpdateTypeNW.Transfers)
as entry from dfs.`/mnt/data/dataset_4096_and_1.json` as source) as parsed;
Exception in thread "2ae108ff-b7ea-8f07-054e-84875815d856:frag:0:0" java.lang.RuntimeException:
Error closing fragment context.
	at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:224)
	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:187)
	at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: org.apache.drill.exec.vector.NullableIntVector cannot
be cast to org.apache.drill.exec.vector.RepeatedVector
	at org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.getFlattenFieldTransferPair(FlattenRecordBatch.java:274)
	at org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.setupNewSchema(FlattenRecordBatch.java:296)
	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78)
	at org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.innerNext(FlattenRecordBatch.java:122)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89)
	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
	at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:68)
	at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:96)
	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:58)
	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:163)
	... 4 more
Query failed: RemoteRpcException: Failure while running fragment., org.apache.drill.exec.vector.NullableIntVector
cannot be cast to org.apache.drill.exec.vector.RepeatedVector [ cb6c7914-438f-440a-9c74-fe39130feca9
on testlab-broker:31010 ]
[ cb6c7914-438f-440a-9c74-fe39130feca9 on testlab-broker:31010 ]

Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}

It seems that Drill stops analyzing the schema after exactly 4096 lines, and that is why my query
is failing.
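
For anyone without the attachments, here is a minimal Python sketch of how two files of this shape could be generated. The record contents (`FILLER`, `FINAL`) and the helper name `make_dataset` are hypothetical and are not copied from the attached files. In particular, it is an assumption that the repeated line has no `HostUpdateTypeNW.Transfers` array and that only the last line does.

```python
import json


def make_dataset(repeat_count, filler, final):
    """Return newline-delimited JSON: `filler` repeated, then `final` once."""
    lines = [json.dumps(filler)] * repeat_count
    lines.append(json.dumps(final))
    return "\n".join(lines) + "\n"


# Hypothetical records; the real attachments may look different.
# Assumption: the repeated record lacks the field that flatten() reads.
FILLER = {"pck": 1, "timestamp": 1419807470286355}
FINAL = {
    "pck": 3547,
    "timestamp": 1419807470286356,
    "HostUpdateTypeNW": {"Transfers": [{"TransferingID": "88888"}]},
}


def write_dataset(path, repeat_count):
    """Write `repeat_count` filler lines plus one final line to `path`."""
    with open(path, "w") as handle:
        handle.write(make_dataset(repeat_count, FILLER, FINAL))
```

Calling `write_dataset("dataset_4095_and_1.json", 4095)` and `write_dataset("dataset_4096_and_1.json", 4096)` would produce one file on each side of the boundary.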

I assume that this behavior also leads to another issue, the one I was investigating when I found this one.
It shows up on large files: perhaps Drill somehow splits the file into smaller chunks, and
one of them contains a similar sequence of lines (4096 of the same type from Drill's point of
view), so the query stops, which leads to another exception. The large file is attached as dataset_sample.json.gz.
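
One way to check this guess against a large file is to scan it in fixed windows of 4096 lines and list the windows in which no record contains the flattened field. The Python sketch below does that. The function names are hypothetical, and the window size and alignment are assumptions based only on the behavior described above, not on how Drill actually batches records.

```python
import json


def has_path(record, path):
    """Return True if every nested key in `path` exists in `record`."""
    node = record
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


def batches_without_field(lines, path, batch_size=4096):
    """Return indices of fixed-size batches where no record contains `path`.

    `lines` is a list of non-empty JSON strings, one record per entry.
    Assumption: batches are aligned to multiples of `batch_size` lines.
    """
    missing = []
    for start in range(0, len(lines), batch_size):
        batch = lines[start:start + batch_size]
        if not any(has_path(json.loads(line), path) for line in batch):
            missing.append(start // batch_size)
    return missing
```

For the gzipped sample, the lines could be read with `gzip.open(path, "rt")` and passed in with `path=["HostUpdateTypeCR", "Transfers"]`. Any index returned marks a window that would look like a file of a single record type.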

Here is the view (dataset_sample.view.drill) that I use to query the large file:
{code}
{
  "name" : "dataset_sample",
  "sql" : "SELECT `Message`.`timestamp`, `flatten`(`Message`.`HostUpdateTypeCR`['Transfers']) AS `entries`\nFROM `dfs`.`/mnt/data/dataset_sample.json.gz` AS `Message`",
  "fields" : [ {
    "name" : "timestamp",
    "type" : "ANY"
  }, {
    "name" : "transfers",
    "type" : "ANY"
  } ],
  "workspaceSchemaPath" : [ "dfs", "mnt" ]
}
{code}

And here is the query that I'm trying to execute:
{code}
0: jdbc:drill:zk=local> create table dataset_tbl as
. . . . . . . . . . . > select dataset_sample.transfers.TransferingID as id, dataset_sample.transfers.TransferingKind
as type from dataset_sample;
Query failed: Query stopped., index: 9502, length: 1 (expected: range(0, 1024)) [ c5eac3ee-0266-4645-b6b5-2a1b58df4821
on testlab-broker:31010 ]

Error: exception while executing query: Failure while executing query. (state=,code=0)

0: jdbc:drill:zk=local> Exception in thread "WorkManager-19" java.lang.IllegalStateException
	at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
	at org.apache.drill.common.DeferredException.addException(DeferredException.java:47)
	at org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:61)
	at org.apache.drill.exec.ops.FragmentContext.fail(FragmentContext.java:133)
	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:181)
	at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{code}

Please let me know if I should split this issue into two separate issues or if you need any
additional info.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
