drill-issues mailing list archives

From "Kathiravelu Pradeeban (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4855) Querying MongoDB collection with nested data fails
Date Thu, 18 Aug 2016 20:41:22 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427107#comment-15427107 ]

Kathiravelu Pradeeban commented on DRILL-4855:
----------------------------------------------

If the value in the query is an integer, no error is raised, which eliminates the unnecessary/irrelevant log output.

For example, consider the below JSON:
{"_id":{"$oid":"56a784b76952647b7b51c562"},"provenance":{"image":{"case_id":100,"subject_id":"TCGA"}}}

Now we create another Mongo collection by importing it:
mongoimport --db users --collection contacts2 --type json --file small.json

Let's query:
SELECT camic.provenance.image.case_id caseid
FROM mongo.users.`contacts2` camic
WHERE caseid = 100;

Nothing is returned.
"tail -f sqlline.log" shows the below:

2016-08-18 16:36:42,718 [2849e3a5-1e59-6b91-ae82-9d199b961cca:foreman] INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id 2849e3a5-1e59-6b91-ae82-9d199b961cca: SELECT camic.provenance.image.case_id caseid
FROM mongo.users.`contacts2` camic
WHERE caseid = 100
2016-08-18 16:36:43,793 [2849e3a5-1e59-6b91-ae82-9d199b961cca:frag:0:0] INFO  o.a.d.e.s.m.MongoScanBatchCreator - Number of record readers initialized : 1
2016-08-18 16:36:43,793 [2849e3a5-1e59-6b91-ae82-9d199b961cca:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 2849e3a5-1e59-6b91-ae82-9d199b961cca:0:0: State change requested AWAITING_ALLOCATION --> RUNNING
2016-08-18 16:36:43,793 [2849e3a5-1e59-6b91-ae82-9d199b961cca:frag:0:0] INFO  o.a.d.e.w.f.FragmentStatusReporter - 2849e3a5-1e59-6b91-ae82-9d199b961cca:0:0: State to report: RUNNING
2016-08-18 16:36:43,794 [2849e3a5-1e59-6b91-ae82-9d199b961cca:frag:0:0] INFO  o.a.d.e.s.mongo.MongoRecordReader - Filters Applied : Document{{}}
2016-08-18 16:36:43,794 [2849e3a5-1e59-6b91-ae82-9d199b961cca:frag:0:0] INFO  o.a.d.e.s.mongo.MongoRecordReader - Fields Selected :Document{{_id=0, caseid=1, provenance=1}}
2016-08-18 16:36:43,795 [2849e3a5-1e59-6b91-ae82-9d199b961cca:frag:0:0] WARN  o.a.d.e.e.ExpressionTreeMaterializer - Unable to find value vector of path `caseid`, returning null instance.
2016-08-18 16:36:43,799 [2849e3a5-1e59-6b91-ae82-9d199b961cca:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 2849e3a5-1e59-6b91-ae82-9d199b961cca:0:0: State change requested RUNNING --> FINISHED
2016-08-18 16:36:43,799 [2849e3a5-1e59-6b91-ae82-9d199b961cca:frag:0:0] INFO  o.a.d.e.w.f.FragmentStatusReporter - 2849e3a5-1e59-6b91-ae82-9d199b961cca:0:0: State to report: FINISHED



Please note, as before, only the WHERE clause is causing the issue.

0: jdbc:drill:zk=local> SELECT camic.provenance.image.case_id caseid
. . . . . . . . . . . > FROM mongo.users.`contacts2` camic;
+---------+
| caseid  |
+---------+
| 100     |
+---------+
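
The two outcomes (an empty result for the integer literal, a NumberFormatException for the string literal) look consistent with the WARN line and the stack trace: `caseid` in the WHERE clause is not found as a column, a null instance is substituted, and the literal is then converted to an integer (varTypesToInt) while the filter is being set up. The following is only a toy model in Python of that reading, not Drill code. The function names are hypothetical, and the idea that the missing column defaults to a nullable integer is an assumption drawn from the trace rather than from the Drill source.

```python
# Toy model only: this is NOT Drill code. It mimics what the WARN line and the
# stack trace suggest happens when WHERE refers to the SELECT alias `caseid`.

def resolve_column(row, name):
    """Return the top-level field `name`, or None when the row has no such
    field (analogous to 'returning null instance' in the log)."""
    return row.get(name)

def filter_rows(rows, column, literal):
    """Keep rows where column = literal.

    Assumption: the unresolved column is treated as a nullable integer, so
    the literal is coerced to int once, before any row is examined.
    """
    wanted = int(literal)  # raises ValueError for 'TCGA-TS2'
    kept = []
    for row in rows:
        value = resolve_column(row, column)
        # A null column value never equals the literal, so the row is dropped.
        if value is not None and value == wanted:
            kept.append(row)
    return kept

rows = [{"provenance": {"image": {"case_id": 100, "subject_id": "TCGA"}}}]

# Integer literal: the coercion succeeds, but null never equals 100.
print(filter_rows(rows, "caseid", 100))  # []

# String literal: the coercion itself fails, mirroring NumberFormatException.
try:
    filter_rows(rows, "caseid", "TCGA-TS2")
except ValueError as exc:
    print("coercion failed:", exc)
```

In this model the nested value provenance.image.case_id is never consulted by the filter, which matches the observation that the query without the WHERE clause returns the expected row.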


> Querying MongoDB collection with nested data fails
> --------------------------------------------------
>
>                 Key: DRILL-4855
>                 URL: https://issues.apache.org/jira/browse/DRILL-4855
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - MongoDB
>    Affects Versions: 1.6.0, 1.7.0
>         Environment: Centos7 and Ubuntu 16.04
>            Reporter: Kathiravelu Pradeeban
>            Priority: Critical
>
> To reproduce:
> 1. Create a json file called small.json with the below line:
> {"_id":{"$oid":"56a784b76952647b7b51c562"},"provenance":{"image":{"case_id":"TCGA-TS2","subject_id":"TCGA"}}}
> 2. Create a Mongo DB with the small.json as below:
> mongoimport --db users --collection contacts --type json --file small.json
> 3. Create a Mongo Query involving the nested data to confirm everything is fine.
> use users;
> db.contacts.find({ "provenance.image.case_id": "TCGA-TS2"});
> This returns:
> { "_id" : ObjectId("56a784b76952647b7b51c562"), "provenance" : { "image" : { "case_id" : "TCGA-TS2", "subject_id" : "TCGA" } } }
> 4. Create a Drill query for the same:
> SELECT camic.provenance.image.case_id caseid
> FROM mongo.users.`contacts` camic
> WHERE caseid = 'TCGA-TS2';
> The above query fails with the below error message.
> Error: SYSTEM ERROR: NumberFormatException: TCGA-TS2
> Fragment 0:0
> [Error Id: 142d9f37-fe13-4757-8009-e713d55bc1d8 on llovizna:31010] (state=,code=0)
> "tail -f sqlline.log" indicates the below:
> 2016-08-18 16:33:32,097 [2849e462-ade5-621f-5e4b-59e93c07ff11:foreman] INFO  o.a.drill.exec.work.foreman.Foreman
- Query text for query id 2849e462-ade5-621f-5e4b-59e93c07ff11: SELECT camic.provenance.image.case_id
caseid
> FROM mongo.users.`contacts` camic
> WHERE caseid = 'TCGA-TS2'
> 2016-08-18 16:33:33,369 [2849e462-ade5-621f-5e4b-59e93c07ff11:frag:0:0] INFO  o.a.d.e.s.m.MongoScanBatchCreator
- Number of record readers initialized : 1
> 2016-08-18 16:33:33,371 [2849e462-ade5-621f-5e4b-59e93c07ff11:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 2849e462-ade5-621f-5e4b-59e93c07ff11:0:0: State change requested AWAITING_ALLOCATION -->
RUNNING
> 2016-08-18 16:33:33,371 [2849e462-ade5-621f-5e4b-59e93c07ff11:frag:0:0] INFO  o.a.d.e.w.f.FragmentStatusReporter
- 2849e462-ade5-621f-5e4b-59e93c07ff11:0:0: State to report: RUNNING
> 2016-08-18 16:33:33,371 [2849e462-ade5-621f-5e4b-59e93c07ff11:frag:0:0] INFO  o.a.d.e.s.mongo.MongoRecordReader
- Filters Applied : Document{{}}
> 2016-08-18 16:33:33,371 [2849e462-ade5-621f-5e4b-59e93c07ff11:frag:0:0] INFO  o.a.d.e.s.mongo.MongoRecordReader
- Fields Selected :Document{{_id=0, caseid=1, provenance=1}}
> 2016-08-18 16:33:33,372 [2849e462-ade5-621f-5e4b-59e93c07ff11:frag:0:0] WARN  o.a.d.e.e.ExpressionTreeMaterializer
- Unable to find value vector of path `caseid`, returning null instance.
> 2016-08-18 16:33:33,375 [2849e462-ade5-621f-5e4b-59e93c07ff11:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 2849e462-ade5-621f-5e4b-59e93c07ff11:0:0: State change requested RUNNING --> FAILED
> 2016-08-18 16:33:33,375 [2849e462-ade5-621f-5e4b-59e93c07ff11:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 2849e462-ade5-621f-5e4b-59e93c07ff11:0:0: State change requested FAILED --> FINISHED
> 2016-08-18 16:33:33,376 [2849e462-ade5-621f-5e4b-59e93c07ff11:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor
- SYSTEM ERROR: NumberFormatException: TCGA-TS2
> Fragment 0:0
> [Error Id: efb49b38-0515-4b20-9d24-052944f04a73 on llovizna:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: NumberFormatException:
TCGA-TS2
> Fragment 0:0
> [Error Id: efb49b38-0515-4b20-9d24-052944f04a73 on llovizna:31010]
> 	at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
~[drill-common-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318)
[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185)
[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287)
[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.6.0.jar:1.6.0]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0]
> 	at java.lang.Thread.run(Thread.java:744) [na:1.8.0]
> Caused by: java.lang.NumberFormatException: TCGA-TS2
> 	at org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI(StringFunctionHelpers.java:95)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varTypesToInt(StringFunctionHelpers.java:120)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.test.generated.FiltererGen20.doSetup(FilterTemplate2.java:45)
~[na:na]
> 	at org.apache.drill.exec.test.generated.FiltererGen20.setup(FilterTemplate2.java:54)
~[na:na]
> 	at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer(FilterRecordBatch.java:197)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema(FilterRecordBatch.java:109)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:94)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) ~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:257)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:251)
~[drill-java-exec-1.6.0.jar:1.6.0]
> 	at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0]
> 	at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0]
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
~[hadoop-common-2.7.1.jar:na]
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:251)
[drill-java-exec-1.6.0.jar:1.6.0]
> 	... 4 common frames omitted
> 2016-08-18 16:33:33,426 [CONTROL-rpc-event-queue] WARN  o.a.drill.exec.work.foreman.Foreman
- Dropping request to move to COMPLETED state as query is already at FAILED state (which is
terminal).
> 2016-08-18 16:33:33,426 [CONTROL-rpc-event-queue] WARN  o.a.d.e.w.b.ControlMessageHandler
- Dropping request to cancel fragment. 2849e462-ade5-621f-5e4b-59e93c07ff11:0:0 does not exist.
> 2016-08-18 16:33:33,427 [USER-rpc-event-queue] INFO  o.a.d.j.i.DrillResultSetImpl$ResultsListener
- [#14] Query failed: 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: NumberFormatException:
TCGA-TS2
> Fragment 0:0
> [Error Id: efb49b38-0515-4b20-9d24-052944f04a73 on llovizna:31010]
> 	at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
[drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113) [drill-java-exec-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
[drill-rpc-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
[drill-rpc-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67) [drill-rpc-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374) [drill-rpc-1.6.0.jar:1.6.0]
> 	at org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
[drill-rpc-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252) [drill-rpc-1.6.0.jar:1.6.0]
> 	at org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123) [drill-rpc-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:285) [drill-rpc-1.6.0.jar:1.6.0]
> 	at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:257) [drill-rpc-1.6.0.jar:1.6.0]
> 	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
> 	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
[netty-handler-4.0.27.Final.jar:4.0.27.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
> 	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
> 	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
> 	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
> 	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
> 	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> 	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> 	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> 	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
> 	at java.lang.Thread.run(Thread.java:744) [na:1.8.0]
> 5. Please note the below returns the correct output:
> SELECT camic.provenance.image.case_id caseid
> FROM mongo.users.`contacts` camic;
> +-----------+
> |  caseid   |
> +-----------+
> | TCGA-TS2  |
> +-----------+
> 1 row selected (1,135 seconds)
> So the issue is with the WHERE clause:
> WHERE caseid = 'TCGA-TS2';



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
