drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sudheesh Katkam (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-4057) Sporadic IllegalReferenceCountException in RPC layer causes query to hang
Date Tue, 10 Nov 2015 17:55:11 GMT
Sudheesh Katkam created DRILL-4057:
--------------------------------------

             Summary: Sporadic IllegalReferenceCountException in RPC layer causes query to
hang
                 Key: DRILL-4057
                 URL: https://issues.apache.org/jira/browse/DRILL-4057
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Sudheesh Katkam
            Priority: Critical
             Fix For: 1.3.0


In a 10 node cluster, running TPCH queries one at a time, sporadically there is a "IllegalReferenceCountException"
in the logs causing a query to hang.

In one of the logs (node 8):
{code}
2015-11-03 15:11:01,611 ucs-node8.perf.lab [DATA-rpc-event-queue] ERROR o.a.drill.exec.rpc.data.DataServer
- Failure while handling message.
io.netty.util.IllegalReferenceCountException: refCnt: 0
        at io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1178) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.buffer.AbstractByteBuf.checkReadableBytes(AbstractByteBuf.java:1161) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.buffer.AbstractByteBuf.readBytes(AbstractByteBuf.java:676) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.buffer.ByteBufInputStream.read(ByteBufInputStream.java:120) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
        at java.io.InputStream.read(InputStream.java:101) ~[na:1.7.0_65]
        at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:737) ~[protobuf-java-2.5.0.jar:na]
        at com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701) ~[protobuf-java-2.5.0.jar:na]
        at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99) ~[protobuf-java-2.5.0.jar:na]
        at org.apache.drill.exec.proto.BitData$FragmentRecordBatch.<init>(BitData.java:1152)
~[drill-protocol-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
        at org.apache.drill.exec.proto.BitData$FragmentRecordBatch.<init>(BitData.java:1116)
~[drill-protocol-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
        at org.apache.drill.exec.proto.BitData$FragmentRecordBatch$1.parsePartialFrom(BitData.java:1269)
~[drill-protocol-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
        at org.apache.drill.exec.proto.BitData$FragmentRecordBatch$1.parsePartialFrom(BitData.java:1264)
~[drill-protocol-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
        at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200) ~[protobuf-java-2.5.0.jar:na]
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217) ~[protobuf-java-2.5.0.jar:na]
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223) ~[protobuf-java-2.5.0.jar:na]
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49) ~[protobuf-java-2.5.0.jar:na]
        at org.apache.drill.exec.rpc.RpcBus.get(RpcBus.java:332) ~[drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
        at org.apache.drill.exec.rpc.data.DataServer.handle(DataServer.java:124) ~[drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
        at org.apache.drill.exec.rpc.data.DataServer.handle(DataServer.java:51) ~[drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
        at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:390) ~[drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
        at org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105)
[drill-common-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_65]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
2015-11-03 15:11:01,612 ucs-node8.perf.lab [29c6c10e-642f-c605-cd73-f466ddd986d4:frag:1:71]
INFO  o.a.d.e.w.fragment.FragmentExecutor - 29c6c10e-642f-c605-cd73-f466ddd986d4:1:71: State
change requested RUNNING --> FINISHED
2015-11-03 15:11:01,612 ucs-node8.perf.lab [29c6c10e-642f-c605-cd73-f466ddd986d4:frag:1:71]
INFO  o.a.d.e.w.f.FragmentStatusReporter - 29c6c10e-642f-c605-cd73-f466ddd986d4:1:71: State
to report: FINISHED
2015-11-03 15:11:01,612 ucs-node8.perf.lab [DATA-rpc-event-queue] ERROR o.a.drill.exec.rpc.data.DataServer
- Failure while running rpc command.
java.lang.IllegalArgumentException: Buffer has negative reference count.
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) ~[guava-14.0.1.jar:na]
        at io.netty.buffer.DrillBuf.release(DrillBuf.java:250) ~[drill-java-exec-1.3.0-SNAPSHOT.jar:4.0.27.Final]
        at io.netty.buffer.DrillBuf.release(DrillBuf.java:259) ~[drill-java-exec-1.3.0-SNAPSHOT.jar:4.0.27.Final]
        at io.netty.buffer.DrillBuf.release(DrillBuf.java:239) ~[drill-java-exec-1.3.0-SNAPSHOT.jar:4.0.27.Final]
        at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:397) ~[drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
        at org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105)
~[drill-common-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_65]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
{code}

On one of the other nodes (node 9), there is a thread that is waiting:
{code}
"29c6c10e-642f-c605-cd73-f466ddd986d4:frag:1:216" daemon prio=10 tid=0x00007fc2d151f800 nid=0x5edf
waiting on condition [0x00007fc273f36000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000715db0ba8> (a java.util.concurrent.Semaphore$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
        at java.util.concurrent.Semaphore.acquire(Semaphore.java:472)
        at org.apache.drill.exec.ops.SendingAccountor.waitForSendComplete(SendingAccountor.java:48)
        - locked <0x0000000715db0b70> (a org.apache.drill.exec.ops.SendingAccountor)
        at org.apache.drill.exec.ops.FragmentContext.waitForSendComplete(FragmentContext.java:441)
        at org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:133)
        at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:339)
        at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:179)
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290)
        at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message