Date: Tue, 20 Jun 2017 23:08:00 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Commented] (DRILL-5599) Notify StatusHandlerListener that batch sending has failed even if channel is still open

    [ https://issues.apache.org/jira/browse/DRILL-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056634#comment-16056634 ]

ASF GitHub Bot commented on DRILL-5599:
---------------------------------------

Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/857#discussion_r123118764

    --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java ---
    @@ -111,13 +111,16 @@ public RpcListener(RpcOutcomeListener handler, Class clazz, int coordinati
         @Override
         public void operationComplete(ChannelFuture future) throws Exception {
    -      if (!future.isSuccess()) {
    -        removeFromMap(coordinationId);
    -        if (future.channel().isActive()) {
    -          throw new RpcException("Future failed");
    -        } else {
    -          setException(new ChannelClosedException());
    +      try {
    +        removeFromMap(coordinationId);
    +      } finally {
    +        final Throwable cause = future.cause();
    +        if (future.channel().isActive()) {
    +          setException(cause == null ? new RpcException("Future has failed") : cause);
    --- End diff --

    overall, LGTM. Some minor comments.

> Notify StatusHandlerListener that batch sending has failed even if channel is still open
> -----------------------------------------------------------------------------------------
>
>                 Key: DRILL-5599
>                 URL: https://issues.apache.org/jira/browse/DRILL-5599
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>         Attachments: sample.json
>
>
> *Issue*
> Queries stay in the CANCELLATION_REQUESTED state after the connection with the client was interrupted. Jstack shows that the threads for such queries are blocked, waiting for a semaphore to be released.
> {noformat}
> "26b70318-ddde-9ead-eee2-0828da97b59f:frag:0:0" daemon prio=10 tid=0x00007f56dc3c9000 nid=0x25fd waiting on condition [0x00007f56b31dc000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000006f4688ab0> (a java.util.concurrent.Semaphore$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> 	at java.util.concurrent.Semaphore.acquire(Semaphore.java:472)
> 	at org.apache.drill.exec.ops.SendingAccountor.waitForSendComplete(SendingAccountor.java:48)
> 	- locked <0x00000006f4688a78> (a org.apache.drill.exec.ops.SendingAccountor)
> 	at org.apache.drill.exec.ops.FragmentContext.waitForSendComplete(FragmentContext.java:486)
> 	at org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:134)
> 	at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.close(ScreenCreator.java:141)
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:313)
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:155)
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264)
> 	at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers: - <0x000000073f800b68> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
> *Reproduce*
> Run the modified ConcurrencyTest.java referenced in DRILL-4338 and cancel after 2-3 seconds. ConcurrencyTest.java should be modified as follows:
> {{ExecutorService executor = Executors.newFixedThreadPool(10);}} and execute 200 queries {{for (int i = 1; i <= 200; i++)}}.
> Query: {{select * from dfs.`sample.json`}}, data set is attached.
> *Problem description*
> It looks like the problem occurs when the server has sent data to the client and is waiting for the client's confirmation that the data was received. In this case [{{ChannelListenerWithCoordinationId}}|https://github.com/apache/drill/blob/master/exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java#L118] is used for tracking. {{ChannelListenerWithCoordinationId}} contains a {{StatusHandler}} which keeps track of sent batches. It updates {{SendingAccountor}} with information about how many batches were sent and how many batches have reached the client (successfully or not).
> When the send operation completes (successfully or not), {{operationComplete(ChannelFuture future)}} is called. The given future carries information on whether the send operation was successful, the failure cause, the channel status, etc.
> If the send operation was successful we do nothing, since in that case the client sent us an acknowledgment and, when we received it, we notified {{StatusHandlerListener}} that the batch was received. But if the send operation has failed, we need to notify {{StatusHandler}} that the send was unsuccessful.
> {{operationComplete(ChannelFuture future)}} code:
> {code}
>       if (!future.isSuccess()) {
>         removeFromMap(coordinationId);
>         if (future.channel().isActive()) {
>           throw new RpcException("Future failed");
>         } else {
>           setException(new ChannelClosedException());
>         }
>       }
>     }
> {code}
> Method {{setException}} notifies {{StatusHandler}} that sending the batch has failed, but it is only called when the channel is closed. When the channel is still open we just throw {{RpcException}}. This is where the problem occurs. {{operationComplete(ChannelFuture future)}} is called via the Netty {{DefaultPromise.notifyListener0}} method, which catches {{Throwable}} and just logs it. So even if we throw an exception, nobody is notified about it, in particular not {{StatusHandler}}.
> *Fix*
> Use {{setException}} even if the channel is still open, instead of throwing an exception.
> This problem was also raised in [PR-463|https://github.com/apache/drill/pull/463], but it was decided to fix it in the scope of a new Jira.
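
The failure mode described in the issue can be reproduced in miniature with the JDK alone. The following is only a sketch under stated assumptions, not Drill's code: `Accountor`, `Listener` and `notifyListener` are hypothetical stand-ins for `SendingAccountor`, the channel listener, and a notifier that catches `Throwable` and only logs it. No Netty or Drill classes are used, and a timeout is added purely so the demonstration terminates.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ListenerNotificationSketch {

  /** Hypothetical stand-in for SendingAccountor: counts sends and waits for their outcomes. */
  static final class Accountor {
    private final Semaphore outcomes = new Semaphore(0);
    private final AtomicInteger pending = new AtomicInteger();

    void increment() { pending.incrementAndGet(); }      // a batch was sent
    void sendComplete() { outcomes.release(); }          // an outcome (ok or failed) was reported

    /** Returns true if every pending send reported an outcome within the timeout. */
    boolean waitForSendComplete(long timeoutMs) {
      int permits = pending.getAndSet(0);
      try {
        return outcomes.tryAcquire(permits, timeoutMs, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  /** Hypothetical, simplified listener callback. */
  interface Listener {
    void operationComplete(boolean success, Throwable cause) throws Exception;
  }

  /** Mimics a notifier that swallows anything a listener throws and only logs it. */
  static void notifyListener(Listener listener, boolean success, Throwable cause) {
    try {
      listener.operationComplete(success, cause);
    } catch (Throwable t) {
      System.err.println("listener threw, only logged: " + t);
    }
  }

  /** Listener throws on failure: the exception is swallowed and the waiter is never released. */
  static boolean throwingListenerCompletes() {
    Accountor accountor = new Accountor();
    accountor.increment();
    notifyListener((success, cause) -> {
      if (!success) {
        throw new Exception("Future failed");
      }
    }, false, null);
    return accountor.waitForSendComplete(200);
  }

  /** Listener reports the failure instead of throwing: the waiter is released. */
  static boolean notifyingListenerCompletes() {
    Accountor accountor = new Accountor();
    accountor.increment();
    notifyListener((success, cause) -> {
      if (!success) {
        accountor.sendComplete(); // plays the role of setException(...) in this sketch
      }
    }, false, new RuntimeException("send failed"));
    return accountor.waitForSendComplete(200);
  }

  public static void main(String[] args) {
    System.out.println("throwing listener completes:  " + throwingListenerCompletes());
    System.out.println("notifying listener completes: " + notifyingListenerCompletes());
  }
}
```

In the sketch the throwing listener can only run into the timeout. The stack trace in the issue shows an untimed `Semaphore.acquire`, so the equivalent wait there never returns, which is consistent with queries staying in CANCELLATION_REQUESTED.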