Date: Thu, 16 Mar 2017 19:35:41 +0000 (UTC)
From: "Hassan Eslami (JIRA)"
To: giraph-dev@incubator.apache.org
Reply-To: dev@giraph.apache.org
Subject: [jira] [Created] (GIRAPH-1137) Remove channel probing from Netty worker thread for credit-based flow-control

Hassan Eslami created GIRAPH-1137:
-------------------------------------

             Summary: Remove channel probing from Netty worker thread for credit-based flow-control
                 Key: GIRAPH-1137
                 URL: https://issues.apache.org/jira/browse/GIRAPH-1137
             Project: Giraph
          Issue Type: Bug
            Reporter: Hassan Eslami
            Assignee: Hassan Eslami

In credit-based flow control, client threads (one type of Netty worker thread used in Giraph) sometimes try to send requests to other workers. This is bad practice in Netty and can cause Netty to flag the execution as deadlock-prone (see the example exception at the end of this message). Client threads should only be responsible for sending ACK/NACK messages in response to requests, and they should do so by reusing the channel on which they received the request. In the current implementation, client threads may also try to send unsent/cached requests under credit-based flow control. Sending such requests should be delegated to other threads.
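
As a rough illustration of the delegation proposed above (a sketch only, not the actual Giraph patch), the ACK handler could hand the cached requests to a separate executor instead of sending them itself. The sketch uses only java.util.concurrent so that it runs on its own; the class name DelegatedSender, its methods, the thread names, and the use of strings in place of requests are all hypothetical and do not correspond to Giraph or Netty APIs.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: none of these names are taken from Giraph or Netty. */
public class DelegatedSender {
  // Requests that could not be sent yet (e.g. no credit available).
  private final Queue<String> cachedRequests = new ConcurrentLinkedQueue<>();
  // Dedicated thread that is allowed to block while sending.
  private final ExecutorService senderPool =
      Executors.newSingleThreadExecutor(r -> new Thread(r, "request-sender"));
  // Records which thread "sent" each request, for demonstration only.
  private final List<String> sendLog = new CopyOnWriteArrayList<>();

  public void cacheRequest(String request) {
    cachedRequests.add(request);
  }

  /**
   * Meant to be called on the (simulated) client worker thread when an ACK
   * arrives. It only schedules work and returns immediately, so the worker
   * thread never waits on a channel itself.
   */
  public void onAckReceived() {
    senderPool.execute(this::drainCachedRequests);
  }

  // Runs on the sender thread; a real implementation would do the
  // (possibly blocking) channel lookup and write here.
  private void drainCachedRequests() {
    String request;
    while ((request = cachedRequests.poll()) != null) {
      sendLog.add(request + "@" + Thread.currentThread().getName());
    }
  }

  /** Stops the sender thread after pending work finishes and returns the log. */
  public List<String> shutdownAndGetLog() throws InterruptedException {
    senderPool.shutdown();
    senderPool.awaitTermination(5, TimeUnit.SECONDS);
    return sendLog;
  }

  public static void main(String[] args) throws Exception {
    DelegatedSender sender = new DelegatedSender();
    sender.cacheRequest("req-1");
    sender.cacheRequest("req-2");

    // Stand-in for a Netty client worker thread receiving an ACK.
    ExecutorService eventLoop =
        Executors.newSingleThreadExecutor(r -> new Thread(r, "netty-client-worker"));
    Runnable ackHandler = sender::onAckReceived;
    eventLoop.submit(ackHandler).get();
    eventLoop.shutdown();

    // Each log entry names the thread that performed the send.
    System.out.println(sender.shutdownAndGetLog());
  }
}
```

In this sketch the simulated worker thread only enqueues a task, so every entry in the log carries the sender thread's name rather than the worker's. Whether a single sender thread is enough, and how it should interact with credit accounting, is left open here.
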

WARN 2017-03-08 06:06:22,104 [netty-client-worker-3] ....
io.netty.util.concurrent.BlockingOperationException: DefaultChannelPromise@2c455378(incomplete)
	at io.netty.util.concurrent.DefaultPromise.checkDeadLock(DefaultPromise.java:383)
	at io.netty.channel.DefaultChannelPromise.checkDeadLock(DefaultChannelPromise.java:157)
	at io.netty.util.concurrent.DefaultPromise.await0(DefaultPromise.java:343)
	at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:259)
	at org.apache.giraph.utils.ProgressableUtils$ChannelFutureWaitable.waitFor(ProgressableUtils.java:461)
	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:214)
	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:180)
	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:165)
	at org.apache.giraph.utils.ProgressableUtils.awaitChannelFuture(ProgressableUtils.java:132)
	at org.apache.giraph.comm.netty.NettyClient.getNextChannel(NettyClient.java:715)
	at org.apache.giraph.comm.netty.NettyClient.writeRequestToChannel(NettyClient.java:799)
	at org.apache.giraph.comm.netty.NettyClient.doSend(NettyClient.java:789)
	at org.apache.giraph.comm.flow_control.CreditBasedFlowControl.trySendCachedRequests(CreditBasedFlowControl.java:515)
	at org.apache.giraph.comm.flow_control.CreditBasedFlowControl.messageAckReceived(CreditBasedFlowControl.java:485)
	at org.apache.giraph.comm.netty.NettyClient.messageReceived(NettyClient.java:840)
	at org.apache.giraph.comm.netty.handler.ResponseClientHandler.channelRead(ResponseClientHandler.java:87)
	at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
	at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:153)
	at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
	at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
	at org.apache.giraph.comm.netty.InboundByteCounter.channelRead(InboundByteCounter.java:89)
	at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
	at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:785)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:126)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:485)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:452)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:346)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
	at java.lang.Thread.run(Thread.java:745)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)