Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 23002200C0D for ; Tue, 17 Jan 2017 05:11:33 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 21972160B4D; Tue, 17 Jan 2017 04:11:33 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 47F10160B41 for ; Tue, 17 Jan 2017 05:11:32 +0100 (CET) Received: (qmail 39267 invoked by uid 500); 17 Jan 2017 04:11:31 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 39254 invoked by uid 99); 17 Jan 2017 04:11:30 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 17 Jan 2017 04:11:30 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 88457C03C7 for ; Tue, 17 Jan 2017 04:11:30 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -1.999 X-Spam-Level: X-Spam-Status: No, score=-1.999 tagged_above=-999 required=6.31 tests=[KAM_LAZY_DOMAIN_SECURITY=1, RP_MATCHES_RCVD=-2.999] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id C6Co8e6AfArA for ; Tue, 17 Jan 2017 04:11:29 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 43F155FB5B for ; Tue, 17 Jan 2017 04:11:29 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 56AE5E865F for ; Tue, 17 Jan 2017 04:11:28 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id AE13025289 for ; Tue, 17 Jan 2017 04:11:26 +0000 (UTC) Date: Tue, 17 Jan 2017 04:11:26 +0000 (UTC) From: "Duo Zhang (JIRA)" To: issues@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HBASE-17475) Stack overflow in AsyncProcess if retry too much MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Tue, 17 Jan 2017 04:11:33 -0000 [ https://issues.apache.org/jira/browse/HBASE-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824930#comment-15824930 ] Duo Zhang commented on HBASE-17475: ----------------------------------- This is means we will always resubmit using the thread pool if the numAttempt is greater than HConstants.DEFAULT_HBASE_CLIENT_RETRIES_NUMBER? Yeah it works but I think there are better solutions. See the code of netty https://github.com/netty/netty/blob/b2379e62f4ae50b55bce7c43e5bd008cf9aacfd1/transport/src/main/java/io/netty/channel/local/LocalChannel.java#L310 Where they use a thread local variable to record the depth of stack, if it reaches the limit then submit using a thread pool, otherwise just call the runnable directly. Thanks. > Stack overflow in AsyncProcess if retry too much > ------------------------------------------------ > > Key: HBASE-17475 > URL: https://issues.apache.org/jira/browse/HBASE-17475 > Project: HBase > Issue Type: Bug > Components: API > Affects Versions: 2.0.0, 1.4.0 > Reporter: Allan Yang > Assignee: Allan Yang > Attachments: HBASE-17475-branch-1.patch, HBASE-17475.patch > > > In AsyncProcess, we resubmit the retry task in the same thread > {code} > // run all the runnables > for (Runnable runnable : runnables) { > if ((--actionsRemaining == 0) && reuseThread) { > runnable.run(); > } else { > try { > pool.submit(runnable); > } > ...... > {code} > But, if we retry too much time. soon, stack overflow will occur. This is very common in clusters with Phoenix. Phoenix need to write index table in the normal write path, retry will cause stack overflow exception. > {noformat} > "htable-pool19-t2" #582 daemon prio=5 os_prio=0 tid=0x0000000002687800 nid=0x4a96 waiting on condition [0x00007fe3f6301000] > java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.resubmit(AsyncProcess.java:1174) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.receiveMultiAction(AsyncProcess.java:1321) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.access$1200(AsyncProcess.java:575) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:729) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.sendMultiAction(AsyncProcess.java:977) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.groupAndSendMultiAction(AsyncProcess.java:886) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.resubmit(AsyncProcess.java:1181) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.receiveMultiAction(AsyncProcess.java:1321) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.access$1200(AsyncProcess.java:575) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:729) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.sendMultiAction(AsyncProcess.java:977) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.groupAndSendMultiAction(AsyncProcess.java:886) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.resubmit(AsyncProcess.java:1181) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.receiveMultiAction(AsyncProcess.java:1321) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.access$1200(AsyncProcess.java:575) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:729) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.sendMultiAction(AsyncProcess.java:977) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.groupAndSendMultiAction(AsyncProcess.java:886) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.resubmit(AsyncProcess.java:1181) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.receiveMultiAction(AsyncProcess.java:1321) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.access$1200(AsyncProcess.java:575) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:729) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.sendMultiAction(AsyncProcess.java:977) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.groupAndSendMultiAction(AsyncProcess.java:886) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.resubmit(AsyncProcess.java:1181) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.receiveMultiAction(AsyncProcess.java:1321) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.access$1200(AsyncProcess.java:575) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:729) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.sendMultiAction(AsyncProcess.java:977) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.groupAndSendMultiAction(AsyncProcess.java:886) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.resubmit(AsyncProcess.java:1181) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.receiveMultiAction(AsyncProcess.java:1321) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.access$1200(AsyncProcess.java:575) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:729) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.sendMultiAction(AsyncProcess.java:977) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.groupAndSendMultiAction(AsyncProcess.java:886) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.resubmit(AsyncProcess.java:1181) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.receiveMultiAction(AsyncProcess.java:1321) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.access$1200(AsyncProcess.java:575) > at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:729) > > Too long to present... > {noformat} > We need to control the {{reuseThread}}, if number of retry is too much, then don't reuse the same thread to prevent stack overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)