Date: Wed, 14 Sep 2016 22:36:21 +0000 (UTC)
From: "Hudson (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-16388) Prevent client threads being blocked by only one slow region server

    [ https://issues.apache.org/jira/browse/HBASE-16388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491629#comment-15491629 ]

Hudson commented on HBASE-16388:
--------------------------------

FAILURE: Integrated in Jenkins build HBase-1.4 #416 (See [https://builds.apache.org/job/HBase-1.4/416/])
HBASE-16388 Prevent client threads being blocked by only one slow region (stack: rev 069d1f73fa7e1a2b4c21ba95dea867d077a51068)
* (add) hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/ServerTooBusyException.java
* (edit) hbase-common/src/main/resources/hbase-default.xml
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AbstractRpcClient.java


> Prevent client threads being blocked by only one slow region server
> -------------------------------------------------------------------
>
>                 Key: HBASE-16388
>                 URL: https://issues.apache.org/jira/browse/HBASE-16388
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-16388-branch-1-v1.patch, HBASE-16388-branch-1-v2.patch, HBASE-16388-v1.patch, HBASE-16388-v2.patch, HBASE-16388-v2.patch, HBASE-16388-v2.patch, HBASE-16388-v2.patch, HBASE-16388-v3.patch
>
>
> It is a common use case for HBase users to run several threads/handlers in their service, each with its own Table/HTable instance. Users generally assume that the handlers are independent and do not affect each other.
> However, in an extreme case, if one region server is very slow, every request to it times out, and the service's handlers can all end up occupied by these long-waiting requests, so that even requests to other region servers time out as well.
> For example:
> Suppose a client service has 100 handlers (timeout 1000 ms) and HBase has 10 region servers with an average response time of 50 ms. If no region server is slow, the service can handle 2000 requests per second (100 handlers / 0.05 s).
> Now suppose the service's QPS is 1000 and one region server becomes very slow, so that every request to it times out.
> Users expect only 10% of requests to fail and the other 90% to keep a 50 ms response time, because only 10% of requests are routed to the slow RS. However, each second 100 requests go to the slow RS and each one waits for the full 1000 ms timeout, so together they occupy exactly all 100 handlers. All handlers are blocked, and the availability of the service drops to almost zero.
> To prevent this, we can limit the maximum number of concurrent requests to one RS at the process level. Requests exceeding the limit immediately throw ServerBusyException (extends DoNotRetryIOException) to the user. In the above case, if we set this limit to 20, only 20 handlers will be occupied; the remaining 900 requests per second at 50 ms need only about 45 handlers, so the other 80 handlers can still serve requests to the other region servers. The availability of the service is then 90%, as expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
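The handler arithmetic in the HBASE-16388 example follows Little's law: the average number of busy handlers equals the request rate multiplied by how long each request holds a handler. The Java sketch below only re-derives the numbers from the issue description (1000 QPS, 10% of traffic to the slow server, 50 ms normal latency, 1000 ms timeout, 100 handlers); the class and method names are illustrative and are not part of HBase.

```java
/**
 * Hypothetical back-of-the-envelope check of the HBASE-16388 example (not HBase code).
 * Little's law: average busy handlers = request rate (req/s) x time each request holds a handler (s).
 */
final class HandlerOccupancy {

    static double busyHandlers(double requestsPerSecond, double secondsHeld) {
        return requestsPerSecond * secondsHeld;
    }

    /** Handlers needed when a share of the traffic goes to one server whose requests always hit the timeout. */
    static double handlersNeeded(double qps, double slowShare, double fastLatencySec,
                                 double timeoutSec, int perServerLimit) {
        // Requests to the slow server hold a handler for the full timeout; with a cap, requests
        // beyond it are rejected immediately and hold (almost) no handler time.
        double slow = Math.min(busyHandlers(qps * slowShare, timeoutSec), perServerLimit);
        // Requests to the healthy servers hold a handler only for the normal latency.
        double fast = busyHandlers(qps * (1 - slowShare), fastLatencySec);
        return slow + fast;
    }

    public static void main(String[] args) {
        int handlers = 100;
        System.out.printf("healthy capacity: %.0f req/s%n", handlers / 0.050);        // 2000
        System.out.printf("no limit: %.0f of %d handlers needed%n",
                handlersNeeded(1000, 0.10, 0.050, 1.0, Integer.MAX_VALUE), handlers); // 145
        System.out.printf("limit 20: %.0f of %d handlers needed%n",
                handlersNeeded(1000, 0.10, 0.050, 1.0, 20), handlers);                // 65
    }
}
```

Without a cap, the slow server alone needs all 100 handlers and the healthy traffic needs about 45 more, so the pool is exhausted. With a cap of 20 the total drops to about 65, which leaves headroom for requests to the healthy servers.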
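The proposed fix keeps one in-flight counter per region server, shared by every thread in the client process, and fails fast once a server reaches its limit. What follows is a minimal sketch of that idea under stated assumptions: servers are identified by a plain String, the RPC is modeled as a Supplier, and PerServerRequestLimiter with its nested ServerBusyException are hypothetical stand-ins, not the code committed to AbstractRpcClient and not the real ServerTooBusyException class.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical process-wide per-server limiter; illustrates the idea only, not the committed HBase code. */
final class PerServerRequestLimiter {

    /** Hypothetical stand-in for the fail-fast exception described in the issue. */
    public static class ServerBusyException extends IOException {
        public ServerBusyException(String server, int limit) {
            super("Too many concurrent requests to " + server + " (limit " + limit + ")");
        }
    }

    private final int maxConcurrentPerServer;
    // One in-flight counter per server, shared by all threads that use this limiter.
    private final ConcurrentMap<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();

    PerServerRequestLimiter(int maxConcurrentPerServer) {
        this.maxConcurrentPerServer = maxConcurrentPerServer;
    }

    /** Runs the call if the server is under its limit; otherwise throws immediately instead of waiting. */
    public <T> T call(String server, Supplier<T> rpc) throws ServerBusyException {
        AtomicInteger counter = inFlight.computeIfAbsent(server, s -> new AtomicInteger());
        if (counter.incrementAndGet() > maxConcurrentPerServer) {
            counter.decrementAndGet(); // undo this caller's reservation before failing fast
            throw new ServerBusyException(server, maxConcurrentPerServer);
        }
        try {
            return rpc.get();
        } finally {
            counter.decrementAndGet(); // release the slot whether the call succeeded or failed
        }
    }

    /** Current number of in-flight calls to the given server. */
    public int inFlight(String server) {
        AtomicInteger counter = inFlight.get(server);
        return counter == null ? 0 : counter.get();
    }
}
```

Because the counters are process-wide, one slow server can hold at most the configured number of caller threads, while calls to other servers are admitted as usual. Incrementing before checking never lets more than the limit through, but a request can occasionally be rejected while another rejected caller's temporary increment is still being undone; a per-server `Semaphore` with `tryAcquire()` would avoid that.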