Date: Fri, 13 Sep 2013 06:00:54 +0000 (UTC)
From: "Feng Honghua (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-9467) write can be totally blocked temporarily by a write-heavy region

[ https://issues.apache.org/jira/browse/HBASE-9467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766263#comment-13766263 ]

Feng Honghua commented on HBASE-9467:
-------------------------------------

Change and explanation of the patch:

1. Throw RegionOverloadedException immediately rather than waiting/retrying within HRegion when the target region is above the memstore limit. This prevents write requests to a region above its memstore limit from occupying/saturating the handler threads. This change is in the HRegion.checkResources method.

2. Reuse the exception handling and retry mechanism of AsyncProcess in the client to handle RegionOverloadedException thrown from the RS.
Since RegionOverloadedException is not a DoNotRetryIOException, AsyncProcess handles it the same way as any other non-DoNotRetryIOException thrown from the RS, and the corresponding request is retried with incremental backoff. More generally, we can view RegionOverloadedException as just another kind of retriable exception and reuse all the existing handling for it in AsyncProcess/client, so no change is needed in client-side code. And if we really want to use exponential backoff rather than incremental backoff for RegionOverloadedException, as Todd suggested, we can change the code in AsyncProcess accordingly.

3. We also need to check the memstore limit and throw RegionOverloadedException for 'increment' and 'append' operations, since they also insert KVs into the memstore and increase its size. (Previously checkResources was not called for these two operations in HRegion; this is corrected here.)

4. In the unit test TestHFileArchiving, RegionOverloadedException is thrown during loadRegion. Since the 'put' operations there are called directly via HRegion, not via client/AsyncProcess, a similar 'catch-and-wait' handling is added so that the test proceeds without failure.

[~nkeywal] / [~stack] / [~tlipcon] : Any feedback on the patch? Thanks in advance.

> write can be totally blocked temporarily by a write-heavy region
> ----------------------------------------------------------------
>
>                 Key: HBASE-9467
>                 URL: https://issues.apache.org/jira/browse/HBASE-9467
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Feng Honghua
>            Assignee: Feng Honghua
>         Attachments: HBASE-9467-trunk-v0.patch
>
>
> Writes to a region can be blocked temporarily if the memstore of that region reaches the threshold (hbase.hregion.memstore.block.multiplier * hbase.hregion.flush.size), until the memstore of that region is flushed.
> For a write-heavy region, if its write requests saturate all the handler threads of that RS when write blocking for that region occurs, requests for other regions/tables on that RS also can't be served because no handler threads are available... until the pending writes of that write-heavy region are served after the flush is done. Hence during this period the RS can't serve any request for any table/region, just because of a single write-heavy region.
> This doesn't sound very reasonable, right? Maybe write requests for a region could be served by only a subset of the handler threads, so that write blocking of any single region can't lead to the scenario described above?
> Comment?
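
For illustration, here is a minimal standalone sketch of the behavior described in items 1 and 2 of the comment: the region rejects a write immediately when its memstore is above the blocking size, and the caller retries with an increasing pause. This is not HBase code and not the patch itself. The Region class, putWithRetry, and the onRejected hook are hypothetical toy stand-ins, and the linear pause (base pause times attempt number) is only an assumed reading of "incremental backoff".

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

class RegionOverloadSketch {

    // Stand-in for the patch's exception. It extends plain IOException,
    // so a caller may treat it as retriable.
    static class RegionOverloadedException extends IOException {
        RegionOverloadedException(String msg) { super(msg); }
    }

    // Toy region that tracks only its memstore size (hypothetical class).
    static class Region {
        final long blockingSize;
        final AtomicLong memstoreSize = new AtomicLong();

        Region(long flushSize, long blockMultiplier) {
            // Threshold from the issue text: block multiplier * flush size.
            this.blockingSize = blockMultiplier * flushSize;
        }

        // Fail fast instead of waiting inside the handler thread.
        void checkResources() throws RegionOverloadedException {
            long size = memstoreSize.get();
            if (size > blockingSize) {
                throw new RegionOverloadedException(
                    "memstore size " + size + " is above limit " + blockingSize);
            }
        }

        void put(long bytes) throws RegionOverloadedException {
            checkResources();
            memstoreSize.addAndGet(bytes);
        }

        void flush() { memstoreSize.set(0); }
    }

    // Caller-side retry with a linearly growing pause (assumed backoff shape).
    // onRejected is a hypothetical hook, used here only to simulate a flush
    // completing between attempts. Returns the number of attempts used.
    static int putWithRetry(Region region, long bytes, int maxAttempts,
                            long basePauseMs, Runnable onRejected)
            throws IOException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                region.put(bytes);
                return attempt;
            } catch (RegionOverloadedException e) {
                if (attempt >= maxAttempts) {
                    throw e; // retries exhausted, surface the failure
                }
                onRejected.run();
                Thread.sleep(basePauseMs * attempt);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Region region = new Region(100, 2); // blocking size = 200
        region.put(150);
        region.put(100);                    // memstore is now 250, above 200
        int attempts = putWithRetry(region, 10, 5, 1, region::flush);
        System.out.println("succeeded after " + attempts + " attempts");
    }
}
```

In this sketch the rejected write returns control to the caller at once, so the thread that served it is free for requests to other regions while the overloaded region waits for its flush.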