Date: Thu, 7 Aug 2014 07:08:16 +0000 (UTC)
From: "Lars Hofhansl (JIRA)"
To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-11695) PeriodicFlusher and WakeFrequency issues

Lars Hofhansl created HBASE-11695:
-------------------------------------

             Summary: PeriodicFlusher and WakeFrequency issues
                 Key: HBASE-11695
                 URL: https://issues.apache.org/jira/browse/HBASE-11695
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.94.21
            Reporter: Lars Hofhansl


We just ran into a flush storm caused by the PeriodicFlusher.
Many memstores became eligible for flushing at exactly the same time. As a result, the same region was flushed multiple times, because the flusher wakes up too often (every 10s). The random jitter of up to 20s is larger than that interval, and actually flushing the memstore takes additional time, so a region can still be eligible for flushing when the flusher wakes up again. Here's one example.
We've seen hundreds of these, monopolizing the flush queue and preventing "important" flushes from happening.
{code}
06-Aug-2014 20:11:56 [regionserver60020.periodicFlusher] INFO org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. after a delay of 13449
06-Aug-2014 20:12:06 [regionserver60020.periodicFlusher] INFO org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. after a delay of 14060
{code}

So we need to increase the period of the PeriodicFlusher to at least the random jitter, and also increase the default random jitter (a 20s window is too narrow to spread out the flushes when there are many regions).
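To make the timing concrete: reading the logged delays as milliseconds, the first request (20:11:56, delay 13449) would not start its flush until about 20:12:09.4, so when the flusher woke again at 20:12:06 the region was still eligible and received a second request. The Python sketch below is a hypothetical model, not HBase code. The function names, the assumption that a region stays eligible until its earliest requested flush has completed, and the 2s flush duration are all illustrative assumptions.

```python
import random
from typing import Callable


def count_flush_requests(
    wake_period_s: float,
    max_jitter_s: float,
    flush_duration_s: float,
    draw_delay: Callable[[float], float],
) -> int:
    """Count flush requests a periodic chore issues for ONE region.

    Hypothetical model, not HBase code: the region becomes eligible at t=0,
    the chore wakes every `wake_period_s` seconds, and each time the region is
    still eligible it requests a flush delayed by draw_delay(max_jitter_s).
    The region stops being eligible once its earliest requested flush has
    finished, i.e. at (request time + delay + flush_duration_s).
    """
    if wake_period_s <= 0:
        raise ValueError("wake_period_s must be positive")
    requests = 0
    eligible_until = float("inf")  # unknown until the first flush is requested
    t = 0.0
    while t < eligible_until:
        requests += 1
        finish = t + draw_delay(max_jitter_s) + flush_duration_s
        eligible_until = min(eligible_until, finish)
        t += wake_period_s
    return requests


def total_requests(n_regions: int, wake_period_s: float, max_jitter_s: float,
                   flush_duration_s: float, seed: int = 0) -> int:
    """Flush-queue entries when n_regions all become eligible at the same time."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    return sum(
        count_flush_requests(wake_period_s, max_jitter_s, flush_duration_s,
                             lambda j: rng.uniform(0.0, j))
        for _ in range(n_regions)
    )


if __name__ == "__main__":
    # The logged case: 10 s wake period, 13.449 s delay, assumed 2 s flush.
    print(count_flush_requests(10.0, 20.0, 2.0, lambda j: 13.449))  # 2
    # Wake period >= max jitter + flush time: one request per region here.
    print(count_flush_requests(22.0, 20.0, 2.0, lambda j: 13.449))  # 1
    print(total_requests(1000, 10.0, 20.0, 2.0),
          total_requests(1000, 22.0, 20.0, 2.0))
```

In this model, a wake period equal to the maximum jitter rules out a repeat request only if the flush itself is instantaneous; with a worst-case delay and a nonzero flush duration, the period also has to cover the flush time, which matches the observation that it takes some time to actually flush the memstore.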