From: Anoop John
Date: Tue, 6 Jun 2017 19:58:54 +0530
Subject: Re: Any Repercussions of using Multiwal
To: "user@hbase.apache.org"
Cc: Raghavendra Pandey

You can configure this max WAL count (as Yu said, hbase.regionserver.maxlogs). When the total number of un-archived WAL files exceeds it, we force flushes and so release some of the WALs. As Yu mentioned, when we use multi WAL with, say, 2 WAL groups, this WAL count effectively becomes 32 * 2 = 64. But you can configure it to a value lower than the default of 32.

-Anoop-

On Tue, Jun 6, 2017 at 6:12 PM, Sachin Jain wrote:

> Thanks Yu!
> It was certainly helpful.
>
> > Regarding the issue you met, what's the setting of
> > hbase.regionserver.maxlogs in your env? By default it's 32, which means for
> > each RS the un-archived WAL number shouldn't exceed 32. However, when
> > multiwal is enabled, it allows 32 logs for each group, thus allowing 64 WALs
> > for a single RS.
>
> I used the default configuration for this. By multiWal, I understand there is
> a different WAL per region. Can you please explain how you got 64 WALs
> for a Region Server?
>
> > when multiwal is enabled, it allows 32 logs for each group, thus allowing 64
> > WALs for a single RS.
>
> I thought one of the side effects of having multiwal enabled is that there
> will be a *large amount of data waiting in un-archived WALs.*
> So if a region server fails, it would take more time to play back the WAL
> files and hence it could *compromise availability.*
>
> Wdyt?
>
> Thanks
> -Sachin
>
>
> On Tue, Jun 6, 2017 at 2:04 PM, Yu Li wrote:
>
>> Hi Sachin,
>>
>> We have been using multiwal in production here at Alibaba for over 2 years
>> and have seen no problems. Facebook is also running multiwal online. Please
>> refer to HBASE-14457 for more details.
>>
>> There's also a JIRA, HBASE-15131, proposing to turn on multiwal by default,
>> but it is still under discussion; please feel free to make your voice heard
>> there.
>>
>> Regarding the issue you met, what's the setting of
>> hbase.regionserver.maxlogs in your env?
>> By default it's 32, which means for
>> each RS the un-archived WAL number shouldn't exceed 32. However, when
>> multiwal is enabled, it allows 32 logs for each group, thus allowing 64 WALs
>> for a single RS.
>>
>> Let me further explain how it leads to RegionTooBusyException:
>> 1. If the number of un-archived WALs exceeds the setting, it will check the
>>    oldest WAL and flush all regions involved in it.
>> 2. If the data ingestion speed is high and the WAL keeps rolling, many small
>>    hfiles will be flushed out, faster than compaction can catch up.
>> 3. When the hfile count of one store exceeds the setting of
>>    hbase.hstore.blockingStoreFiles (10 by default), the flush is delayed
>>    for hbase.hstore.blockingWaitTime (90s by default).
>> 4. When data ingestion continues but the flush is delayed, the memstore size
>>    might exceed the upper limit, and the region then throws
>>    RegionTooBusyException.
>>
>> Hope this information helps.
>>
>> Best Regards,
>> Yu
>>
>> On 6 June 2017 at 13:39, Sachin Jain wrote:
>>
>> > Hi,
>> >
>> > I was in the middle of a situation where I was getting
>> > *RegionTooBusyException* with a log like:
>> >
>> > *Above Memstore limit, regionName = X ... memstore size = Y and
>> > blockingMemstoreSize = Z*
>> >
>> > This hinted to me that a particular region was *hotspotting*. So
>> > I fixed my keyspace partitioning to have a more uniform distribution per
>> > region. It did not completely fix the problem but definitely delayed it a
>> > bit.
>> >
>> > Next, I enabled *multiWal*. As I remember, there is a configuration
>> > which leads to flushing of memstores when the WAL threshold is reached.
>> > After doing this, the problem seemed to go away.
>> >
>> > But this raises a couple of questions:
>> >
>> > 1. Are there any repercussions of using *multiWal* in a production
>> > environment?
>> > 2. If there are no repercussions and only benefits of using *multiWal*, why
>> > is this not turned on by default?
>> > Let other consumers turn it off in certain (whatever) scenarios.
>> >
>> > PS: *HBase configuration*
>> > Single Node (Local Setup) v1.3.1 Ubuntu 16 Core machine.
>> >
>> > Thanks
>> > -Sachin
>> >
>>
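The WAL arithmetic Anoop and Yu describe (hbase.regionserver.maxlogs applies per WAL group, so 2 groups give 32 * 2 = 64) and the four-step chain Yu lists can be restated as a small, self-contained Java model. This is a hypothetical sketch, not HBase code: the class and method names are made up, the comparisons use ">" to match the thread's "exceeds" wording, the group count of 2 is the thread's example, and the memstore sizes in main are invented numbers standing in for Y and Z in the log message. Only the property names and defaults in the comments come from the thread.

```java
/**
 * Hypothetical model of the multiwal / RegionTooBusyException chain described
 * in this thread. Not HBase code: names and sample numbers are illustrative only.
 */
public class MultiWalSketch {

    // Defaults quoted in the thread.
    static final int MAX_LOGS = 32;               // hbase.regionserver.maxlogs (per WAL group)
    static final int BLOCKING_STORE_FILES = 10;   // hbase.hstore.blockingStoreFiles
    static final long BLOCKING_WAIT_MS = 90_000L; // hbase.hstore.blockingWaitTime (90s)

    /** maxlogs applies to each WAL group, so the per-RS ceiling scales with the group count. */
    public static int effectiveMaxLogs(int maxLogsPerGroup, int walGroups) {
        return maxLogsPerGroup * walGroups;
    }

    /** Step 1: more un-archived WALs in a group than maxlogs forces a flush of regions in the oldest WAL. */
    public static boolean shouldForceFlush(int unarchivedWalsInGroup, int maxLogs) {
        return unarchivedWalsInGroup > maxLogs;
    }

    /** Step 3: a store with more hfiles than blockingStoreFiles has its flush delayed. */
    public static boolean flushDelayed(int storeFiles, int blockingStoreFiles) {
        return storeFiles > blockingStoreFiles;
    }

    /** Step 4: a memstore above its blocking size makes the region reject writes. */
    public static boolean rejectsWrites(long memstoreBytes, long blockingMemstoreBytes) {
        return memstoreBytes > blockingMemstoreBytes;
    }

    /** Returns the furthest step of the chain that a region has reached. */
    public static String diagnose(int unarchivedWalsInGroup, int storeFiles,
                                  long memstoreBytes, long blockingMemstoreBytes) {
        if (rejectsWrites(memstoreBytes, blockingMemstoreBytes)) {
            return "RegionTooBusyException";
        }
        if (flushDelayed(storeFiles, BLOCKING_STORE_FILES)) {
            return "flush delayed up to " + BLOCKING_WAIT_MS / 1000 + "s";
        }
        if (shouldForceFlush(unarchivedWalsInGroup, MAX_LOGS)) {
            return "forced flush";
        }
        return "ok";
    }

    public static void main(String[] args) {
        int walGroups = 2; // the thread's example: multiwal with 2 WAL groups
        System.out.println("WAL ceiling, single WAL: " + effectiveMaxLogs(MAX_LOGS, 1));
        System.out.println("WAL ceiling, multiwal:   " + effectiveMaxLogs(MAX_LOGS, walGroups));
        // Anoop's suggestion: lower maxlogs to bound un-archived WALs (and replay work) per RS.
        System.out.println("WAL ceiling, maxlogs=16: " + effectiveMaxLogs(16, walGroups));

        long blocking = 512L * 1024 * 1024; // invented blocking memstore size ("Z" in the log)
        System.out.println(diagnose(33, 4, 64L * 1024 * 1024, blocking));
        System.out.println(diagnose(33, 12, 200L * 1024 * 1024, blocking));
        System.out.println(diagnose(33, 12, 600L * 1024 * 1024, blocking));
    }
}
```

Running main prints the three ceilings (32, 64 and 32), then "forced flush", "flush delayed up to 90s" and "RegionTooBusyException". The model deliberately ignores compaction and timing; it only restates the thresholds named in the thread, which is enough to see why lowering maxlogs with multiwal keeps both the forced-flush frequency and the amount of WAL data to replay in check.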