Date: Mon, 25 Dec 2017 10:42:00 +0000 (UTC)
From: "Jingyun Tian (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over

     [ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingyun Tian updated HBASE-19358:
---------------------------------
    Attachment: HBASE-19358-v5.patch

> Improve the stability of splitting log when do fail over
> --------------------------------------------------------
>
>                 Key: HBASE-19358
>                 URL: https://issues.apache.org/jira/browse/HBASE-19358
>             Project: HBase
>          Issue Type: Improvement
>          Components: MTTR
>    Affects Versions: 0.98.24
>            Reporter: Jingyun Tian
>            Assignee: Jingyun Tian
>         Attachments: HBASE-19358-v1.patch, HBASE-19358-v4.patch, HBASE-19358-v5.patch, HBASE-19358.patch
>
>
> The way we split the log now is shown in the following figure:
> !https://issues.apache.org/jira/secure/attachment/12902997/split-logic-old.jpg!
> The problem is that the OutputSink writes the recovered edits while the log is being split, which means it creates one WriterAndPath for each region and retains it until the end. If the cluster is small and the number of regions per region server is large, it will create too many HDFS streams at the same time.
> It is then prone to failure, since each datanode needs to handle too many streams.
> Thus I came up with a new way to split the log.
> !https://issues.apache.org/jira/secure/attachment/12902998/split-logic-new.jpg!
> We try to cache all the recovered edits, but if the cache exceeds the MaxHeapUsage, we pick the largest EntryBuffer and write it to a file (closing the writer once it finishes). Then, after we have read all entries into memory, we start a writeAndCloseThreadPool, which starts a certain number of threads to write all buffers to files. Thus it will not create more HDFS streams than the *_hbase.regionserver.hlog.splitlog.writer.threads_* we set.
> The biggest benefit is that we can control the number of streams we create while splitting the log:
> it will not exceed *_hbase.regionserver.wal.max.splitters * hbase.regionserver.hlog.splitlog.writer.threads_*, whereas before it was *_hbase.regionserver.wal.max.splitters * the number of regions the hlog contains_*.
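
The buffering policy described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the attached patch: the class SplitSketch, its method names, and the use of plain byte counts in place of real EntryBuffer and writer objects are all hypothetical. It shows the two ideas from the description: flushing the largest buffer when the heap budget is exceeded, and draining the remaining buffers with a fixed-size pool so that the number of concurrently open writers stays within the configured thread count.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of the proposed split logic; not the code in the patch. */
final class SplitSketch {
    private final Map<String, Long> buffered = new HashMap<>();          // region -> buffered bytes
    private final Map<String, Long> written = new ConcurrentHashMap<>(); // region -> bytes written
    private final AtomicInteger openWriters = new AtomicInteger();
    private final AtomicInteger peakOpenWriters = new AtomicInteger();
    private final long maxHeapUsage;   // stands in for MaxHeapUsage
    private final int writerThreads;   // stands in for hbase.regionserver.hlog.splitlog.writer.threads
    private long heapInUse = 0;

    SplitSketch(long maxHeapUsage, int writerThreads) {
        this.maxHeapUsage = maxHeapUsage;
        this.writerThreads = writerThreads;
    }

    /** Buffers one edit; while over budget, writes out the largest buffer and closes it. */
    void append(String region, long bytes) {
        buffered.merge(region, bytes, Long::sum);
        heapInUse += bytes;
        while (heapInUse > maxHeapUsage && !buffered.isEmpty()) {
            String largest =
                Collections.max(buffered.entrySet(), Map.Entry.comparingByValue()).getKey();
            long flushed = buffered.remove(largest);
            heapInUse -= flushed;
            writeAndClose(largest, flushed);
        }
    }

    /** Called once all entries are read: drains remaining buffers with a fixed-size pool. */
    void finish() {
        ExecutorService pool = Executors.newFixedThreadPool(writerThreads);
        for (Map.Entry<String, Long> e : buffered.entrySet()) {
            String region = e.getKey();
            long bytes = e.getValue();
            pool.execute(() -> writeAndClose(region, bytes));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        buffered.clear();
        heapInUse = 0;
    }

    /** Stands in for "open a writer, write the buffer, close the writer". */
    private void writeAndClose(String region, long bytes) {
        int open = openWriters.incrementAndGet();
        peakOpenWriters.accumulateAndGet(open, Math::max);
        try {
            written.merge(region, bytes, Long::sum);
        } finally {
            openWriters.decrementAndGet();
        }
    }

    Map<String, Long> written() { return written; }

    int peakOpenWriters() { return peakOpenWriters.get(); }

    /** Upper bound on concurrent streams: number of splitters times writers per splitter. */
    static long maxStreams(int maxSplitters, int writersPerSplitter) {
        return (long) maxSplitters * writersPerSplitter;
    }
}
```

With purely illustrative numbers (not taken from the issue), 2 splitters with 3 writer threads each give SplitSketch.maxStreams(2, 3) = 6 streams under the new logic, while the old logic with 1000 regions in the hlog would allow up to SplitSketch.maxStreams(2, 1000) = 2000.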