From: "Duo Zhang (JIRA)"
To: issues@hbase.apache.org
Date: Fri, 5 Feb 2016 09:32:39 +0000 (UTC)
Subject: [jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only

    [ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133896#comment-15133896 ]

Duo Zhang commented on HBASE-14790:
-----------------------------------

I set up a test cluster with 5 regionservers and ran the pe tool several times with async enabled and with it disabled. The total throughput is the same, and the .999 (99.9th percentile) is a little smaller with async enabled. I think this is what we expect :)

This is only a simple test, not the final result. Will be back after Spring Festival. Thanks.

> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
>                 Key: HBASE-14790
>                 URL: https://issues.apache.org/jira/browse/HBASE-14790
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all purposes. But in fact, we do not need most of its features if we only want to write the WAL. For example, we do not need pipeline recovery, since we can just close the old logger and open a new one. We also do not need to write multiple blocks, since we can open a new logger if the old file grows too large.
> Most importantly, the complicated logic of the original {{DFSOutputStream}} makes it hard to handle all the corner cases and avoid data loss or data inconsistency (such as HBASE-14004). The complicated logic also forces us to use some magical tricks to increase performance. For example, we need multiple threads to call {{hflush}} when logging, and we currently use 5 threads. But why 5 and not 10 or 100?
> So I propose that we implement our own {{DFSOutputStream}} for writing the WAL, both for correctness and for performance.
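
The "close the old logger and open a new one" idea from the description can be illustrated with a small sketch. It uses only java.io and is not the HBase or HDFS API: the class name `RollingLog`, the `maxBytes` limit, and the `opener` factory are all hypothetical names chosen for illustration. The sketch rolls to a new segment when the current one would grow past the limit, and it also rolls (instead of attempting any recovery) when a write fails. Retrying only the failed entry is a simplification; the issue does not say how unacknowledged entries would be handled.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of a log writer that never recovers a broken stream
 * and never lets one segment grow past a size limit: it just rolls.
 */
public class RollingLog {
    private final long maxBytes;                    // assumed per-segment size limit
    private final Supplier<OutputStream> opener;    // assumed factory for a fresh segment
    private final List<OutputStream> segments = new ArrayList<>();
    private OutputStream current;
    private long bytesInCurrent;

    public RollingLog(long maxBytes, Supplier<OutputStream> opener) {
        this.maxBytes = maxBytes;
        this.opener = opener;
        roll();
    }

    /** Close the current segment (best effort) and open a new one. */
    private void roll() {
        if (current != null) {
            try {
                current.close();
            } catch (IOException ignored) {
                // The old segment is abandoned either way.
            }
        }
        current = opener.get();
        segments.add(current);
        bytesInCurrent = 0;
    }

    /** Append one entry, rolling on size overflow or on a write failure. */
    public void append(byte[] entry) throws IOException {
        // Roll first if this entry would push a non-empty segment past the limit.
        if (bytesInCurrent > 0 && bytesInCurrent + entry.length > maxBytes) {
            roll();
        }
        try {
            current.write(entry);
            current.flush();
        } catch (IOException e) {
            // No pipeline recovery: open a new segment and retry the entry once.
            roll();
            current.write(entry);
            current.flush();
        }
        bytesInCurrent += entry.length;
    }

    /** All segments opened so far, oldest first. */
    public List<OutputStream> segments() {
        return segments;
    }
}
```

With `new RollingLog(8, ByteArrayOutputStream::new)`, appending two 4-byte entries and then a 2-byte entry leaves two segments, because the third entry would exceed the 8-byte limit. If the first stream returned by the factory throws on write, the entry ends up in the second segment.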