Date: Fri, 16 Nov 2012 18:16:13 +0000 (UTC)
From: "Jean-Daniel Cryans (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1770920431.124988.1353089773706.JavaMail.jiratomcat@arcas>
In-Reply-To: <529070215.19745.1334269158397.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (HBASE-5778) Turn on WAL compression by default
Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm

     [ https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-5778:
--------------------------------------

    Attachment: HBASE-5778-0.94-v4.patch

This v4 of the patch pushes the handling of reopened compressed files down to SequenceFileLogReader. The two main changes:

- SequenceFileLogReader needs a way to be reused across multiple open/seek/close cycles. For this I added a method called "reopen". The name might be confusing.
- ReplicationSource used to just bluntly reopen whatever currentPath is, but this no longer works now that the SFLR is kept around. To fix it I had to add a little dance in ReplicationHLogReader to check whether the path given is different (although still for the same file, which was moved to .oldlogs).

The HLog and Replication tests pass.

> Turn on WAL compression by default
> ----------------------------------
>
>                 Key: HBASE-5778
>                 URL: https://issues.apache.org/jira/browse/HBASE-5778
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.96.0
>
>         Attachments: 5778.addendum, 5778-addendum.txt, HBASE-5778-0.94.patch, HBASE-5778-0.94-v2.patch, HBASE-5778-0.94-v3.patch, HBASE-5778-0.94-v4.patch, HBASE-5778.patch
>
>
> I ran some tests to verify whether WAL compression should be turned on by default.
> For a use case where it's not very useful (values two orders of magnitude bigger than the keys), the insert time wasn't different and the CPU usage was 15% higher (150% CPU usage vs. 130% when not compressing the WAL).
> When values are smaller than the keys, I saw a 38% improvement in the insert run time, and CPU usage was 33% higher (600% CPU usage vs. 450%). I'm not sure WAL compression accounts for all the additional CPU usage; it might just be that we're able to insert faster and we spend more time in the MemStore per second (because our MemStores are bad when they contain tens of thousands of values).
> Those are two extremes, but they show that for the price of some CPU we can save a lot. My machines have 2 quads with HT, so I still had a lot of idle CPUs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
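
For readers without the patch at hand, the two changes described in the update can be sketched roughly as follows. This is an illustration only, not the code in HBASE-5778-0.94-v4.patch: the class ReusableLogReader, its fields, the in-memory list standing in for file contents, and the example paths are all hypothetical. It assumes that a reader for a compressed log carries state that has to survive a close and a later reopen, and that a file moved to .oldlogs keeps its file name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the HBASE-5778 patch: a reader that is kept
// around across open/seek/close cycles instead of being recreated.
class ReusableLogReader {
    // Stand-in for state that must survive a reopen (assumed, for illustration).
    private final List<String> seen = new ArrayList<>();
    private String path;
    private int position;   // index of the next entry to read
    private boolean open;

    // Start reading a log from the beginning, discarding earlier state.
    void open(String newPath) {
        path = newPath;
        position = 0;
        seen.clear();
        open = true;
    }

    void close() {
        open = false;
    }

    // Reopen the reader. If newPath names the same file (for example after
    // it was moved to another directory), keep the position and state;
    // otherwise treat it as a different log and start from scratch.
    void reopen(String newPath) {
        if (path == null || !fileName(path).equals(fileName(newPath))) {
            open(newPath);
            return;
        }
        path = newPath;
        open = true;
    }

    // Read the next entry from the (in-memory) contents, or null at the end.
    String next(List<String> fileContents) {
        if (!open) {
            throw new IllegalStateException("reader is closed");
        }
        if (position >= fileContents.size()) {
            return null;
        }
        String entry = fileContents.get(position++);
        seen.add(entry);
        return entry;
    }

    int position() {
        return position;
    }

    int stateSize() {
        return seen.size();
    }

    // Last path component; the whole string if there is no '/'.
    static String fileName(String p) {
        return p.substring(p.lastIndexOf('/') + 1);
    }
}
```

In this sketch, reopening with a path under .oldlogs that ends in the same file name resumes where the reader left off, while reopening with a different file name resets it. Comparing only the file name is a simplification chosen for the example.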