Date: Wed, 14 Mar 2012 04:14:11 +0000 (UTC)
From: "stack (Commented) (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1004962499.10958.1331698451302.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1642646625.2855.1318891871332.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-4608) HLog Compression

[
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228947#comment-13228947 ]

stack commented on HBASE-4608:
------------------------------

Here's what I get compressing, decompressing, compressing again, decompressing again, then recompressing a random log file from our front end:

{code}
-rw-r--r-- 1 stack staff 64928728 Mar 13 20:43 sv4r25s8%3A60020.1331661889339
-rwxrwxrwx 1 stack staff 28540761 Mar 13 20:48 sv4r25s8%3A60020.1331661889339.compressed
-rwxrwxrwx 1 stack staff 28540761 Mar 13 20:58 sv4r25s8%3A60020.1331661889339.compressed.again
-rwxrwxrwx 1 stack staff 28540761 Mar 13 21:02 sv4r25s8%3A60020.1331661889339.compressed.again.again
-rwxrwxrwx 1 stack staff 64945799 Mar 13 20:57 sv4r25s8%3A60020.1331661889339.decompressed
-rwxrwxrwx 1 stack staff 64945799 Mar 13 21:02 sv4r25s8%3A60020.1331661889339.decompressed.again
{code}

The compressed file is about 44% of the original size (28540761 of 64928728 bytes).

> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: stack
>             Fix For: 0.94.0
>
>         Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
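
The issue description quoted above proposes using a dictionary to compress repeated values such as table name, region id, and cf name. The following is a minimal sketch of that general idea, not the code from the attached 4608 patches. The class name `WalDictionarySketch`, the one-byte markers, and the two-byte index and length fields are all assumptions made for illustration: the first time a value is seen it is written in full and assigned the next dictionary index, and later occurrences are written as a short reference to that index.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of dictionary compression; not HBase's WAL format. */
public class WalDictionarySketch {
    private static final byte LITERAL = 0;        // full value follows
    private static final byte REFERENCE = 1;      // 2-byte dictionary index follows
    private static final int MAX_ENTRIES = 65536; // indexes must fit in 2 bytes

    /** Writes each field either in full (first sighting) or as a dictionary reference. */
    public static byte[] encodeAll(String... fields) {
        Map<String, Integer> indexOf = new HashMap<>();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            for (String field : fields) {
                Integer index = indexOf.get(field);
                if (index != null) {
                    out.writeByte(REFERENCE);
                    out.writeShort(index);
                    continue;
                }
                byte[] raw = field.getBytes(StandardCharsets.UTF_8);
                if (raw.length > 0xFFFF) {
                    throw new IllegalArgumentException("field too long for this sketch");
                }
                out.writeByte(LITERAL);
                out.writeShort(raw.length);
                out.write(raw);
                // Assign the next free index; once full, new values stay literals.
                if (indexOf.size() < MAX_ENTRIES) {
                    indexOf.put(field, indexOf.size());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Rebuilds the same dictionary while reading, so references resolve to the original values. */
    public static List<String> decodeAll(byte[] data, int count) {
        List<String> dictionary = new ArrayList<>();
        List<String> fields = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                if (in.readByte() == REFERENCE) {
                    fields.add(dictionary.get(in.readUnsignedShort()));
                    continue;
                }
                byte[] raw = new byte[in.readUnsignedShort()];
                in.readFully(raw);
                String field = new String(raw, StandardCharsets.UTF_8);
                if (dictionary.size() < MAX_ENTRIES) {
                    dictionary.add(field);
                }
                fields.add(field);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up table, region, and column family names, repeated as they would be across edits.
        String[] fields = {"usertable", "region-1", "cf", "usertable", "region-1", "cf"};
        byte[] encoded = encodeAll(fields);
        System.out.println("encoded bytes: " + encoded.length);
        System.out.println("round trip ok: " + decodeAll(encoded, fields.length).equals(List.of(fields)));
    }
}
```

The writer and reader never exchange the dictionary itself; each side builds it in the same order from the literals it sees, which is why the reader has to process entries sequentially from the start of the stream. A real log writer would presumably also need a bounded dictionary with some eviction policy and a way to recover after a partial write, both of which this sketch leaves out.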