Date: Tue, 20 Nov 2012 17:02:58 +0000 (UTC)
From: "Christopher Tubbs (JIRA)"
To: notifications@accumulo.apache.org
Reply-To: jira@apache.org
Message-ID: <1070254624.7570.1353430978325.JavaMail.jiratomcat@arcas>
In-Reply-To: <1944217209.2892.1349472003539.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (ACCUMULO-790) RFile should compress using common prefixes of key elements

    [ https://issues.apache.org/jira/browse/ACCUMULO-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501300#comment-13501300 ]

Christopher Tubbs commented on ACCUMULO-790:
--------------------------------------------

Submitted working and tested patch in commit r1411745

> RFile should compress using common prefixes of key elements
> -----------------------------------------------------------
>
> Key: ACCUMULO-790
> URL: https://issues.apache.org/jira/browse/ACCUMULO-790
> Project: Accumulo
> Issue Type: Improvement
> Components: tserver
> Reporter: Christopher Tubbs
> Assignee: Christopher Tubbs
> Labels: compression, file, hackathon, optimization, rfile
> Fix For: 1.5.0
>
>
> Relative keys have proven themselves a great way to compress within dimensions of the key. However, we could probably do better. Since we know that our data is sorted lexicographically, we can reasonably assume that we will get better compression if we store only the fact that a key (or portion of a key) has a common prefix with the previous key, even if it is not an exact match.
> Currently, in RFile, unused bits from the delete flag byte are used to store flags that show whether an element of the key is exactly the same as in the previous key or different. We can change the semantics of these flags to store three states per element of the key: exact match with the previous key, common prefix with the previous key, or no relative key compression. If we don't want to add a byte to store 2 bits for 3 states per element, we can take the ordinal value of the unused 7 bits of the delete flag field and map it to an enumeration of relative key flags.
> When the common prefix flag is set for a given element of the current key, the RFile reader can interpret the first bytes of that element as a VInt expressing the length of the common prefix relative to the same element of the previous key. Because this adds at least one byte to the length of that element, we will not want to use common prefix compression if the common prefix is shorter than 2 bytes. With 1 or 0 bytes in common, we'd select the no-compression flag for that element.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
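
For illustration only, the scheme described in the issue can be sketched in a few lines of Python. This is not the committed patch (r1411745) and not Accumulo's actual RFile code. All names here (`Rel`, `encode_element`, `decode_element`, `pack_flags`, `unpack_flags`) are hypothetical. The varint is a simple LEB128-style stand-in rather than Hadoop's VInt wire format. The flag packing assumes four prefix-compressible key elements (row, column family, column qualifier, column visibility), which gives 3^4 = 81 combinations and so fits in the 7 unused bits (128 values).

```python
from enum import IntEnum
from typing import List, Tuple


class Rel(IntEnum):
    """The three per-element states proposed in the issue (names are hypothetical)."""
    NONE = 0           # element stored in full, no relative compression
    SAME = 1           # element is identical to the previous key's element
    COMMON_PREFIX = 2  # element shares a prefix of >= 2 bytes with the previous one


def common_prefix_len(prev: bytes, cur: bytes) -> int:
    """Number of leading bytes that prev and cur have in common."""
    n = 0
    for a, b in zip(prev, cur):
        if a != b:
            break
        n += 1
    return n


def write_varint(n: int) -> bytes:
    """Illustrative LEB128-style varint; an assumption, NOT Hadoop's VInt format."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, position just past the varint)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_element(prev: bytes, cur: bytes) -> Tuple[Rel, bytes]:
    """Choose a state for one key element and build its payload."""
    if cur == prev:
        return Rel.SAME, b""
    n = common_prefix_len(prev, cur)
    if n < 2:  # the length prefix costs at least one byte, so 0 or 1 saves nothing
        return Rel.NONE, cur
    return Rel.COMMON_PREFIX, write_varint(n) + cur[n:]


def decode_element(prev: bytes, state: Rel, payload: bytes) -> bytes:
    """Rebuild an element from the previous key's element and the payload."""
    if state == Rel.SAME:
        return prev
    if state == Rel.NONE:
        return payload
    n, pos = read_varint(payload)
    return prev[:n] + payload[pos:]


def pack_flags(states: List[Rel]) -> int:
    """Map per-element states to one base-3 ordinal (0..80 for four elements)."""
    code = 0
    for s in reversed(states):
        code = code * 3 + s
    return code


def unpack_flags(code: int, count: int = 4) -> List[Rel]:
    """Inverse of pack_flags."""
    states = []
    for _ in range(count):
        code, digit = divmod(code, 3)
        states.append(Rel(digit))
    return states
```

With these definitions, `encode_element(b"row_0001", b"row_0002")` selects `Rel.COMMON_PREFIX` and stores a one-byte length (7) followed by the single differing byte `b"2"`. The sketch omits the per-element length framing that a real file format would also need.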