Date: Wed, 21 Jan 2015 01:03:46 +0000 (UTC)
From: "Enis Soztutar (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-12883) Support block encoding based on knowing set of column qualifiers up front

    [ https://issues.apache.org/jira/browse/HBASE-12883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284948#comment-14284948 ]

Enis Soztutar commented on HBASE-12883:
---------------------------------------

This would be useful in other contexts as well. Even without Phoenix, I expect some users have a predefined list of column qualifiers that changes very slowly over time. I think we could even auto-detect the column qualifiers and do dictionary encoding per block, which would make this very easy to use. We already have the full unencoded block buffered up, so this should be possible.
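As a rough illustration of the per-block idea, here is a minimal sketch in Python rather than HBase code. It assumes the block's column qualifiers are available as a list of byte strings in cell order; the names `encode_block` and `decode_block` are hypothetical and do not correspond to any existing HBase encoder API.

```python
from typing import Dict, List, Tuple


def encode_block(qualifiers: List[bytes]) -> Tuple[List[bytes], List[int]]:
    """Auto-detect the distinct qualifiers in one buffered block and
    replace each occurrence with its index in a per-block dictionary."""
    dictionary: List[bytes] = []      # distinct qualifiers, in first-seen order
    index_of: Dict[bytes, int] = {}   # qualifier -> position in dictionary
    codes: List[int] = []             # one small integer per cell
    for q in qualifiers:
        if q not in index_of:
            index_of[q] = len(dictionary)
            dictionary.append(q)
        codes.append(index_of[q])
    return dictionary, codes


def decode_block(dictionary: List[bytes], codes: List[int]) -> List[bytes]:
    """Rebuild the original qualifier sequence from the dictionary and codes."""
    return [dictionary[c] for c in codes]
```

With a slowly changing qualifier set, the dictionary stays small, so each cell would carry a short index instead of the full qualifier bytes.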
A per-block dictionary is good, but it won't give us the full benefit of a per-file dictionary. Maybe we can maintain a small file-global dictionary: if a block's columns all fit in it, just use that, and write the dictionary to the trailer of the HFile.

> Support block encoding based on knowing set of column qualifiers up front
> -------------------------------------------------------------------------
>
>                 Key: HBASE-12883
>                 URL: https://issues.apache.org/jira/browse/HBASE-12883
>             Project: HBase
>          Issue Type: Bug
>            Reporter: James Taylor
>              Labels: Phoenix
>
> Phoenix knows up front the set of column qualifiers a row will have. We could likely get some good compression with little CPU based on this by having a block encoding scheme that leverages this information. It could be made non-Phoenix specific by identifying the set of column qualifiers through meta data to the block encoder.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
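The file-global dictionary suggested in the comment could look roughly like the sketch below. It is Python, not HBase code, and rests on several assumptions: "fit" is read as "the block's qualifiers can be added without exceeding a size cap", blocks that do not fit fall back to their own per-block dictionary, and `FileDictionary`, `try_encode`, `encode_block`, and `max_entries` are hypothetical names. Writing `entries` to the HFile trailer is left out.

```python
from typing import Dict, List, Optional, Tuple


class FileDictionary:
    """Small, capped dictionary shared by all blocks of one file."""

    def __init__(self, max_entries: int = 256) -> None:
        self.max_entries = max_entries
        self.entries: List[bytes] = []    # would be written to the file trailer
        self.index_of: Dict[bytes, int] = {}

    def try_encode(self, qualifiers: List[bytes]) -> Optional[List[int]]:
        """Return codes if every qualifier fits under the cap, else None."""
        # Distinct qualifiers not yet in the file dictionary, in first-seen order.
        new = [q for q in dict.fromkeys(qualifiers) if q not in self.index_of]
        if len(self.entries) + len(new) > self.max_entries:
            return None  # block does not fit; leave the dictionary untouched
        for q in new:
            self.index_of[q] = len(self.entries)
            self.entries.append(q)
        return [self.index_of[q] for q in qualifiers]


def encode_block(
    file_dict: FileDictionary, qualifiers: List[bytes]
) -> Tuple[str, List[bytes], List[int]]:
    """Use the file dictionary when possible, else a per-block dictionary."""
    codes = file_dict.try_encode(qualifiers)
    if codes is not None:
        return ("file", [], codes)  # no dictionary stored in the block itself
    local = list(dict.fromkeys(qualifiers))
    index_of = {q: i for i, q in enumerate(local)}
    return ("block", local, [index_of[q] for q in qualifiers])
```

The cap keeps the trailer small; a block whose qualifiers would overflow it pays for its own dictionary instead of growing the shared one.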