Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 21 Jan 2015 01:03:46 +0000 (UTC)
From: "Enis Soztutar (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12768602.1421735669000.129856.1421802226649@Atlassian.JIRA>
In-Reply-To: <JIRA.12768602.1421735669000@Atlassian.JIRA>
References: <JIRA.12768602.1421735669000@Atlassian.JIRA>
 <JIRA.12768602.1421735669548@arcas>
Subject: [jira] [Commented] (HBASE-12883) Support block encoding based on
 knowing set of column qualifiers up front
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-12883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284948#comment-14284948 ] 

Enis Soztutar commented on HBASE-12883:
---------------------------------------

This would be useful in other contexts as well. Even without Phoenix, I expect some users have a predefined list of column qualifiers that changes very slowly over time.
I think we can even auto-detect the column qualifiers and do dictionary encoding per block, which would make this very easy to use. We already have the full unencoded block buffered up, so it should be possible. A per-block dictionary is good, but it won't give us the full benefits of a per-file dictionary. Maybe we can maintain a small file-global dictionary: if a block's columns all fit there, just use that, and encode the dictionary in the trailer of the HFile.
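
To make the idea concrete, here is a rough sketch in Python. It is not HBase code, and every name in it (QualifierDictionary, encode_block, decode_block, max_entries) is made up for illustration. It assumes the writer sees all qualifiers of a block before encoding it, and that the file-global dictionary has a fixed size cap. A block whose qualifiers fit in the global dictionary is encoded against it; otherwise the block carries its own local dictionary.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (uses_global, local_dictionary_entries, qualifier_ids)
EncodedBlock = Tuple[bool, List[bytes], List[int]]


class QualifierDictionary:
    """Maps column qualifiers to small integer ids in first-seen order."""

    def __init__(self, max_entries: Optional[int] = None) -> None:
        self.max_entries = max_entries      # None means unbounded
        self.ids: Dict[bytes, int] = {}
        self.entries: List[bytes] = []

    def can_hold(self, qualifiers: Iterable[bytes]) -> bool:
        """True if every qualifier is present or can still be added."""
        if self.max_entries is None:
            return True
        new = {q for q in qualifiers if q not in self.ids}
        return len(self.entries) + len(new) <= self.max_entries

    def add(self, qualifier: bytes) -> int:
        """Return the id for a qualifier, assigning one if it is new."""
        if qualifier not in self.ids:
            self.ids[qualifier] = len(self.entries)
            self.entries.append(qualifier)
        return self.ids[qualifier]


def encode_block(qualifiers: List[bytes],
                 global_dict: QualifierDictionary) -> EncodedBlock:
    """Encode one buffered block's qualifiers as dictionary ids."""
    if global_dict.can_hold(qualifiers):
        # All columns fit: reference the file-global dictionary only.
        return True, [], [global_dict.add(q) for q in qualifiers]
    # Fall back to a dictionary stored with this block.
    local = QualifierDictionary()
    ids = [local.add(q) for q in qualifiers]
    return False, local.entries, ids


def decode_block(block: EncodedBlock,
                 global_entries: List[bytes]) -> List[bytes]:
    """Recover qualifiers from ids using the right dictionary."""
    uses_global, local_entries, ids = block
    table = global_entries if uses_global else local_entries
    return [table[i] for i in ids]


# Tiny cap so the second block is forced onto a local dictionary.
global_dict = QualifierDictionary(max_entries=2)
block1 = encode_block([b"name", b"age", b"name"], global_dict)
block2 = encode_block([b"x", b"y", b"z"], global_dict)
trailer = list(global_dict.entries)  # would be written once, at file close
```

In this sketch the global dictionary only grows while the file is being written, so ids already handed out stay valid, and a reader would need to load the trailer before decoding any block that references it.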

> Support block encoding based on knowing set of column qualifiers up front
> -------------------------------------------------------------------------
>
>                 Key: HBASE-12883
>                 URL: https://issues.apache.org/jira/browse/HBASE-12883
>             Project: HBase
>          Issue Type: Bug
>            Reporter: James Taylor
>              Labels: Phoenix
>
> Phoenix knows up front the set of column qualifiers a row will have. We could likely get some good compression with little CPU based on this by having a block encoding scheme that leverages this information. It could be made non-Phoenix specific by passing the set of column qualifiers to the block encoder as metadata.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)