hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12883) Support block encoding based on knowing set of column qualifiers up front
Date Wed, 21 Jan 2015 17:57:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285978#comment-14285978 ]

Andrew Purtell commented on HBASE-12883:
----------------------------------------

bq. I think we can even auto-detect the column qualifiers and do dictionary encoding per block,
which would make this very easy to use.

We provide LZ-style algorithms (gzip, Snappy, LZO) for "whole file compression", but because
we compress and decompress each HFile block independently, they are effectively already doing
dynamic dictionary encoding per block.
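
To illustrate the per-block point, here is a minimal Python sketch, not HBase code. It uses zlib as a stand-in for the codecs named above. The cell layout, the qualifier names, and the 1 KB block size are assumptions made up for the example. Each block is compressed on its own, so each one has to rebuild its dictionary of repeated strings from scratch, while a single stream over the same bytes can keep reusing what it has already seen.

```python
import zlib

# Hypothetical column qualifiers, invented for this illustration.
QUALIFIERS = [b"first_name", b"last_name", b"email_address", b"created_timestamp"]


def make_cells(n_rows):
    """Build toy cells of the form b'row000001/first_name=v919;'."""
    cells = []
    for i in range(n_rows):
        for q in QUALIFIERS:
            cells.append(b"row%06d/" % i + q + b"=v%d;" % (i * 7919 % 1000))
    return cells


def split_blocks(cells, block_size):
    """Pack whole cells into blocks of at most block_size bytes."""
    blocks, current = [], b""
    for cell in cells:
        if current and len(current) + len(cell) > block_size:
            blocks.append(current)
            current = b""
        current += cell
    if current:
        blocks.append(current)
    return blocks


cells = make_cells(500)
raw = b"".join(cells)
blocks = split_blocks(cells, 1024)

# Each block is compressed independently, so any block can be read alone.
per_block = [zlib.compress(b) for b in blocks]
# One stream over the same bytes, for comparison only.
whole = zlib.compress(raw)

assert zlib.decompress(per_block[3]) == blocks[3]
print("raw bytes:        ", len(raw))
print("per-block total:  ", sum(len(c) for c in per_block))
print("single stream:    ", len(whole))
```

On this toy data the per-block total should come out larger than the single stream, which is the cost of relearning the repeated qualifiers in every block. The benefit is that any one block can be decompressed without touching the others.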

A file-global dictionary should produce gains in both performance and compression efficiency,
even more so if the application supplies the dictionary up front.
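
As a rough sketch of what an application-supplied dictionary could buy, the Python snippet below primes zlib with a preset dictionary through the `zdict` argument of `zlib.compressobj` and `zlib.decompressobj`. This is an analogy only: it is not how HBase stores or encodes blocks, and the qualifier names, the cell layout, and the helper names `compress_block`, `decompress_block`, and `make_block` are all assumptions made up for the example. Blocks are still compressed independently, but every block starts out already knowing the shared qualifier strings.

```python
import zlib

# Hypothetical qualifiers the application is assumed to know up front.
QUALIFIERS = [b"first_name", b"last_name", b"email_address", b"created_timestamp"]
# The shared dictionary is simply the known qualifiers concatenated.
PRESET_DICT = b"".join(QUALIFIERS)


def compress_block(block, zdict=None):
    """Compress one block on its own, optionally primed with a preset dictionary."""
    comp = zlib.compressobj() if zdict is None else zlib.compressobj(zdict=zdict)
    return comp.compress(block) + comp.flush()


def decompress_block(data, zdict=None):
    """Inverse of compress_block; needs the same dictionary that was used to compress."""
    decomp = zlib.decompressobj() if zdict is None else zlib.decompressobj(zdict=zdict)
    return decomp.decompress(data) + decomp.flush()


def make_block(first_row, n_rows):
    """Build a toy block of cells such as b'row000001/first_name=v919;'."""
    return b"".join(
        b"row%06d/" % row + q + b"=v%d;" % (row * 7919 % 1000)
        for row in range(first_row, first_row + n_rows)
        for q in QUALIFIERS
    )


blocks = [make_block(start, 8) for start in range(0, 400, 8)]
plain = [compress_block(b) for b in blocks]
primed = [compress_block(b, PRESET_DICT) for b in blocks]

# Blocks remain independently readable as long as the reader has the dictionary.
assert decompress_block(primed[5], PRESET_DICT) == blocks[5]
print("without preset dictionary:", sum(len(c) for c in plain))
print("with preset dictionary:   ", sum(len(c) for c in primed))
```

The design trade-off is that the dictionary becomes part of the file's contract: a reader that does not have the same dictionary cannot decode any block.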

> Support block encoding based on knowing set of column qualifiers up front
> -------------------------------------------------------------------------
>
>                 Key: HBASE-12883
>                 URL: https://issues.apache.org/jira/browse/HBASE-12883
>             Project: HBase
>          Issue Type: Bug
>            Reporter: James Taylor
>              Labels: Phoenix
>
> Phoenix knows up front the set of column qualifiers a row will have. We could likely
> get some good compression with little CPU based on this by having a block encoding scheme
> that leverages this information. It could be made non-Phoenix specific by identifying the
> set of column qualifiers through metadata to the block encoder.
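
The idea in the issue description can be sketched roughly as follows. This Python example is an illustration under stated assumptions, not an HBase `DataBlockEncoder` and not a proposed on-disk format: the class name `QualifierDictionary`, the (row, qualifier, value) cell tuple, and the length-prefixed byte layout are all hypothetical. Given the set of qualifiers up front, each qualifier is written as a one-byte index into that set instead of as its full name.

```python
from typing import List, Tuple

# Hypothetical cell shape for this sketch: (row key, column qualifier, value).
Cell = Tuple[bytes, bytes, bytes]


class QualifierDictionary:
    """Encodes each known qualifier as a one-byte index into a fixed list."""

    def __init__(self, qualifiers: List[bytes]) -> None:
        if len(qualifiers) > 256:
            raise ValueError("this sketch supports at most 256 qualifiers")
        self._by_index = list(qualifiers)
        self._by_name = {q: i for i, q in enumerate(qualifiers)}

    def encode(self, cells: List[Cell]) -> bytes:
        out = bytearray()
        for row, qualifier, value in cells:
            out += len(row).to_bytes(2, "big") + row
            # One byte replaces the full qualifier name.
            out.append(self._by_name[qualifier])
            out += len(value).to_bytes(2, "big") + value
        return bytes(out)

    def decode(self, data: bytes) -> List[Cell]:
        cells, pos = [], 0
        while pos < len(data):
            n = int.from_bytes(data[pos:pos + 2], "big")
            pos += 2
            row = data[pos:pos + n]
            pos += n
            qualifier = self._by_index[data[pos]]
            pos += 1
            n = int.from_bytes(data[pos:pos + 2], "big")
            pos += 2
            value = data[pos:pos + n]
            pos += n
            cells.append((row, qualifier, value))
        return cells


# The qualifier set would come from table metadata; these names are made up.
known = QualifierDictionary([b"first_name", b"last_name", b"email_address"])
cells = [
    (b"row1", b"first_name", b"Ada"),
    (b"row1", b"last_name", b"Lovelace"),
    (b"row2", b"email_address", b"g@example.org"),
]
encoded = known.encode(cells)
assert known.decode(encoded) == cells
```

Because the mapping is a plain table lookup, encoding and decoding cost very little CPU, which matches the motivation given in the description. An unknown qualifier raises `KeyError` here; a real scheme would need a fallback for that case.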



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
