hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-15352) FST BlockEncoder
Date Fri, 26 Feb 2016 22:47:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-15352:
-----------------------------------
    Fix Version/s: 0.98.19

> FST BlockEncoder
> ----------------
>
>                 Key: HBASE-15352
>                 URL: https://issues.apache.org/jira/browse/HBASE-15352
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Nick Dimiduk
>             Fix For: 2.0.0, 0.98.19, 1.4.0
>
>
> We could improve on the existing [PREFIX_TREE block|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/codec/prefixtree/package-summary.html]
> encoder by upgrading the persistent data structure from a trie to a finite state transducer.
> This would theoretically allow us to reuse bytes not just for rowkey prefixes but also for
> infixes and suffixes. My reading of the literature suggests we may also be able to encode
> values, further reducing storage size when values are repeated (e.g., a "customer id" field
> with very low cardinality, which probably happens a lot in our denormalized world). There's a
> really nice [blog post|http://blog.burntsushi.net/transducers/] about this data structure,
> and apparently our siblings in Lucene make heavy use of [their implementation|http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/util/fst/package-summary.html#package_description].
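The trie-versus-FST point above can be illustrated with a small, hypothetical Java sketch. `MinimalAutomatonSketch` and all of its methods are made-up names, not HBase or Lucene APIs, and the example rowkeys are invented. The sketch assumes keys arrive in sorted order, as cells do within an HFile. It uses the incremental sorted-input construction of Daciuk et al. for minimal acyclic automata, which roughly corresponds to an FST with its outputs dropped. A trie shares only prefixes; merging states with identical "remaining suffix" sets additionally shares suffixes.

```java
import java.util.*;

// Hypothetical illustration only; not HBase or Lucene code.
public class MinimalAutomatonSketch {
    private static int nextId = 0;

    static final class State {
        final int id = nextId++;                           // unique id, used in signatures
        final TreeMap<Character, State> next = new TreeMap<>();
        boolean isFinal;

        // Equivalent states agree on finality and on transitions into identical states.
        String signature() {
            StringBuilder sb = new StringBuilder(isFinal ? "1|" : "0|");
            for (Map.Entry<Character, State> e : next.entrySet()) {
                sb.append((int) e.getKey()).append(':').append(e.getValue().id).append(',');
            }
            return sb.toString();
        }
    }

    static final class Builder {
        final State root = new State();
        private final Map<String, State> register = new HashMap<>();
        private String previous = null;

        void add(String key) {
            if (previous != null && key.compareTo(previous) <= 0) {
                throw new IllegalArgumentException("keys must be added in strictly increasing order");
            }
            String prev = previous == null ? "" : previous;
            int common = 0;
            while (common < prev.length() && common < key.length()
                    && prev.charAt(common) == key.charAt(common)) {
                common++;
            }
            State last = root;                              // walk the shared prefix (trie-style sharing)
            for (int i = 0; i < common; i++) last = last.next.get(key.charAt(i));
            if (!last.next.isEmpty()) replaceOrRegister(last); // previous key's tail is complete: share suffixes
            State cur = last;                               // append the new key's unshared tail
            for (int i = common; i < key.length(); i++) {
                State s = new State();
                cur.next.put(key.charAt(i), s);
                cur = s;
            }
            cur.isFinal = true;
            previous = key;
        }

        State finish() {
            if (!root.next.isEmpty()) replaceOrRegister(root);
            return root;
        }

        // Minimize the most recently added branch bottom-up, merging equivalent states.
        private void replaceOrRegister(State state) {
            Map.Entry<Character, State> edge = state.next.lastEntry();
            State child = edge.getValue();
            if (!child.next.isEmpty()) replaceOrRegister(child);
            State existing = register.putIfAbsent(child.signature(), child);
            if (existing != null) state.next.put(edge.getKey(), existing);
        }
    }

    static State build(List<String> sortedKeys) {
        Builder b = new Builder();
        for (String k : sortedKeys) b.add(k);
        return b.finish();
    }

    // Count distinct states reachable from the root (shared states counted once).
    static int countStates(State root) {
        Set<State> seen = new HashSet<>();
        Deque<State> work = new ArrayDeque<>(List.of(root));
        while (!work.isEmpty()) {
            State s = work.pop();
            if (seen.add(s)) work.addAll(s.next.values());
        }
        return seen.size();
    }

    // A trie has exactly one node per distinct prefix (including the empty one).
    static int trieNodeCount(List<String> keys) {
        Set<String> prefixes = new HashSet<>();
        for (String k : keys) for (int i = 0; i <= k.length(); i++) prefixes.add(k.substring(0, i));
        return prefixes.size();
    }

    static boolean accepts(State root, String key) {
        State s = root;
        for (int i = 0; i < key.length() && s != null; i++) s = s.next.get(key.charAt(i));
        return s != null && s.isFinal;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("user1#clicks", "user1#views", "user2#clicks",
                "user2#views", "user3#clicks", "user3#views");
        State root = build(keys);
        System.out.println("trie nodes: " + trieNodeCount(keys));       // 44
        System.out.println("automaton states: " + countStates(root));   // 16
    }
}
```

For these six keys the trie needs 44 nodes while the minimized automaton needs 16. The states reached after `user1`, `user2` and `user3` collapse into one, and `click`/`view` share the state that accepts the trailing `s`. A real FST additionally attaches outputs to transitions, which is what would allow values (or value ids) to be encoded as the description suggests; this sketch leaves outputs out.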



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
