accumulo-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Keith Turner (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ACCUMULO-473) Support binary search within RFile blocks
Date Mon, 19 Mar 2012 22:51:38 GMT
Support binary search within RFile blocks
-----------------------------------------

                 Key: ACCUMULO-473
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-473
             Project: Accumulo
          Issue Type: Improvement
          Components: tserver
            Reporter: Keith Turner
            Assignee: Keith Turner
             Fix For: 1.5.0


RFiles store blocks of key values using relative encoding.  By default these blocks are small
(100k).  To find a key in an RFile the index is used to find the block, then the block is
scanned for the key.  It would be nice for iterators that do alot of seeking if a binary search
were done instead of a scan.  

Relative encoding is a form of compression that serializes a key relative to the last key.
 For example if the row in a key is the same as the previous key, then the row is not stored
again.  This works well with sorted data.   The current on disk format does not support random
access within a block, because to read entry N all previous entries must be read.   One option
would be to deserialize the block into a form that supports random access and then do binary
search.  However this will consume memory and CPU.  If the relative encoding format could
be modified to support random access, then binary search could be supported in a memory and
CPU efficient way.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message