accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: RFile details
Date Mon, 25 Jun 2012 22:45:58 GMT
Here's my high-level understanding.  Let me know which aspect you would
like to know more about.

RFile is built on top of BCFile, so you would need to dig up documentation
on that. Most of the compression is performed at that layer.

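However, RFile uses a few bits of each key/value to flag which of the row,
column family (cf), column qualifier (cq), and column visibility (cv) fields
repeat the previous key, so repeated values are not written out again.  This
is helpful when a file contains just one row, or when most of the data has
the same visibility.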
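
As a rough illustration of the relative-key idea (not the actual RFile
code), here is a minimal Python sketch.  The flag values, the 4-byte length
prefix, and the names Key, encode and decode are assumptions made up for
this example; timestamps and delete markers are left out.

```python
from typing import NamedTuple, Optional


class Key(NamedTuple):
    row: bytes
    cf: bytes   # column family
    cq: bytes   # column qualifier
    cv: bytes   # column visibility


# Hypothetical flag bits: a set bit means "same as in the previous key".
ROW_SAME, CF_SAME, CQ_SAME, CV_SAME = 0x01, 0x02, 0x04, 0x08
FIELDS = (("row", ROW_SAME), ("cf", CF_SAME), ("cq", CQ_SAME), ("cv", CV_SAME))


def encode(prev: Optional[Key], key: Key) -> bytes:
    """Write one flag byte, then only the fields that differ from prev."""
    flags = 0
    body = bytearray()
    for name, bit in FIELDS:
        value = getattr(key, name)
        if prev is not None and getattr(prev, name) == value:
            flags |= bit  # repeated field: costs one bit, no bytes
        else:
            body += len(value).to_bytes(4, "big") + value
    return bytes([flags]) + bytes(body)


def decode(prev: Optional[Key], data: bytes) -> Key:
    """Rebuild a key from its encoding plus the previously decoded key."""
    flags, pos = data[0], 1
    fields = {}
    for name, bit in FIELDS:
        if flags & bit:
            fields[name] = getattr(prev, name)
        else:
            n = int.from_bytes(data[pos:pos + 4], "big")
            pos += 4
            fields[name] = data[pos:pos + n]
            pos += n
    return Key(**fields)


k1 = Key(b"row1", b"fam", b"q1", b"public")
k2 = Key(b"row1", b"fam", b"q2", b"public")
e1 = encode(None, k1)   # first key: every field is written out
e2 = encode(k1, k2)     # second key: only the qualifier is written out
```

In this toy format the second key shrinks to the flag byte plus its
qualifier, which is the effect described above for single-row files or
data that shares one visibility.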

BTW, the "R" in RFile stands for "Relative Key."

Column families are grouped together into locality groups, and those
families falling outside of any defined locality group go in the "default"
locality group.  Column family -> locality group mappings are written to
metadata at the end of the RFile.  Locality groups are stored in successive
sections of a file.  Input is re-scanned multiple times during compactions
to produce locality groups that match a table's family->group mapping at the
time of the compaction.
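
The layout described above can be sketched roughly as follows.  This is a
simplified Python model, not the real writer: the function name
write_locality_groups, the tuple layout of an entry, and the in-memory
lists standing in for file sections are all assumptions for illustration.

```python
def write_locality_groups(entries, family_to_group):
    """Split sorted (row, cf, cq, value) entries into per-group sections.

    The input is scanned once per locality group, mimicking the repeated
    passes a compaction makes, and each pass emits one section.
    """
    sections = []
    for group in sorted(set(family_to_group.values())):
        # One full pass over the input per defined locality group.
        section = [e for e in entries if family_to_group.get(e[1]) == group]
        sections.append((group, section))
    # Families with no mapping fall into the "default" group.
    default = [e for e in entries if e[1] not in family_to_group]
    sections.append(("default", default))
    # The family -> group mapping is kept as trailing metadata.
    metadata = dict(family_to_group)
    return sections, metadata


entries = [
    ("r1", "attr", "name", "a"),
    ("r1", "blob", "img", "b"),
    ("r1", "misc", "x", "c"),
    ("r2", "attr", "name", "d"),
    ("r2", "blob", "img", "e"),
]
mapping = {"attr": "small", "blob": "large"}
sections, metadata = write_locality_groups(entries, mapping)
```

A scan that only needs the "attr" family can then read the "small" section
and skip the rest of the file.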

In 1.3, index information is stored in one large block at the end of the
file.  In 1.4, the index blocks are hierarchical, to support incremental
loading of the index.
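
To show why a hierarchical index allows incremental loading, here is a toy
Python sketch.  It is not the 1.4 on-disk format: build_index, lookup, the
fanout parameter, and indexing by the last key of each data block are
assumptions chosen to keep the example small.

```python
import bisect


def build_index(last_keys, fanout):
    """Build index levels from the last key of each data block.

    levels[0] holds the leaf index blocks; levels[-1][0] is the root.
    Each entry is (last_key, number of the child block it points to).
    """
    if not last_keys:
        raise ValueError("need at least one data block")
    level = [(k, i) for i, k in enumerate(last_keys)]
    levels = []
    while True:
        blocks = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        levels.append(blocks)
        if len(blocks) == 1:
            break
        # The next level up indexes the last key of each block just built.
        level = [(block[-1][0], i) for i, block in enumerate(blocks)]
    return levels


def lookup(levels, key):
    """Return (data block number or None, index blocks loaded)."""
    block_no, loaded = 0, 0
    for blocks in reversed(levels):      # walk from the root down
        block = blocks[block_no]         # only this one block is "loaded"
        loaded += 1
        pos = bisect.bisect_left([k for k, _ in block], key)
        if pos == len(block):
            return None, loaded          # key is past the end of the file
        block_no = block[pos][1]
    return block_no, loaded


levels = build_index(["b", "d", "f", "h", "j", "l", "n", "p"], fanout=2)
found = lookup(levels, "g")
```

With eight data blocks and a fanout of 2, a lookup touches one index block
per level instead of reading a single index that covers every data block.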

-Eric


On Mon, Jun 25, 2012 at 1:11 PM, Cardon, Tejay E <tejay.e.cardon@lmco.com> wrote:

> All,
>
> Can anyone point me to a design paper or other source of
> some detail on how RFiles work?  I’m curious about the compression under
> the covers as well as the layout on disk of column families, etc.
>
> Thanks,
>
> Tejay Cardon
>
