accumulo-user mailing list archives

From "Cardon, Tejay E" <>
Subject RE: EXTERNAL: Re: RFile details
Date Mon, 25 Jun 2012 23:53:32 GMT
Thanks Eric.  That helps.  With regard to the repeating key piece, does this only happen for
successive cells?  In other words, if I use the same cf in every row of a table, does that
cf get repeated each time, or does the cf de-duplication work across rows?  I hope that makes
sense.



From: Eric Newton []
Sent: Monday, June 25, 2012 4:46 PM
Subject: EXTERNAL: Re: RFile details

Here's my high-level understanding.  Let me know which aspect you would like to know more
about.

RFile is built on top of BCFile, so you would need to dig up documentation on that. Most of
the compression is performed at that layer.

However, RFile uses a few bits of each key/value to encode any repeating row, cf, cq, cv information.
 This is helpful when a file contains just one row, or when most of the data has the same

BTW, the "R" in RFile stands for "Relative Key."
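
As a rough illustration of the relative-key idea, here is a minimal Python sketch. It is not the actual RFile encoding: the flag values, the `Key` tuple, and the `encode`/`decode` helpers are all assumptions made up for this example. Each key is compared with the key written immediately before it, and a field is stored only when it differs.

```python
from typing import List, NamedTuple, Optional, Tuple


class Key(NamedTuple):
    row: bytes
    cf: bytes
    cq: bytes
    cv: bytes


# Illustrative flag bits, one per key field (assumed for this sketch,
# not taken from the RFile source).
ROW_SAME, CF_SAME, CQ_SAME, CV_SAME = 0x01, 0x02, 0x04, 0x08
FLAGS = (ROW_SAME, CF_SAME, CQ_SAME, CV_SAME)

Encoded = Tuple[int, List[bytes]]


def encode(keys: List[Key]) -> List[Encoded]:
    """Store each key relative to the key written just before it."""
    out: List[Encoded] = []
    prev: Optional[Key] = None
    for key in keys:
        flags = 0
        fields: List[bytes] = []
        for i, bit in enumerate(FLAGS):
            if prev is not None and key[i] == prev[i]:
                flags |= bit           # field repeats: spend one bit
            else:
                fields.append(key[i])  # field changed: store it in full
        out.append((flags, fields))
        prev = key
    return out


def decode(encoded: List[Encoded]) -> List[Key]:
    """Rebuild full keys by copying flagged fields from the previous key."""
    keys: List[Key] = []
    prev: Optional[Key] = None
    for flags, fields in encoded:
        remaining = iter(fields)
        parts = [prev[i] if flags & bit else next(remaining)
                 for i, bit in enumerate(FLAGS)]
        prev = Key(*parts)
        keys.append(prev)
    return keys
```

In this simplified model the comparison is only ever against the immediately preceding key, so a column family that repeats from the last cell of one row to the first cell of the next would still be flagged as unchanged.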

Column families are grouped together into locality groups, and those families falling outside
of any defined family group go in the "default" locality group.  Column family -> locality
group mappings are written to metadata at the end of the RFile.  Locality groups are stored
in successive sections of a file.  Input is re-scanned multiple times during compactions
to produce locality groups that match a table's family->group mapping at the time of the
compaction.
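
The grouping described above can be sketched roughly as follows. This is only a toy model in Python, not Accumulo code: the function name `write_locality_groups`, the tuple layout of an entry, and the returned structures are assumptions for illustration. It makes one pass over the input per group, puts unmapped families into a "default" group, and returns the family -> group mapping that would go into the file's metadata.

```python
from typing import Dict, List, Set, Tuple

# An entry is (row, cf, cq, value); the input is assumed to be sorted already.
Entry = Tuple[str, str, str, str]
Section = Tuple[str, List[Entry]]


def write_locality_groups(
    entries: List[Entry],
    groups: Dict[str, Set[str]],
) -> Tuple[List[Section], Dict[str, str]]:
    """Split entries into one section per locality group, in order."""
    # Every family that some configured group claims.
    assigned: Set[str] = set().union(*groups.values())

    sections: List[Section] = []
    # One scan of the input per configured group.
    for name, families in groups.items():
        sections.append((name, [e for e in entries if e[1] in families]))
    # Families outside every group fall into the default group.
    sections.append(("default", [e for e in entries if e[1] not in assigned]))

    # Family -> group mapping, which would be stored as metadata.
    family_to_group = {cf: name
                       for name, families in groups.items()
                       for cf in families}
    return sections, family_to_group
```

A reader that only needs the families of one group could then, in principle, skip the other sections entirely.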

In 1.3, index information is stored in one large block at the end of the file.  In 1.4, the
index blocks are hierarchical, to support incremental loading of the index.
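
To show why a hierarchical index helps with incremental loading, here is a small Python sketch. It does not reflect the real 1.4 on-disk layout: the `fanout` parameter, the `(last_key, pointer)` entry shape, and both function names are assumptions for this example. A lookup touches one index block per level rather than the whole index.

```python
import bisect
from typing import List, Optional, Tuple

IndexEntry = Tuple[str, int]   # (last key covered, pointer to block below)
Block = List[IndexEntry]
Level = List[Block]


def build_index(leaf_entries: List[IndexEntry], fanout: int) -> List[Level]:
    """Build index levels bottom-up; levels[-1] holds the single root block."""
    if not leaf_entries or fanout < 2:
        raise ValueError("need at least one entry and fanout >= 2")
    # Level 0 points at data blocks.
    levels: List[Level] = [[leaf_entries[i:i + fanout]
                            for i in range(0, len(leaf_entries), fanout)]]
    while len(levels[-1]) > 1:
        below = levels[-1]
        # Each parent entry records the last key of a child block and its position.
        parents = [(block[-1][0], idx) for idx, block in enumerate(below)]
        levels.append([parents[i:i + fanout]
                       for i in range(0, len(parents), fanout)])
    return levels


def lookup(levels: List[Level], key: str) -> Tuple[Optional[int], int]:
    """Return (data block id or None, number of index blocks loaded)."""
    block_idx = 0
    loaded = 0
    for level in reversed(levels):          # root first, leaves last
        block = level[block_idx]
        loaded += 1
        pos = bisect.bisect_left([k for k, _ in block], key)
        if pos == len(block):
            return None, loaded             # key is past the end of the file
        block_idx = block[pos][1]
    return block_idx, loaded
```

With a single flat index, as described for 1.3, the equivalent lookup would first need the entire index block in memory.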


On Mon, Jun 25, 2012 at 1:11 PM, Cardon, Tejay E <> wrote:
Can anyone point me to a design paper or other source of some detail on how
RFiles work?  I'm curious about the compression under the covers as well as the layout on
disk of column families, etc.

Tejay Cardon
