accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: EXTERNAL: Re: RFile details
Date Tue, 26 Jun 2012 01:32:06 GMT
Given the following keys, written as row, cf, cq, cv:

a,b,c,d
a,b,c,e
a,b,q,f
a,b,x,d

Relative key encoding in RFile will result in the following symbolic encoding:

a,b,c,d
,,,e
,,q,f
,,x,d

This is not the optimal encoding, but it is fast and works well in
practice, especially in the tables that Accumulo supports well: those
with millions of columns in a row.

To answer your question, if you use only one cf, it will only be
encoded once per block.
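
For illustration, here is a minimal Python sketch of the symbolic encoding above. It is not RFile's actual on-disk format (which, per the quoted message below, uses a few bits of each key/value to flag repeated fields). The function names, the use of None as the "same as previous" marker, and the field-by-field comparison against the immediately preceding key are all assumptions of this sketch.

```python
def encode_block(keys):
    """Symbolically encode one block of (row, cf, cq, cv) keys.

    Assumption of this sketch: each field is compared with the same field
    of the immediately preceding key, and None marks "same as previous".
    The first key of a block is always kept in full.
    """
    encoded = []
    prev = None
    for key in keys:
        if prev is None:
            encoded.append(tuple(key))
        else:
            encoded.append(tuple(None if cur == old else cur
                                 for cur, old in zip(key, prev)))
        prev = key
    return encoded


def decode_block(encoded):
    """Invert encode_block by copying omitted fields from the previous key."""
    keys = []
    prev = None
    for rel in encoded:
        if prev is None:
            key = tuple(rel)
        else:
            key = tuple(old if cur is None else cur
                        for cur, old in zip(rel, prev))
        keys.append(key)
        prev = key
    return keys


def show(encoded):
    """Render encoded keys in the comma-separated style used in this message."""
    return [",".join("" if field is None else field for field in rel)
            for rel in encoded]


# The four keys from the example above.
example = [("a", "b", "c", "d"), ("a", "b", "c", "e"),
           ("a", "b", "q", "f"), ("a", "b", "x", "d")]
print("\n".join(show(encode_block(example))))

# One cf shared by every row (hypothetical keys): within a single block,
# the cf appears only in the first key even though the row changes.
across_rows = [("r1", "cf", "q", "v"), ("r2", "cf", "q", "v"),
               ("r3", "cf", "q", "v")]
print("\n".join(show(encode_block(across_rows))))
```

The first print reproduces the four symbolic lines shown above. The second shows the case asked about: the shared cf is written in the first key of the block and omitted in later keys, even when the row changes.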

-Eric

On Mon, Jun 25, 2012 at 7:53 PM, Cardon, Tejay E
<tejay.e.cardon@lmco.com> wrote:
> Thanks Eric.  That helps.  With regard to the repeating key piece, does
> this only happen for successive cells?  In other words, if I use the same cf
> in every row of a table, does that cf get repeated each time, or does this
> cf repetition work across rows?  I hope that makes sense.
>
>
>
> Thanks,
>
>
>
> Tejay
>
>
>
> From: Eric Newton [mailto:eric.newton@gmail.com]
> Sent: Monday, June 25, 2012 4:46 PM
> To: user@accumulo.apache.org
> Subject: EXTERNAL: Re: RFile details
>
>
>
> Here's my high-level understanding.  Let me know which aspect you would like
> to know more about.
>
>
>
> RFile is built on top of BCFile, so you would need to dig up documentation
> on that. Most of the compression is performed at that layer.
>
>
>
> However, RFile uses a few bits of each key/value to encode any repeating
> row, cf, cq, cv information.  This is helpful when a file contains just one
> row, or when most of the data has the same visibility.
>
>
>
> BTW, the "R" in RFile stands for "Relative Key."
>
>
>
> Column families are grouped together into locality groups, and those
> families falling outside of any defined family group go in the "default"
> locality group.  Column family -> locality group mappings are written to
> metadata at the end of the RFile.  Locality groups are stored in successive
> sections of a file.  Input is re-scanned multiple times during compactions
> to produce locality groups that match a table's family->group mapping at the
> time of the compaction.
>
>
>
> In 1.3, index information is stored in one large block at the end of the
> file.  In 1.4, the index blocks are hierarchical, to support incremental
> loading of the index.
>
>
>
> -Eric
>
>
>
> On Mon, Jun 25, 2012 at 1:11 PM, Cardon, Tejay E <tejay.e.cardon@lmco.com>
> wrote:
>
> All,
>
> Can anyone point me to a design paper or other source of
> some detail on how RFiles work?  I’m curious about the compression under the
> covers as well as the layout on disk of column families, etc.
>
>
>
> Thanks,
>
> Tejay Cardon
>
>
