accumulo-user mailing list archives

From "Cardon, Tejay E" <tejay.e.car...@lmco.com>
Subject RE: EXTERNAL: Re: RFile details
Date Tue, 26 Jun 2012 14:45:37 GMT
Thanks Eric and John.  That makes sense, and clears up some things regarding design.  I really
appreciate both of you taking time to respond.

Tejay

-----Original Message-----
From: Eric Newton [mailto:eric.newton@gmail.com] 
Sent: Monday, June 25, 2012 7:32 PM
To: user@accumulo.apache.org
Subject: Re: EXTERNAL: Re: RFile details

Given, for r,cf,cq,cv:

a,b,c,d
a,b,c,e
a,b,q,f
a,b,x,d

Relative key encoding in RFile will result in the following symbolic encoding:

a,b,c,d
,,,e
,,q,f
,,x,d

This is not the optimal encoding, but it is fast and works well in practice, especially in
the tables that Accumulo supports well: those with millions of columns in a row.
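
As a rough illustration of the scheme above, here is a minimal Python sketch. It is not the actual RFile code: the names relative_encode and relative_decode are hypothetical, a field is marked "same as previous key" with None instead of flag bits, and timestamps, values, and block boundaries are left out.

```python
from typing import List, Optional, Tuple

# A key is (row, cf, cq, cv); an encoded field is None when it repeats
# the same field of the previous key.
Key = Tuple[str, str, str, str]
Encoded = Tuple[Optional[str], ...]


def relative_encode(keys: List[Key]) -> List[Encoded]:
    """Encode each key relative to the key immediately before it."""
    encoded: List[Encoded] = []
    prev: Optional[Key] = None
    for key in keys:
        if prev is None:
            # The first key has nothing to be relative to, so store it whole.
            encoded.append(tuple(key))
        else:
            # Compare field by field; keep only the fields that changed.
            encoded.append(
                tuple(None if cur == old else cur for cur, old in zip(key, prev))
            )
        prev = key
    return encoded


def relative_decode(encoded: List[Encoded]) -> List[Key]:
    """Rebuild full keys by filling each None from the previous decoded key."""
    decoded: List[Key] = []
    prev: Optional[Key] = None
    for entry in encoded:
        if prev is None:
            key = tuple(entry)
        else:
            key = tuple(old if cur is None else cur for cur, old in zip(entry, prev))
        decoded.append(key)
        prev = key
    return decoded


keys = [
    ("a", "b", "c", "d"),
    ("a", "b", "c", "e"),
    ("a", "b", "q", "f"),
    ("a", "b", "x", "d"),
]
for entry in relative_encode(keys):
    # Print omitted fields as empty strings, as in the symbolic form above.
    print(",".join(field or "" for field in entry))
```

Because each key is compared only with the one before it, encoding and decoding are both a single sequential pass, which is why the scheme is fast even though it is not optimal.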

To answer your question: if you use only one cf, it is encoded only once per block, even
across rows, because each key is encoded relative to the key before it.

-Eric

On Mon, Jun 25, 2012 at 7:53 PM, Cardon, Tejay E <tejay.e.cardon@lmco.com> wrote:
> Thanks Eric.  That helps.  With regard to the repeating key piece, 
> does this only happen for successive cells?  In other words, if I use 
> the same cf in every row of a table, does that cf get repeated each 
> time, or does this cf repetition work across rows?  I hope that makes sense.
>
>
>
> Thanks,
>
>
>
> Tejay
>
>
>
> From: Eric Newton [mailto:eric.newton@gmail.com]
> Sent: Monday, June 25, 2012 4:46 PM
> To: user@accumulo.apache.org
> Subject: EXTERNAL: Re: RFile details
>
>
>
> Here's my high-level understanding.  Let me know which aspect you 
> would like to know more about.
>
>
>
> RFile is built on top of BCFile, so you would need to dig up 
> documentation on that. Most of the compression is performed at that layer.
>
>
>
> However, RFile uses a few bits of each key/value to encode any 
> repeating row, cf, cq, cv information.  This is helpful when a file 
> contains just one row, or when most of the data has the same visibility.
>
>
>
> BTW, the "R" in RFile stands for "Relative Key."
>
>
>
> Column families are grouped together into locality groups, and those 
> families falling outside of any defined family group go in the "default"
> locality group.  Column family -> locality group mappings are written 
> to metadata at the end of the RFile.  Locality groups are stored in 
> successive sections of a file.  Input is re-scanned multiple times 
> during compactions to produce locality groups that match a table's 
> family->group mapping at the time of the compaction.
>
>
>
> In 1.3, index information is stored in one large block at the end of 
> the file.  In 1.4, the index blocks are hierarchical, to support 
> incremental loading of the index.
>
>
>
> -Eric
>
>
>
> On Mon, Jun 25, 2012 at 1:11 PM, Cardon, Tejay E 
> <tejay.e.cardon@lmco.com>
> wrote:
>
> All,
>
> Can anyone point me to a design paper or other source of some detail 
> on how RFiles work?  I'm curious about the compression under the 
> covers as well as the layout on disk of column families, etc.
>
>
>
> Thanks,
>
> Tejay Cardon
>
>
