accumulo-user mailing list archives

From John Vines <john.w.vi...@ugov.gov>
Subject RE: EXTERNAL: Re: RFile details
Date Tue, 26 Jun 2012 00:07:45 GMT
I believe the relative key encoding only occurs on keys which are adjacent.
So a cf in different rows will not have relative encoding unless there was
nothing else between them in that range.
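
As a rough sketch of what I mean by "adjacent" (a simplified Python
illustration, not the actual RFile byte layout; the key tuples, the flag bit
positions, and the relative_encode helper are all made up for this example):

```python
# Simplified illustration only; this is NOT the real RFile on-disk format.
# A key is modeled as a (row, cf, cq, cv) tuple, and bit i of the flags
# is set when field i is identical to the same field of the previous key.

def relative_encode(keys):
    """Encode each key relative to the key immediately before it."""
    encoded = []
    prev = None
    for key in keys:
        flags = 0
        changed = []
        for bit, value in enumerate(key):
            if prev is not None and prev[bit] == value:
                flags |= 1 << bit      # field repeats: costs one flag bit
            else:
                changed.append(value)  # field differs: stored in full
        encoded.append((flags, tuple(changed)))
        prev = key
    return encoded

keys = [
    ("row1", "attr", "age", "public"),
    ("row1", "attr", "name", "public"),
    ("row2", "attr", "age", "public"),
]
encoded = relative_encode(keys)
for flags, changed in encoded:
    print(f"{flags:04b}", changed)
```

In this toy version the cf "attr" in row2 is elided because the key right
before it used the same cf. If a key with a different cf sat between them, it
would be written out in full again.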

Now, keep in mind that after relative key encoding, we still run a
compression algorithm, so highly repetitive, non-adjacent keys should still
end up tiny on disk.
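
To get a feel for that second point, here is a small Python sketch. It uses
zlib as a stand-in for whatever codec the file is configured with (that
choice is an assumption), and the record layout is invented for the example,
so treat the ratio as illustrative only:

```python
import zlib

# 1000 hypothetical records where the column family alternates, so no two
# adjacent records share a cf and adjacency-based encoding would not help.
records = []
for i in range(1000):
    cf = "attributes_a" if i % 2 == 0 else "attributes_b"
    records.append(f"row{i:06d}\x00{cf}\x00value\n")

raw = "".join(records).encode("utf-8")
compressed = zlib.compress(raw)

# The repeated cf strings are non-adjacent, but a general-purpose compressor
# still finds them within its window and stores them compactly.
ratio = len(compressed) / len(raw)
print(len(raw), len(compressed), round(ratio, 3))
```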

Sent from my phone, so pardon the typos and brevity.
On Jun 25, 2012 7:54 PM, "Cardon, Tejay E" <tejay.e.cardon@lmco.com> wrote:

> Thanks Eric.  That helps.  With regard to the repeating key piece, does
> this only happen for successive cells?  In other words, if I use the same
> cf in every row of a table, does that cf get repeated each time, or does
> this cf repetition work across rows?  I hope that makes sense.
>
> Thanks,
>
> Tejay
>
> *From:* Eric Newton [mailto:eric.newton@gmail.com]
> *Sent:* Monday, June 25, 2012 4:46 PM
> *To:* user@accumulo.apache.org
> *Subject:* EXTERNAL: Re: RFile details
>
> Here's my high-level understanding.  Let me know which aspect you would
> like to know more about.
>
> RFile is built on top of BCFile, so you would need to dig up documentation
> on that. Most of the compression is performed at that layer.
>
> However, RFile uses a few bits of each key/value to encode any repeating
> row, cf, cq, cv information.  This is helpful when a file contains just one
> row, or when most of the data has the same visibility.
>
> BTW, the "R" in RFile stands for "Relative Key."
>
> Column families are grouped together into locality groups, and those
> families falling outside of any defined family group go in the "default"
> locality group.  Column family -> locality group mappings are written to
> metadata at the end of the RFile.  Locality groups are stored in successive
> sections of a file.  Input is re-scanned multiple times during compactions
> to produce locality groups that match a table's family->group mapping at the
> time of the compaction.
>
> In 1.3, index information is stored in one large block at the end of the
> file.  In 1.4, the index blocks are hierarchical, to support incremental
> loading of the index.
>
> -Eric
>
> On Mon, Jun 25, 2012 at 1:11 PM, Cardon, Tejay E <tejay.e.cardon@lmco.com>
> wrote:
>
> All,
>
> Can anyone point me to a design paper or other source of
> some detail on how RFiles work?  I’m curious about the compression under
> the covers as well as the layout on disk of column families, etc.
>
> Thanks,
>
> Tejay Cardon
>
