accumulo-user mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: visibility expression & column compression
Date Mon, 24 Aug 2015 19:23:28 GMT
Resending (see below) due to brief ASF email outage.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Mon, Aug 24, 2015 at 2:54 PM, Christopher <ctubbsii@apache.org> wrote:
> Accumulo has a few kinds of compression inside RFiles that apply to
> visibility expressions.
>
> First, there's the block compression in the file. This is going to be
> gzip, or another supported compression type. But, before that, we have
> a couple of ways to reduce the size of the data written:
>
> 1. If the visibility expression of one key is exactly the same as
> that of the key which immediately preceded it, VE(K) == VE(K-1), the
> RFile writer stores a flag which instructs the reader to re-use the
> previous visibility expression, in lieu of the expression itself.
>
> 2. In the case of non-exact matches, the RFile writer stores the
> number of leading bytes the expression shares with the previous key's
> expression (the common prefix), followed by the remaining bytes that
> differ.
>
> (Note: these optimizations actually apply to the row, colfam, colqual,
> too, but you specifically asked about colvis.)
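To make the two optimizations above concrete, here is a minimal sketch in Java of how one field (such as the visibility) could be written relative to the previous key's field. It is only an illustration: the class name, the SAME/PREFIX marker values, and the single-byte lengths are assumptions made for brevity, not RFile's actual on-disk layout.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch only; all names and the byte layout are hypothetical.
final class FieldPrefixSketch {
    static final int SAME = 0;   // field is identical to the previous key's field
    static final int PREFIX = 1; // field shares a leading prefix with the previous one

    // Number of leading bytes that a and b have in common.
    static int commonPrefixLength(byte[] a, byte[] b) {
        int max = Math.min(a.length, b.length);
        int i = 0;
        while (i < max && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Encode cur relative to prev: a lone flag for an exact match, otherwise
    // flag + shared-prefix length + remaining length + the remaining bytes.
    static byte[] encode(byte[] prev, byte[] cur) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (Arrays.equals(prev, cur)) {
            out.write(SAME);
            return out.toByteArray();
        }
        int shared = commonPrefixLength(prev, cur);
        int rest = cur.length - shared;
        if (shared > 255 || rest > 255) {
            // Simplification: this sketch stores each length in one byte.
            throw new IllegalArgumentException("field too long for this sketch");
        }
        out.write(PREFIX);
        out.write(shared);
        out.write(rest);
        out.write(cur, shared, rest);
        return out.toByteArray();
    }

    // Rebuild the field from the previous key's field and the encoded bytes.
    static byte[] decode(byte[] prev, byte[] encoded) {
        if (encoded[0] == SAME) {
            return prev.clone();
        }
        int shared = encoded[1] & 0xff;
        int rest = encoded[2] & 0xff;
        byte[] cur = Arrays.copyOf(prev, shared + rest); // keeps the shared prefix
        System.arraycopy(encoded, 3, cur, shared, rest); // appends the differing bytes
        return cur;
    }

    public static void main(String[] args) {
        byte[] prev = "secret&projectA".getBytes(StandardCharsets.UTF_8);
        byte[] cur = "secret&projectB".getBytes(StandardCharsets.UTF_8);
        byte[] encoded = encode(prev, cur);
        System.out.println("encoded bytes: " + encoded.length + " of " + cur.length);
        System.out.println(new String(decode(prev, encoded), StandardCharsets.UTF_8));
    }
}
```

In this sketch a repeated expression costs one byte regardless of its length, and an expression that differs only in its tail costs a few bytes plus the tail. Either way the reader rebuilds the full expression from the previous key, so the visibility still travels with the data.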
>
> What we don't do is create a lookup table or anything like that. We
> think it's really important that the visibility be stored with the
> data it protects, so that the visibility is always there for
> determining authorization to read it. So, we don't do anything beyond
> the few small optimizations during serialization, and certainly
> nothing that would separate the data too far from its visibility
> expression.
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Mon, Aug 24, 2015 at 12:58 PM, roman.drapeko@baesystems.com
> <roman.drapeko@baesystems.com> wrote:
>> Hi there,
>>
>>
>>
>> My question is how Accumulo compression works with regard to visibility
>> labels.
>>
>>
>>
>> Is there any difference between “VeryLargeLargeLarge & AlsoLargeLargeLarge”
>> and “A&B” expressions? Will it be internally compiled to a low data
>> consuming structure?
>>
>>
>>
>> Same question applies to column and qualifier names. Is there any
>> difference?
>>
>>
>>
>> The reason for this question is simple – we are trying to find out what
>> would be the data utilization overhead for different approaches.
>>
>>
>>
>> Regards
>>
>> Roman
