accumulo-user mailing list archives

From Josh Elser
Subject Re: visibility expression & column compression
Date Mon, 24 Aug 2015 18:35:00 GMT
Visibility labels are not replaced with any other types of identifiers 
which means that, considering nothing else, a visibility label which has 
20 characters will take up more space than one that only has 2 
characters. This is a conscious decision to make sure it is completely 
obvious what the label on some data is without an external lookup table.
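
As a rough illustration of that raw cost, the sketch below counts the UTF-8 bytes of the two expressions from the question (written without spaces). It is plain Python rather than Accumulo code; the helper name `visibility_bytes` and the entry count are assumptions made up for the example.

```python
# Raw byte cost of a visibility expression, before any encoding or compression.
# Illustrative only: nothing here is an Accumulo API.

def visibility_bytes(expression: str) -> int:
    """Return the number of bytes the expression occupies as UTF-8."""
    return len(expression.encode("utf-8"))

long_vis = "VeryLargeLargeLarge&AlsoLargeLargeLarge"
short_vis = "A&B"

entries = 1_000_000  # hypothetical number of key-value pairs

# Extra bytes the longer label costs on every entry that carries it.
extra_per_entry = visibility_bytes(long_vis) - visibility_bytes(short_vis)

print(visibility_bytes(long_vis), visibility_bytes(short_vis))
print(extra_per_entry * entries, "extra bytes before encoding and compression")
```

This is the "considering nothing else" figure only; the strategies described next reduce what actually lands on disk.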

Accumulo uses two strategies to reduce the size of data on disk:
run-length encoding and a compression algorithm. The run-length encoding
prevents a prefix shared by sequential Keys from being stored multiple
times. For example, given the following Keys:

row1 cf:cq []
row2 cf:cq []

the RLE would prevent "row" from being stored a second time. Families 
and qualifiers would only be replaced with a back-reference if there is 
a common Key-prefix that extends into the family or qualifier.
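
The idea can be sketched as a shared-prefix encoding over sorted keys. This is a simplified model, assuming each Key is flattened to a single byte string; Accumulo's actual on-disk format is more involved, and the function names here are hypothetical.

```python
from typing import List, Tuple

def common_prefix_len(a: bytes, b: bytes) -> int:
    """Count the leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def prefix_encode(keys: List[bytes]) -> List[Tuple[int, bytes]]:
    """Encode each key as (bytes shared with the previous key, remaining suffix)."""
    encoded = []
    previous = b""
    for key in keys:
        shared = common_prefix_len(previous, key)
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded

def prefix_decode(encoded: List[Tuple[int, bytes]]) -> List[bytes]:
    """Rebuild the original keys from the (shared, suffix) pairs."""
    keys = []
    previous = b""
    for shared, suffix in encoded:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys

keys = [b"row1 cf:cq", b"row2 cf:cq"]
encoded = prefix_encode(keys)
print(encoded)
```

For the two Keys above, only "row" is shared by the second entry. Its family and qualifier are written out again because the common prefix ends at the differing row character.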

A compression algorithm, GZ by default, is then applied to the result of 
the encoding. Snappy is another common compression algorithm used by 
Accumulo instances.
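
To get a feel for how much a general-purpose compressor absorbs a repeated long label, here is a small experiment using Python's zlib (DEFLATE, the same algorithm family as GZ). The textual key layout and the `build_block` helper are assumptions for the sketch, not Accumulo's file format.

```python
import zlib

def build_block(visibility: str, entries: int = 1000) -> bytes:
    """Build a toy block of sorted keys that all carry the same visibility."""
    lines = [f"row{i:06d} cf:cq [{visibility}]" for i in range(entries)]
    return "\n".join(lines).encode("utf-8")

long_block = build_block("VeryLargeLargeLarge&AlsoLargeLargeLarge")
short_block = build_block("A&B")

# Compress each toy block with DEFLATE at the default level.
long_packed = zlib.compress(long_block)
short_packed = zlib.compress(short_block)

raw_gap = len(long_block) - len(short_block)
packed_gap = len(long_packed) - len(short_packed)
print("raw gap:", raw_gap, "bytes; compressed gap:", packed_gap, "bytes")
```

In this toy input every entry carries the same label, which is the friendliest case for the compressor. Data with many distinct labels may compress less well, so measuring on a sample of real data is the safer way to estimate overhead.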

- Josh

Roman wrote:
> Hi there,
> My question is how Accumulo compression works in regards to visibility
> labels.
> Is there any difference between "VeryLargeLargeLarge &
> AlsoLargeLargeLarge" and "A&B" expressions? Will it be internally
> compiled to a low data consuming structure?
> Same question applies to column and qualifier names. Is there any
> difference?
> The reason for this question is simple – we are trying to find out what
> would be the data utilization overhead for different approaches.
> Regards
> Roman
