accumulo-user mailing list archives

From Keith Turner <>
Subject Re: compressing values returned to scanner
Date Tue, 02 Oct 2012 18:24:32 GMT
On Mon, Oct 1, 2012 at 3:03 PM, ameet kini <> wrote:
> My understanding of compression in Accumulo 1.4.1 is that it is on by
> default and that data is decompressed by the tablet server, so data on the
> wire between server/client is decompressed. Is there a way to shift the
> decompression from happening on the server to the client? I have a use case
> where each Value in my table is relatively large (~ 8MB) and I can benefit
> from compression over the wire. I don't have any server side iterators, so
> the values don't need to be decompressed by the tablet server. Also, each
> scan returns a few rows, so client-side decompression can be fast.
> The only way I can think of now is to disable compression on that table, and
> handle compression/decompression in the application. But if there is a way
> to do this in Accumulo, I'd prefer that.
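The application-side approach described above (disable table compression, compress each large Value in the client before writing and after reading) can be sketched with the JDK's built-in gzip streams. This is an illustrative sketch, not Accumulo API; the class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ValueCompression {
    // Compress a value's bytes before handing them to a BatchWriter.
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(raw);
        }
        return buf.toByteArray();
    }

    // Decompress a value's bytes after the scanner returns them.
    static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A large, repetitive value compresses well, mimicking the ~8MB use case.
        byte[] original = "a large, repetitive value "
            .repeat(1000).getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        byte[] unpacked = decompress(packed);
        System.out.println(packed.length < original.length); // compressed is smaller
        System.out.println(Arrays.equals(original, unpacked)); // round trip is lossless
    }
}
```

With this scheme the wire carries the gzipped bytes, and only the client pays the decompression cost.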

There are two levels of compression in Accumulo.  First, redundant
parts of the key are not stored.  If the row in a key is the same as
the previous key's row, then it is not stored again.  The same is done
for columns and timestamps.  After this relative encoding is done, a
block of key/value pairs is compressed with gzip.
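The relative-encoding idea above can be shown with a toy sketch. The real RFile format uses per-field flags and handles columns and timestamps as well; here a repeated row simply collapses to a marker. All names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class RelativeKeyEncoding {
    // Toy relative encoding: when a key's row matches the previous key's
    // row, emit a SAME marker instead of the row bytes.
    static final String SAME = "\u0000SAME";

    static List<String> encodeRows(List<String> rows) {
        List<String> out = new ArrayList<>();
        String prev = null;
        for (String row : rows) {
            out.add(row.equals(prev) ? SAME : row);
            prev = row;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("row1", "row1", "row1", "row2");
        List<String> encoded = encodeRows(rows);
        // Only two distinct row values are stored; the repeats collapse.
        long stored = encoded.stream().filter(r -> !r.equals(SAME)).count();
        System.out.println(stored);
    }
}
```

Sorted keys make this effective: identical rows and columns are adjacent, so most fields reduce to markers before gzip even runs.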

As data is read from an RFile, when the row of a key is the same as
the previous key's row, it just points to the previous key's row.  This
is carried forward over the wire: as keys are transferred, duplicate
fields are not retransmitted.

As for decompressing on the client side vs. the server side, the
server at least needs to decompress keys.  On the server side you
usually need to read from multiple sorted files and merge-order the
results, so keys must be decompressed on the server to compare them.
Server-side iterators also need the keys and values decompressed.
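Why the server needs materialized keys can be seen in a toy merge of sorted files: the comparator has to look at actual key bytes. This is a simplified sketch over sorted string lists, not Accumulo's real merge code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeSortedFiles {
    // A tablet may have several sorted RFiles; the tablet server merges
    // them into one sorted stream. Heap entries hold [current key, source
    // iterator]; the comparator reads the key, so it must be decompressed.
    @SuppressWarnings("unchecked")
    static List<String> merge(List<List<String>> sortedFiles) {
        PriorityQueue<Object[]> heap = new PriorityQueue<>(
            (a, b) -> ((String) a[0]).compareTo((String) b[0]));
        for (List<String> file : sortedFiles) {
            Iterator<String> it = file.iterator();
            if (it.hasNext()) heap.add(new Object[] { it.next(), it });
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Object[] entry = heap.poll();
            out.add((String) entry[0]);
            Iterator<String> it = (Iterator<String>) entry[1];
            if (it.hasNext()) heap.add(new Object[] { it.next(), it });
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> files =
            List.of(List.of("a", "c"), List.of("b", "d"));
        System.out.println(String.join(" ", merge(files)));
    }
}
```

Values, by contrast, are never compared during the merge, which is why the question's approach of keeping values opaque and compressed end-to-end is workable when no server-side iterators inspect them.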

> Thanks,
> Ameet
