accumulo-user mailing list archives

From Marc Parisi <>
Subject Re: compressing values returned to scanner
Date Mon, 01 Oct 2012 20:44:06 GMT
I'm sorry, I wasn't clear. Blame my sickness. When I typed block compression
I was referring to the blocks within the BCFile ( block compressed ), not
gz. But the point still remains. You couldn't return the stream through
thrift ( you could return the whole block ), so you would need to
decompress the keys and values. You could delay decompression of the value,
but you need to decompress to find the size of the value after the relative
key, whereas double compression would get you what you want.

Hope that's clear.

On Mon, Oct 1, 2012 at 4:26 PM, Marc Parisi <> wrote:

> Ameet, keys and values ( relative keys ) are extracted from a decompressor
> stream. In the case of block compression (i.e. gz ), you would need to
> return a block so the receiver can decompress it. Therefore, using existing
> compression, as Slacum mentioned, then decompressing the value is likely
> the best method.
> On Mon, Oct 1, 2012 at 4:00 PM, William Slacum <> wrote:
>> Someone can correct me if I'm wrong, but I believe the file compression
>> option you quoted is for the RFiles in HDFS. You can enable compression
>> there and will still see some benefit even if you compress the values on
>> ingest.
>> On Mon, Oct 1, 2012 at 12:40 PM, ameet kini <> wrote:
>>> That is exactly my use case (ingest once, serve often, no server-side
>>> iterators).
>>> And I'm doing pre-compression on ingest. I was just looking to do away
>>> with app-level compression code. Not a biggie.
>>> Ameet
>>> On Mon, Oct 1, 2012 at 3:32 PM, William Slacum <> wrote:
>>>> If you aren't often looking at the data in the value on the tablet
>>>> server (like in an iterator), you can also pre-compress your values on
>>>> ingest.
>>>> On Mon, Oct 1, 2012 at 12:19 PM, Marc Parisi <> wrote:
>>>>> You could compress the data in the value, and decompress the data upon
>>>>> receipt by the scanner.
>>>>> On Mon, Oct 1, 2012 at 3:03 PM, ameet kini <> wrote:
>>>>>> My understanding of compression in Accumulo 1.4.1 is that it is on by
>>>>>> default and that data is decompressed by the tablet server, so data on the
>>>>>> wire between server/client is decompressed. Is there a way to shift
>>>>>> decompression from happening on the server to the client? I have a use case
>>>>>> where each Value in my table is relatively large (~ 8MB) and I can benefit
>>>>>> from compression over the wire. I don't have any server side iterators, so
>>>>>> the values don't need to be decompressed by the tablet server. Also, each
>>>>>> scan returns a few rows, so client-side decompression can be fast.
>>>>>> The only way I can think of now is to disable compression on that
>>>>>> table, and handle compression/decompression in the application. But if
>>>>>> there is a way to do this in Accumulo, I'd prefer that.
>>>>>> Thanks,
>>>>>> Ameet
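
For reference, the approach suggested in the thread (compress the value on
ingest, decompress it on receipt by the scanner) can be sketched with plain
java.util.zip. This is only a minimal sketch under stated assumptions, not
Accumulo API: ValueCodec and its method names are hypothetical, and gzip is an
assumed codec choice rather than anything the thread prescribes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical application-level helper (not part of Accumulo).
// Assumption: values are gzip-compressed before ingest and
// decompressed by the client after the scanner returns them.
final class ValueCodec {
    private ValueCodec() {}

    // Compress raw value bytes; call this before writing the value.
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return buf.toByteArray();
    }

    // Decompress bytes produced by compress(); call this on the client
    // after reading the value from the scanner.
    static byte[] decompress(byte[] packed) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // Highly repetitive 1 MiB payload, standing in for a large value.
        byte[] raw = new byte[1 << 20];
        Arrays.fill(raw, (byte) 'a');

        byte[] packed = compress(raw);
        byte[] restored = decompress(packed);

        System.out.println("raw bytes:      " + raw.length);
        System.out.println("packed bytes:   " + packed.length);
        System.out.println("round trip ok:  " + Arrays.equals(raw, restored));
    }
}
```

In a client, the compressed bytes would presumably be wrapped in a Value on
ingest and unwrapped again from each scanned entry before calling decompress.
As William notes above, RFile compression can stay enabled on the table either
way.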
