cassandra-user mailing list archives

From graham sanderson
Subject Re: Client-side compression, cassandra or both?
Date Mon, 03 Nov 2014 18:37:58 GMT
I wouldn’t do both.
Unless a little server CPU or disk space is an issue, I wouldn’t bother to compress yourself.
(You’d have to measure the disk space difference; I imagine it is probably not significant
since, as you say, C* has more context, and hopefully most compressors handle a repeated “0, ”
well.) Compression across the wire is good of course (client-side CPU is a wash, and server
CPU we already mentioned).
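
A quick local experiment is one way to do that measurement before deciding. The sketch below
is only an approximation under stated assumptions: the record shape, `make_record` and `sizes`
are hypothetical names, and zlib (the DEFLATE algorithm behind gzip) stands in for the
server-side compressor because the Python standard library has no LZ4. It compares
compressing each JSON value on its own with compressing many values as one block, which is
roughly the "more context" advantage discussed in this thread.

```python
import json
import random
import zlib


def make_record(rng, length=1000, zero_ratio=0.9):
    """Build a hypothetical record: an integer array that is mostly zeroes."""
    values = [0 if rng.random() < zero_ratio else rng.randint(1, 1000)
              for _ in range(length)]
    return {"values": values}


def sizes(records):
    """Return (raw bytes, bytes when each value is compressed alone,
    bytes when all values are compressed together as one block)."""
    raw = [json.dumps(r).encode("utf-8") for r in records]
    raw_total = sum(len(b) for b in raw)
    per_value = sum(len(zlib.compress(b)) for b in raw)
    one_block = len(zlib.compress(b"".join(raw)))
    return raw_total, per_value, one_block


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    records = [make_record(rng) for _ in range(16)]
    raw_total, per_value, one_block = sizes(records)
    print("raw JSON bytes:        ", raw_total)
    print("compressed per value:  ", per_value)
    print("compressed as one block:", one_block)
```

The absolute numbers will not match what C* does on disk (different algorithm, different chunk
size), so treat them only as a hint about whether the difference is worth caring about.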

On a side note, perhaps your object model should address the redundancy, though of course
that may be about as complex as doing client-side compression, IDK.

We do have one table where we keep compressed blobs, but that is because those are natural
from an application perspective, and so we just turn off C* table compression for those (there
isn’t much other data there).
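
For reference, turning table compression off is a table-level option in CQL. The helper below
is a sketch, not something taken from our setup: `disable_compression_cql` and the table name
are hypothetical, and the option names are my understanding of the 2.x syntax
(`sstable_compression` set to an empty string) and the 3.0+ syntax (`enabled` set to false),
so check them against the documentation for the release you run.

```python
def disable_compression_cql(table, cassandra_major_version=2):
    """Build an ALTER TABLE statement that disables SSTable compression.

    Assumption: Cassandra 2.x uses 'sstable_compression': '' and
    3.0+ uses 'enabled': 'false'. Verify for your version.
    """
    if cassandra_major_version >= 3:
        options = "{'enabled': 'false'}"
    else:
        options = "{'sstable_compression': ''}"
    return f"ALTER TABLE {table} WITH compression = {options};"


if __name__ == "__main__":
    # "app.compressed_blobs" is a made-up keyspace.table name.
    print(disable_compression_cql("app.compressed_blobs"))
```

The resulting statement can be run from cqlsh or passed to whatever driver you use.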

Note, I haven’t been tracking it recently, but certainly in the past the compression code
path on the C* side had to do more data copies. This is not likely to be significant unless
your case is special, and I believe it has been/will be improved in 2.1 or 3.

> On Nov 3, 2014, at 9:40 AM, DuyHai Doan wrote:
> Hello Robin
> You have several options for compression in C*:
> 1) Serialize to bytes instead of JSON, which saves a lot of space compared with String
> encoding. Of course the data will be opaque and not human readable.
> 2) Activate client-node data compression. In this case, do not forget to ship the LZ4 or
> Snappy dependency on the client side.
> On the server side, data compression is active by default, using LZ4, when you create
> a new table, so there is pretty much nothing to do.
> It's up to you to decide whether, given the compression ratio difference between Gzip and
> LZ4, it is worth relying on C* compression.
> Regards
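
Option 1 above (bytes instead of JSON) could look something like the sketch below. It is only
an illustration under assumptions the thread does not state: the integers are taken to be
non-negative, and the encoding is a LEB128-style variable-length integer, chosen here because
a zero then costs one byte instead of the three bytes of "0, " in default JSON output. The
function names `encode_varints` and `decode_varints` are hypothetical.

```python
import json


def encode_varints(values):
    """Encode non-negative ints as LEB128-style varints (7 bits per byte)."""
    out = bytearray()
    for v in values:
        if v < 0:
            raise ValueError("this sketch only handles non-negative integers")
        while True:
            byte = v & 0x7F
            v >>= 7
            if v:
                out.append(byte | 0x80)  # high bit set: more bytes follow
            else:
                out.append(byte)         # high bit clear: last byte of value
                break
    return bytes(out)


def decode_varints(data):
    """Inverse of encode_varints."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


if __name__ == "__main__":
    values = [0] * 900 + [300] * 100  # made-up, mostly-zero array
    packed = encode_varints(values)
    print("JSON bytes:  ", len(json.dumps(values)))
    print("varint bytes:", len(packed))
    assert decode_varints(packed) == values
```

The packed bytes would be stored in a blob column, and C* table compression (or client-side
compression) can still be applied on top of them.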
> On Mon, Nov 3, 2014 at 3:51 PM, Robin Verlangen wrote:
> Hi there,
> We're working on a project which is going to store a lot of JSON objects in Cassandra.
> A large part of this (90%) consists of an array of integers, which in a lot of cases
> contains a bunch of zeroes.
> The average JSON is 4KB in size, and once GZIPed (default compression) just under 100 bytes.
> My question is: should we compress client-side (literally converting the JSON string to
> gzip-compressed bytes), let Cassandra do the work, or do both?
> From my point of view Cassandra would be better, as it could compress beyond
> a single value, using large blocks within a row / SSTable.
> Thank you in advance for your help.
> Best regards,
> Robin Verlangen
> Chief Data Architect
