incubator-cassandra-user mailing list archives

From Jeremiah Jordan <JEREMIAH.JOR...@morningstar.com>
Subject RE: Compression on client side vs server side
Date Mon, 02 Apr 2012 15:53:50 GMT
The server side compression can compress across columns/rows so it will most likely be more
efficient.
Whether you are CPU bound or IO bound depends on your application and node setup.  Unless
your working set fits in memory you will be IO bound, and in that case server side compression
helps because there is less to read from disk.  In many cases it is actually faster to read
a compressed file from disk and decompress it than to read an uncompressed file from disk.
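
To make the cross-row point concrete, here is a rough sketch. It is not Cassandra code. It assumes zlib from the Python standard library as a stand-in for Snappy, and the row layout and field names are made up for illustration. It compares compressing each row on its own (roughly what a client would do per insert) against compressing many similar rows together as one block (closer to what the server can do on disk).

```python
import json
import zlib

# Hypothetical rows that all share the same column names.
rows = [
    {"user_id": i, "first_name": f"user{i}", "country": "US", "plan": "free"}
    for i in range(1000)
]
encoded = [json.dumps(row).encode("utf-8") for row in rows]

# Total size with no compression at all.
raw_bytes = sum(len(e) for e in encoded)

# Client-side style: every row is compressed in isolation, so the
# compressor never sees the column names repeated in other rows.
per_row_bytes = sum(len(zlib.compress(e)) for e in encoded)

# Server-side style: many rows are compressed together as one block,
# so repeated column names and values are encoded only once.
block_bytes = len(zlib.compress(b"".join(encoded)))

print("uncompressed:        ", raw_bytes)
print("compressed per row:  ", per_row_bytes)
print("compressed as block: ", block_bytes)
```

The exact numbers depend on the data and the compressor, but with rows this similar the block figure should come out far smaller than the per-row figure.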

See Ed's post:
"Cassandra compression is like more servers for free!"
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/cassandra_compression_is_like_getting

________________________________
From: benjamin.j.mccann@gmail.com [benjamin.j.mccann@gmail.com] on behalf of Ben McCann [ben@benmccann.com]
Sent: Monday, April 02, 2012 10:42 AM
To: user@cassandra.apache.org
Subject: Compression on client side vs server side

Hi,

I was curious whether compressing my data on the client side with Snappy is any different
from doing it on the server side.  The wiki said that compression works
best where each row has the same columns.  Does this mean the compression will be more efficient
on the server side since it can look at multiple rows at once instead of only the row being
inserted?  The reason I was thinking about possibly doing it client side was that it would
save CPU on the datastore machine.  However, does this matter?  Is CPU typically the bottleneck
on a machine or is it some other resource?  (Of course this will vary for each person, but
I'm wondering if there's a rule of thumb.  I'm making a web app, which hopefully will store about
5TB of data and have tens of millions of page views per month.)

Thanks,
Ben

