To Cao Jiguang
 
I was watching this presentation on Bigtable yesterday:
http://video.google.com/videoplay?docid=7278544055668715642#
 
and Jeff mentioned that they compared three different compression libraries:
BMDiff, LZO and gzip. Apparently, gzip was the most CPU-intensive, and they ended up going with BMDiff.
I didn't find any open-source / free implementation of BMDiff, but I did find one for LZO:
http://www.oberhumer.com/opensource/lzo/
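
For anyone who wants to get a feel for the CPU-versus-size trade-off mentioned above, here is a rough sketch in Python. It is only an illustration under stated assumptions: it uses the standard-library zlib module (the DEFLATE algorithm behind gzip) at three levels as a stand-in, since neither BMDiff nor LZO ships with Python, and the sample payload and the measure() helper are made up for this example. It says nothing about the numbers reported in the talk.

```python
import time
import zlib


def measure(data: bytes, level: int):
    """Return (compressed_size, cpu_seconds) for one zlib compression level."""
    start = time.process_time()
    packed = zlib.compress(data, level)
    elapsed = time.process_time() - start
    # Round-trip check, done outside the timed region.
    assert zlib.decompress(packed) == data
    return len(packed), elapsed


# Made-up, highly repetitive payload; real column data will behave differently.
sample = b"row-key:user42,column:name,value:Venu;" * 20000

# Level 1 is the fastest setting, 9 the slowest, 6 the library default.
results = {level: measure(sample, level) for level in (1, 6, 9)}

for level, (size, cpu) in results.items():
    ratio = len(sample) / size
    print(f"level={level} size={size} ratio={ratio:.1f} cpu={cpu:.4f}s")
```

Swapping the body of measure() for another library's compress and decompress calls would allow the same comparison, provided the payload resembles the real data.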
 
 
Thanks
-Venu

 
On Thu, Apr 1, 2010 at 3:07 AM, Weijun Li <weijunli@gmail.com> wrote:

The Thrift client doesn’t seem to compress anything unless you change the Thrift protocol or use a transport that supports compression. I modified TSocket to support compression, but it occasionally hits a broken pipe error due to crappy Java zlib support, so clients have to reconnect to get around the socket error. Because this works at the transport layer, compression is all-or-nothing: either every message on the connection is compressed or none is.
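
As a minimal sketch of what compression at the transport layer amounts to (in Python for brevity, not the modified TSocket described above), each side keeps one long-lived zlib stream per connection and sync-flushes after every message so the peer can decode it immediately. CompressedChannel and the sample messages are hypothetical names for illustration only and are not part of Thrift or Cassandra.

```python
import zlib


class CompressedChannel:
    """Hypothetical per-connection wrapper: one deflate and one inflate stream."""

    def __init__(self):
        self._deflater = zlib.compressobj()
        self._inflater = zlib.decompressobj()

    def encode(self, message: bytes) -> bytes:
        # Z_SYNC_FLUSH emits everything buffered so far, so the receiver
        # can decode this message without waiting for later ones.
        return self._deflater.compress(message) + self._deflater.flush(zlib.Z_SYNC_FLUSH)

    def decode(self, chunk: bytes) -> bytes:
        # Chunks must be fed in the order they were produced.
        return self._inflater.decompress(chunk)


# Simulate the two ends of one connection without opening a real socket.
sender = CompressedChannel()
receiver = CompressedChannel()

messages = [
    b"get(key1, column_a)",
    b"insert(key2, column_b, " + b"x" * 500 + b")",
]

wire = [sender.encode(m) for m in messages]      # bytes that would go on the socket
received = [receiver.decode(c) for c in wire]    # what the server would see
```

Because the compression state lives with the connection rather than with individual values, every message passes through it, which is the all-or-nothing behaviour described above. It also means that a dropped connection loses the stream state, so both ends have to start with fresh objects after a reconnect.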

 

The Cassandra server doesn’t seem to support compression either. We are handling compression for the in-memory cache by plugging memcached into Cassandra. Still testing…

 

-Weijun

 

From: Ran Tavory [mailto:rantav@gmail.com]
Sent: Wednesday, March 31, 2010 11:37 PM

Subject: compression

 

What sort of compression (if any) is performed by Cassandra?

Does the Thrift client compress anything before sending it to the server, to save bandwidth?

Does the server compress the values in the columns to save disk space or memory?

 

... I assume compaction, performed on the server side, is different from compression. However, does compaction include any compression features as well?

 

Thanks