cassandra-user mailing list archives

From: "David G. Boney"
Subject: Simple Compression Scheme
Date: Thu, 17 Feb 2011 17:27:10 GMT
Below is a link for a simple client-side compression scheme. I thought this might be of interest
to some members of the list.

Column values and column names are easy to handle on the client side, the latter with a
custom column name comparator. Row keys are harder: because there is only one row partitioner
for all column families, compressing row keys gets complicated if the keys of different
column families have different data types. Using properties of Unicode, the scheme below can
differentiate between uncompressed Unicode strings, compressed Unicode strings, uncompressed
UUIDs, and a pass-through code for no compression for a one-byte penalty. For my project I
only use Unicode strings and UUIDs for my row keys, so this works well for me. The actual
compression algorithm can work with both short strings, using a static probability table for
arithmetic coding, and long strings, using adaptive arithmetic coding. Your mileage may vary.
I will have code for this design in a month or two.
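
To make the idea of telling key types apart concrete, here is a rough sketch in Python. It is not the linked design: the tag values, the function names encode_key and decode_key, and the choice to tag UUIDs are all assumptions made for illustration, and zlib stands in for the arithmetic coder. The only Unicode property it relies on is that the bytes 0xF8 through 0xFF never occur in well-formed UTF-8, so a key starting with one of them cannot be a plain string.

```python
import uuid
import zlib

# Hypothetical tag bytes (assumption, not from the linked design).
# 0xF8-0xFF never occur in well-formed UTF-8, so a key that starts
# with one of them cannot be an uncompressed UTF-8 string.
TAG_COMPRESSED_STR = 0xFF
TAG_UUID = 0xFE
TAG_RAW = 0xFD


def encode_key(value):
    """Encode a str, uuid.UUID, or bytes value as a self-describing row key."""
    if isinstance(value, uuid.UUID):
        return bytes([TAG_UUID]) + value.bytes
    if isinstance(value, bytes):
        # Pass-through: no compression, one tag byte of overhead.
        return bytes([TAG_RAW]) + value
    utf8 = value.encode("utf-8")
    packed = zlib.compress(utf8)  # stand-in for arithmetic coding
    if len(packed) + 1 < len(utf8):
        return bytes([TAG_COMPRESSED_STR]) + packed
    return utf8  # plain UTF-8 needs no tag


def decode_key(key):
    """Invert encode_key by inspecting the first byte."""
    if not key:
        return ""
    tag, body = key[0], key[1:]
    if tag == TAG_COMPRESSED_STR:
        return zlib.decompress(body).decode("utf-8")
    if tag == TAG_UUID:
        return uuid.UUID(bytes=body)
    if tag == TAG_RAW:
        return body
    return key.decode("utf-8")
```

In this sketch a short string is stored unchanged, because compressing it would not save space, while a long repetitive string gets the compressed tag. Every key can be decoded without knowing which column family it came from.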

David G. Boney
