From: "David G. Boney" <dboney1@semanticartifacts.com>
Subject: Simple Compression Scheme
Date: Thu, 17 Feb 2011 11:27:10 -0600
To: user@cassandra.apache.org

Below is a link to a simple client-side compression scheme. I thought
this might be of interest to some members of the list.

Column values and column names are easy to handle on the client side
(column names need a custom column name comparator). Row keys are
harder: because there is only one row partitioner for all column
families, compressing row keys gets complicated if the keys of the
different column families have different data types. Using properties
of Unicode, the scheme below can differentiate between uncompressed
Unicode strings, compressed Unicode strings, uncompressed UUIDs, and a
pass-through code for no compression for a one-byte penalty. For my
project I only use Unicode strings and UUIDs for my row keys, so this
works well for me. The actual compression algorithm can work with both
short strings, using a static probability table for arithmetic coding,
and long strings, using adaptive arithmetic coding. Your mileage may
vary. I will have code for this design in a month or two.
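
To make the tagging idea concrete, here is a rough Python sketch. It is
not the design described at the link; it only illustrates how a leading
byte can tell the key types apart. Everything in it is an assumption:
the names encode_key and decode_key, the three tag values, and the use
of zlib as a stand-in for the arithmetic coder. The one Unicode fact it
relies on is that the bytes 0xF8 through 0xFF never occur in well-formed
UTF-8, so a key that starts with one of them cannot be a plain string.

```python
import uuid
import zlib

# Hypothetical tag bytes, chosen for this sketch only. The bytes
# 0xF8-0xFF never occur in well-formed UTF-8, so a key starting with
# one of them cannot be mistaken for an uncompressed Unicode string.
TAG_COMPRESSED_STR = 0xFF
TAG_UUID = 0xFE
TAG_RAW = 0xFD


def encode_key(key):
    """Encode a str, uuid.UUID, or bytes row key as tagged bytes."""
    if isinstance(key, uuid.UUID):
        return bytes([TAG_UUID]) + key.bytes
    if isinstance(key, str):
        plain = key.encode("utf-8")
        # zlib is a stand-in here; it is NOT arithmetic coding.
        packed = bytes([TAG_COMPRESSED_STR]) + zlib.compress(plain)
        # Keep the untagged UTF-8 form when compression does not help.
        return packed if len(packed) < len(plain) else plain
    # Pass-through for anything else, at a cost of one tag byte.
    return bytes([TAG_RAW]) + bytes(key)


def decode_key(data):
    """Invert encode_key by inspecting the first byte."""
    if not data:
        return ""
    tag = data[0]
    if tag == TAG_COMPRESSED_STR:
        return zlib.decompress(data[1:]).decode("utf-8")
    if tag == TAG_UUID:
        return uuid.UUID(bytes=data[1:])
    if tag == TAG_RAW:
        return data[1:]
    # No tag byte: the key is an uncompressed UTF-8 string.
    return data.decode("utf-8")
```

In this sketch a short string such as "user:42" is stored as plain
UTF-8 with no tag byte, a long repetitive string is stored compressed
behind the 0xFF tag, and a UUID takes 17 bytes (one tag byte plus the
16-byte value).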

http://www.semanticartifacts.com/compression/compression.html

-------------
Sincerely,
David G. Boney
dboney1@semanticartifacts.com
http://www.semanticartifacts.com