You will find that many of us using Cassandra are already doing what you suggest (a custom serializer/deserializer).

We call it JSON.


*Sent from Star Trek-like flat panel device, which although larger than my Star Trek-like communicator device, may have typos and exhibit improper grammar due to haste and less than perfect use of the virtual keyboard*

On Sep 4, 2011, at 12:11 AM, Kevin Burton <> wrote:

On Sat, Sep 3, 2011 at 8:53 PM, Jonathan Ellis <> wrote:
I strongly suspect that you're optimizing prematurely.  What evidence
do you have that timestamps are producing unacceptable overhead for
your workload?  

It's possible … this is back-of-the-envelope math at the moment, because right now it's a nonstarter.
You do realize that the sparse data model means that
we spend a lot more than 8 bytes storing column names in-line with
each column too, right?

Yeah… this can be mitigated if the column names are your data.

If disk space is really the limiting factor for your workload, I would
recommend testing the compression code in trunk.  That will get you a
lot farther than adding extra options for a very niche scenario.

Another thing I've been considering is building a serializer/deserializer in front of Cassandra and running my own protocol to talk to it, one that builds its own encoding per row to avoid using excessive columns.
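One way to sketch that idea (the schema, format string, and helper names here are hypothetical assumptions, not part of any Cassandra protocol): keep the field names in client-side code and pack each row into a fixed binary layout, so no column-name bytes are stored on disk with each value.

```python
import struct

# Hypothetical fixed per-row schema known only to the client-side
# codec: field names live in code, not on disk, so each stored row
# carries no per-column name overhead.
SCHEMA = ["user_id", "score", "flags"]
FORMAT = ">QIB"  # big-endian: u64 user_id, u32 score, u8 flags

def encode(row: dict) -> bytes:
    # Pack the row's values in schema order into a fixed binary layout.
    return struct.pack(FORMAT, *(row[f] for f in SCHEMA))

def decode(blob: bytes) -> dict:
    # Unpack the fixed layout and re-attach the field names from code.
    return dict(zip(SCHEMA, struct.unpack(FORMAT, blob)))

row = {"user_id": 12345, "score": 7, "flags": 1}
blob = encode(row)
assert len(blob) == 13  # 8 + 4 + 1 bytes, no name or metadata bytes
assert decode(blob) == row
```

The obvious cost is rigidity: any schema change means versioning the encoding yourself, which is exactly the bookkeeping the sparse data model normally does for you.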




Location: San Francisco, CA
Skype: burtonator

Skype-in: (415) 871-0687