On Sat, Sep 3, 2011 at 8:53 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
I strongly suspect that you're optimizing prematurely.  What evidence
do you have that timestamps are producing unacceptable overhead for
your workload?  

It's possible … this is only a back-of-the-envelope estimate at the moment, since right now it's a nonstarter.

You do realize that the sparse data model means that
we spend a lot more than 8 bytes storing column names in-line with
each column too, right?

Yeah… this can be mitigated if the column names are your data.

If disk space is really the limiting factor for your workload, I would
recommend testing the compression code in trunk.  That will get you a
lot farther than adding extra options for a very niche scenario.

Another thing I've been considering is building a serializer/deserializer in front of Cassandra and running my own protocol to talk to it, one that builds its own encoding per row to avoid using excessive columns.
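To make the per-row encoding idea concrete, here is a rough sketch in Python. It packs a fixed set of fields into a single byte string that could be stored as one column value, so the column name and timestamp are paid for once per row rather than once per field. The schema, the field names, and the `encode_row`/`decode_row` helpers are all made up for illustration; this is not an existing Cassandra API and not a worked-out design.

```python
import struct

# Hypothetical fixed schema for one row: (field name, struct format code).
# "q" = 8-byte signed int, "d" = 8-byte double, "2s" = 2 raw bytes.
SCHEMA = [("published", "q"), ("score", "d"), ("lang", "2s")]

# ">" selects big-endian layout with no padding between fields.
ROW_FORMAT = ">" + "".join(code for _, code in SCHEMA)


def encode_row(record):
    """Pack the record's fields, in schema order, into one byte string."""
    return struct.pack(ROW_FORMAT, *(record[name] for name, _ in SCHEMA))


def decode_row(blob):
    """Unpack a byte string produced by encode_row back into a dict."""
    values = struct.unpack(ROW_FORMAT, blob)
    return {name: value for (name, _), value in zip(SCHEMA, values)}


# Illustrative values only.
record = {"published": 1315108380, "score": 0.75, "lang": b"en"}
blob = encode_row(record)

assert decode_row(blob) == record
print(len(blob))  # 18 bytes: 8 + 8 + 2, with no per-field names
```

The obvious cost is that Cassandra would only see an opaque blob, so reading or updating a single field means fetching and rewriting the whole value on the client side.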



Founder/CEO Spinn3r.com
