Let's say you're logging events, and you have billions of them. What if the events come in bursts: millions in a day, but all arriving within microseconds of each other, a few times a day? How do you find the events that happened on a particular day if you can't store them all in one row?

On Wed, Jun 2, 2010 at 6:45 PM, Jonathan Shook <jshook@gmail.com> wrote:
Either use OPP (the order-preserving partitioner) and order by key, or
order within a row by column name. I'd suggest the latter.
If you have structured data to stick under a column (named by the
timestamp), then you can serialize and unserialize it yourself, or you
can use a supercolumn. It's effectively the same thing. As currently
implemented, Cassandra provides supercolumn support only as a
convenience layer. That may change in the future.
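
A minimal sketch of the serialize-it-yourself route, in Python. It
assumes the column family compares column names as raw bytes, and the
helper names (column_name, column_value, parse_column) are made up for
illustration; nothing here is a Cassandra API. The id suffix is one way
to keep events that share a microsecond from overwriting each other.

```python
import json
import struct
import uuid


def column_name(timestamp_us: int, event_id: uuid.UUID) -> bytes:
    """Build a column name that sorts by time under byte-wise comparison.

    The 8-byte big-endian timestamp sorts numerically; the 16-byte id
    keeps events that share a microsecond from colliding.
    """
    return struct.pack(">Q", timestamp_us) + event_id.bytes


def column_value(event: dict) -> bytes:
    """Serialize the structured payload (JSON is an arbitrary choice here)."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def parse_column(name: bytes, value: bytes):
    """Recover (timestamp_us, event_id, event) from a stored column."""
    (timestamp_us,) = struct.unpack(">Q", name[:8])
    event_id = uuid.UUID(bytes=name[8:])
    return timestamp_us, event_id, json.loads(value.decode("utf-8"))
```

A time-range slice then becomes a column slice from
struct.pack(">Q", start_us) to struct.pack(">Q", end_us + 1), since
every name in the range starts with a timestamp between the two.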

You didn't make clear in your question why a standard column would be
less suitable. I presumed you had layered structure within the
timestamp, hence my response.
How would you logically partition your dataset according to natural
application boundaries? This will answer most of your question.
If you have a dataset which can't be partitioned into reasonably
sized rows, then you may want to use OPP and key concatenation.
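
As a rough illustration of partitioning by a natural boundary plus key
concatenation, here is a Python sketch that builds row keys from a
stream name, an hour bucket, and a shard number. The hourly bucket, the
shard count, and the function names are all assumptions chosen for the
example, not anything Cassandra prescribes. Datetimes are expected to
be timezone-aware.

```python
from datetime import datetime, timedelta, timezone


def bucket_row_key(stream: str, when: datetime, shard: int = 0) -> str:
    """Concatenate stream name, UTC hour bucket, and shard into a row key.

    Zero-padded fields keep lexical key order equal to time order,
    which matters only if the keys are range-scanned under OPP.
    """
    hour = when.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"{stream}:{hour}:{shard:02d}"


def row_keys_for_range(stream: str, start: datetime, end: datetime, shards: int = 1):
    """List every row key whose hour bucket overlaps [start, end]."""
    cursor = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    keys = []
    while cursor <= end:
        for shard in range(shards):
            keys.append(bucket_row_key(stream, cursor, shard))
        cursor += timedelta(hours=1)
    return keys
```

Because the keys for any time range can be computed up front, a day's
worth of events can be fetched by key (24 buckets times the shard
count) without a key range scan. Spreading a burst over several shards
keeps any single row from growing too large.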

What do you mean by giant?

On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn <david@lookin2.com> wrote:
> How do I handle giant sets of ordered data, e.g. by timestamps, which I want
> to access by range?
>
> I can't put all the data into a supercolumn, because it's loaded into memory
> at once, and it's too much data.
>
> Am I forced to use an order-preserving partitioner? I don't want the
> headache. Is there any other way?
>