incubator-cassandra-user mailing list archives

From David Boxenhorn <da...@lookin2.com>
Subject Re: Giant sets of ordered data
Date Wed, 02 Jun 2010 15:57:28 GMT
Let's say you're logging events, and you have billions of events. What if
the events come in bursts, so within a day there are millions of events, but
they all come within microseconds of each other a few times a day? How do
you find the events that happened on a particular day if you can't store
them all in one row?
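
Here is a minimal sketch of one possible answer. It assumes the row key itself encodes the day plus a shard number, so each burst is spread over a fixed number of rows and a day can be read back without an order-preserving partitioner. To read a day, compute every bucket key for that day, read each row (already sorted by column name), and merge the results. A Python dict stands in for the column family. `SHARDS_PER_DAY`, the `events:YYYYMMDD:NN` key layout, and the helper functions are hypothetical illustrations, not part of Cassandra's API.

```python
import heapq
import zlib
from bisect import insort
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical tuning knob: how many rows each day's events are spread over.
SHARDS_PER_DAY = 16

# In-memory stand-in for a column family (NOT a Cassandra client API):
# row key -> list of (column_name, value) kept sorted by column name.
# Column names are (timestamp_us, event_id) so that two events in the
# same microsecond do not overwrite each other.
column_family = defaultdict(list)


def day_of(ts_us):
    """UTC calendar day (YYYYMMDD) for a microsecond timestamp."""
    return datetime.fromtimestamp(ts_us // 1_000_000, tz=timezone.utc).strftime("%Y%m%d")


def row_key(day, shard):
    """Hypothetical row key layout: one row per (day, shard) bucket."""
    return f"events:{day}:{shard:02d}"


def insert_event(ts_us, event_id, payload):
    # crc32 gives the same shard in every process, unlike Python's hash().
    shard = zlib.crc32(event_id.encode("utf-8")) % SHARDS_PER_DAY
    insort(column_family[row_key(day_of(ts_us), shard)], ((ts_us, event_id), payload))


def events_for_day(day):
    # Read every bucket for the day; .get() avoids creating empty rows.
    rows = [column_family.get(row_key(day, s), []) for s in range(SHARDS_PER_DAY)]
    # Each row is already sorted, so a k-way merge returns the whole day
    # in timestamp order without re-sorting.
    return list(heapq.merge(*rows))


base = int(datetime(2010, 6, 2, tzinfo=timezone.utc).timestamp()) * 1_000_000
for i in range(5):
    insert_event(base + i, f"evt-{i}", f"payload {i}")
insert_event(base + 86_400_000_000, "evt-next", "next day")

print([event_id for (_, event_id), _ in events_for_day("20100602")])
# ['evt-0', 'evt-1', 'evt-2', 'evt-3', 'evt-4']
```

Because the keys are computed rather than scanned, this approach also works under the random partitioner. The trade-off is that a one-day query costs `SHARDS_PER_DAY` row reads instead of one, in exchange for dividing a burst's events across that many rows.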

On Wed, Jun 2, 2010 at 6:45 PM, Jonathan Shook <jshook@gmail.com> wrote:

> Either OPP by key, or within a row by column name. I'd suggest the latter.
> If you have structured data to stick under a column (named by the
> timestamp), then you can serialize and unserialize it yourself, or you
> can use a supercolumn. It's effectively the same thing. As currently
> implemented, Cassandra provides super columns only as a convenience
> layer. That may change in the future.
>
> You didn't make clear in your question why a standard column would be
> less suitable. I presumed you had layered structure within the
> timestamp, hence my response.
> How would you logically partition your dataset according to natural
> application boundaries? This will answer most of your question.
> If you have a dataset which can't be partitioned into reasonably sized
> rows, then you may want to use OPP and key concatenation.
>
> What do you mean by giant?
>
> On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn <david@lookin2.com> wrote:
> > How do I handle giant sets of ordered data, e.g. by timestamps, which I
> > want to access by range?
> >
> > I can't put all the data into a supercolumn, because it's loaded into
> > memory at once, and it's too much data.
> >
> > Am I forced to use an order-preserving partitioner? I don't want the
> > headache. Is there any other way?
> >
>
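
The OPP-plus-key-concatenation fallback mentioned in the reply can be sketched as follows, assuming rows are kept in lexical key order as an order-preserving partitioner keeps them. The key format is what matters here. Zero-padding the timestamp to a fixed width makes string order match time order, so a range of keys is a range of time. A sorted Python list stands in for the partitioner's key order. `make_key`, `OrderedRows`, and the `stream:timestamp` layout are hypothetical names for illustration, and JSON is just one way to serialize structured data yourself.

```python
import json
from bisect import bisect_left, bisect_right, insort

# Hypothetical key layout: "<stream>:<timestamp_us zero-padded to 20 digits>".
# Fixed width makes lexical key order match numeric time order
# (unpadded, "clicks:10" would sort before "clicks:9").
# Assumes non-negative timestamps; a minus sign would break the ordering.
TS_WIDTH = 20


def make_key(stream, ts_us):
    return f"{stream}:{ts_us:0{TS_WIDTH}d}"


class OrderedRows:
    """In-memory stand-in for rows kept in key order, as an
    order-preserving partitioner keeps them (not a Cassandra API)."""

    def __init__(self):
        self._keys = []      # sorted row keys
        self._values = {}    # row key -> serialized record

    def put(self, key, record):
        if key not in self._values:
            insort(self._keys, key)
        # Structured data is serialized by the application itself.
        self._values[key] = json.dumps(record, sort_keys=True)

    def key_range(self, start_key, end_key):
        """All rows with start_key <= key <= end_key, in key order."""
        lo = bisect_left(self._keys, start_key)
        hi = bisect_right(self._keys, end_key)
        return [(k, json.loads(self._values[k])) for k in self._keys[lo:hi]]


rows = OrderedRows()
for ts in (250, 9, 11, 10):
    rows.put(make_key("clicks", ts), {"ts": ts})
rows.put(make_key("views", 10), {"ts": 10})

hits = rows.key_range(make_key("clicks", 9), make_key("clicks", 11))
print([record["ts"] for _, record in hits])  # [9, 10, 11]
```

The stream prefix keeps unrelated series in separate key ranges ("views" rows never fall inside a "clicks" range). The cost is the load-balancing headache the original question mentions: with an order-preserving partitioner, keys that share a prefix and a time window land on the same nodes.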
