incubator-cassandra-user mailing list archives

From Tyler Hobbs <ty...@riptano.com>
Subject Re: OutOfMemory on count on cassandra 0.6.8 for large number of columns
Date Sun, 12 Dec 2010 18:49:44 GMT
Well, in this case I would say you probably need about 300MB of space in the
heap, since that's what you've calculated.

The APIs are designed to let you do what you think is best, and they
definitely won't stop you from shooting yourself in the foot.  Counting a
huge row, or trying to grab every row in a large column family, are examples
of this.  Some of the clients try to protect you, but without specific
knowledge of the data there is only so much they can do, and get_count()
illustrates the problem.
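(As an aside, a hedged sketch of the usual workaround: page through the row
in slices and count client-side. This is not code from the thread --
`fetch_slice` below is an in-memory stand-in for a real get_slice/Thrift
call so the paging logic itself is runnable, and all names and sizes are
illustrative.)

```python
# Sketch: count a huge row in chunks by paging slices and summing
# client-side.  fetch_slice() fakes a column slice against a dict so
# the pattern can be run and checked without a Cassandra cluster.

ROW = {"col%05d" % i: "v" for i in range(1000)}  # fake wide row

def fetch_slice(start, limit):
    """Return up to `limit` (name, value) pairs with name >= start,
    in column-name order -- mimicking a column slice."""
    names = sorted(n for n in ROW if n >= start)
    return [(n, ROW[n]) for n in names[:limit]]

def count_columns(page_size=100):
    """Count all columns in the row, page_size columns per round trip."""
    total, start = 0, ""
    while True:
        raw = fetch_slice(start, page_size)
        page = raw[1:] if start else raw  # drop the cursor column: it
                                          # was counted last iteration
        total += len(page)
        if len(raw) < page_size:          # short page means end of row
            break
        start = raw[-1][0]                # resume from last column seen
    return total

print(count_columns())  # 1000 with the fake row above
```

Note that page_size must be at least 2, because each page after the first
re-fetches the cursor column before skipping it.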

While we're on the topic of large rows, if your row is essentially unbounded
in size, you need to consider splitting it. This is especially true if you
stay with 0.6, where compactions of large rows can OOM you pretty easily.
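(One common way to do that splitting, sketched here as an assumption rather
than anything from this thread: shard one logical row across a fixed number
of physical rows by hashing the column name into a bucket. `NUM_BUCKETS`
and the key format are illustrative choices.)

```python
# Sketch: cap row growth by spreading one logical row's columns across
# NUM_BUCKETS physical rows.  The same column name always hashes to the
# same bucket, so single-column reads still touch only one row.

import hashlib

NUM_BUCKETS = 16  # fixed up front; changing it later means re-sharding

def bucketed_key(base_key, column_name):
    """Derive the physical row key for a column by hashing its name."""
    digest = hashlib.md5(column_name.encode("utf-8")).hexdigest()
    return "%s:%d" % (base_key, int(digest, 16) % NUM_BUCKETS)

def all_bucket_keys(base_key):
    """All physical keys for a logical row -- for counts or full scans,
    fan out over these and merge the results client-side."""
    return ["%s:%d" % (base_key, b) for b in range(NUM_BUCKETS)]
```

A count over the logical row then becomes the sum of per-bucket counts,
each of which stays bounded.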

- Tyler

On Sun, Dec 12, 2010 at 2:07 AM, Dave Martin <moyesyside@googlemail.com> wrote:

> Thanks Tyler. I was unaware of counters.
>
> The use case for column counts is really an operational one: it lets a
> sysadmin do ad hoc checks on columns to see if something has gone wrong
> in software outside of Cassandra.
>
> I think it's not ideal that running a cassandra-cli command such as count
> can make Cassandra fall over, unless we can say that for X columns
> Cassandra needs at least Y memory to remain stable.
>
> Cheers
>
> Dave
>
>
> On Sun, Dec 12, 2010 at 6:39 PM, Tyler Hobbs <tyler@riptano.com> wrote:
> > Cassandra has to deserialize all of the columns in the row for
> > get_count().  So from Cassandra's perspective, it's almost as much work
> > as getting the entire row; it just doesn't have to send everything back
> > over the network.
> >
> > If you're frequently counting 8 million columns (or really, anything
> > significant), you need to use counters instead.  If this is a rare
> > occurrence, you can do the count in multiple chunks by using a starting
> > and ending column in the SlicePredicate for each chunk, but this
> > requires some rough knowledge about the distribution of the column
> > names in the row.
> >
> > - Tyler
>
