incubator-cassandra-user mailing list archives

From Brandon Williams <dri...@gmail.com>
Subject Re: Cassandra paging, gathering stats
Date Mon, 22 Feb 2010 20:03:32 GMT
On Mon, Feb 22, 2010 at 1:40 PM, Sonny Heer <sonnyheer@gmail.com> wrote:

> Hey,
>
> We are in the process of implementing a Cassandra application service.
>
> We have already ingested terabytes of data using the Cassandra bulk loader
> (StorageService).
>
> One of the requirements is to get a data explosion factor as a result of
> denormalization.  Since the writes are going to the memtables, I'm not
> sure how I could grab stats.  I can't get the size of the data before ingest,
> since some of the data may be duplicated.
>

Are you talking about duplication across nodes due to the replication
factor, or because some rows may still be in the memtable?

I think what you want to do is bin/nodeprobe flush, bin/nodeprobe compact,
wait until the system is idle and then sum the size of everything in your
data paths that starts with the name of your column family.
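
As a rough sketch of that last step, the summing could be done with a few
lines of Python once the flush and compaction have finished. The function
name, the example path and the column family name are assumptions made for
illustration, not anything Cassandra ships. Note that a plain prefix match
would also count files of any other column family whose name begins with the
same string.

```python
import os


def column_family_bytes(data_dirs, column_family):
    """Sum the sizes of all files whose name starts with `column_family`.

    `data_dirs` is a list of data directories (hypothetical paths supplied
    by the caller); subdirectories are walked recursively.
    """
    total = 0
    for data_dir in data_dirs:
        for root, _dirs, files in os.walk(data_dir):
            for name in files:
                # Plain prefix match, as described above. A column family
                # named "Foo" would also match files for "FooBar".
                if name.startswith(column_family):
                    total += os.path.getsize(os.path.join(root, name))
    return total


if __name__ == "__main__":
    # Example invocation; both the path and the name are placeholders.
    print(column_family_bytes(["/var/lib/cassandra/data"], "Standard1"))
```

Run this on each node and add up the results. If the replication factor is
greater than one, the total includes every replica.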

> Also, a general problem we are running into is finding an easy way to do
> paging over the data set (not just rows but columns).  It looks like the API
> now has ways to do a count, but no offset.
>

Columns can easily be paginated via the 'start' and 'finish' parameters.
You can't jump to a random page, but you can provide next/previous
behavior.
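
To illustrate the "next page" idea, here is a minimal sketch in Python that
does not talk to Cassandra at all. `get_slice` is a hypothetical stand-in
that slices a sorted in-memory list of column names, and it assumes that
'start' is inclusive and that an empty string means "unbounded". Under those
assumptions, each page after the first starts at the last column already
seen, asks for one extra column, and drops the duplicate.

```python
from bisect import bisect_left


def get_slice(columns, start="", finish="", count=100):
    """Hypothetical stand-in for a column slice call.

    Returns up to `count` names from the sorted list `columns` that fall
    in [start, finish]. An empty string means no bound on that side.
    """
    lo = bisect_left(columns, start) if start else 0
    result = []
    for name in columns[lo:]:
        if finish and name > finish:
            break
        result.append(name)
        if len(result) == count:
            break
    return result


def iter_pages(columns, page_size):
    """Yield successive pages of column names, `page_size` at a time."""
    start = ""
    while True:
        # When resuming, `start` is the last column already returned, so
        # request one extra and discard the repeated first entry.
        extra = 1 if start else 0
        batch = get_slice(columns, start=start, count=page_size + extra)
        if start and batch and batch[0] == start:
            batch = batch[1:]
        if not batch:
            return
        batch = batch[:page_size]
        yield batch
        start = batch[-1]


if __name__ == "__main__":
    for page in iter_pages(["a", "b", "c", "d", "e"], 2):
        print(page)
```

A "previous page" would work the same way in the other direction, using the
first column of the current page as the bound. There is still no way to jump
straight to page N without walking the pages before it.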

-Brandon
