incubator-cassandra-user mailing list archives

From Richard Low <r...@acunu.com>
Subject Re: Cassandra 0.8 Counters Inverted Index?
Date Mon, 03 Oct 2011 09:34:31 GMT
On Mon, Oct 3, 2011 at 9:14 AM, Pierre-Yves Ritschard
<pyr@smallrivers.com> wrote:
> Unfortunately there's no way to do this in Cassandra right now, except
> by using another row as index, like you're doing right now.
>
> Of course you could also store by source_id.date and have a batch job
> iterate over all sources to compute the top 100. It would not be real
> time any more though.

Indexes trade off some insert performance for read
performance.  The index you describe is optimal for reads, so writes
take a hit.  As Pierre says, the only way to maintain an index in
Cassandra is to read, delete and insert on every increment.  This is
how secondary indexes work under the hood in Cassandra, although they
are not implemented for counters.  The cost is higher for counters,
though, since a counter read is in general more expensive than a
regular read.
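
To make the read, delete and insert cycle concrete, here is a rough
sketch in Python.  It is only an in-memory model: plain dicts stand in
for the counter row and the index row, and the names (IndexedCounters,
increment, top) are made up for illustration, not a Cassandra API.

```python
from collections import defaultdict


class IndexedCounters:
    """Toy model of counters plus an index kept in step on every write.

    Hypothetical illustration only: dicts stand in for Cassandra rows.
    """

    def __init__(self):
        self.counts = {}                # source_id -> current count
        self.index = defaultdict(set)   # count -> source_ids (the "index row")

    def increment(self, source_id, delta=1):
        old = self.counts.get(source_id)            # 1. read the current value
        if old is not None:
            self.index[old].discard(source_id)      # 2. delete the old index entry
            if not self.index[old]:
                del self.index[old]
        new = (old or 0) + delta
        self.counts[source_id] = new
        self.index[new].add(source_id)              # 3. insert the new index entry
        return new

    def top(self, n):
        """Walk the index from the highest count down; no sorting of all counters."""
        result = []
        for count in sorted(self.index, reverse=True):
            for source_id in sorted(self.index[count]):
                result.append((source_id, count))
                if len(result) == n:
                    return result
        return result
```

Every increment pays for a read and two index updates, which is the
write-side cost described above; in exchange, top() only touches the
index.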

So to speed up inserts, you have to take the hit on reads.  The other
extreme is to not build an index at all, and instead read in all the
counters and sort on the client.  Inserts are then optimal, but given
you have 10,000s of counters, reads will be slow.  A batch job will
work too, provided you are happy for the results to be non-real-time,
or slightly out of date.
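
For the no-index extreme, the client-side step might look roughly like
the Python below.  It assumes the counters have already been fetched
into a dict of source_id -> count (the fetch itself is not shown), and
top_n is a made-up helper name.

```python
import heapq


def top_n(counters, n=100):
    """Return the n largest (source_id, count) pairs, highest count first.

    `counters` is assumed to be a dict already read from the cluster;
    this is a sketch of the client-side sort, not a Cassandra call.
    """
    return heapq.nlargest(n, counters.items(), key=lambda kv: kv[1])
```

The same function could be run periodically from a batch job, with the
result cached between runs, if slightly stale results are acceptable.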

Richard.

-- 
Richard Low
Acunu | http://www.acunu.com | @acunu
