cassandra-user mailing list archives

From eks dev <eks...@googlemail.com>
Subject using Cassandra as a write ahead log with duplicates removal
Date Mon, 18 Apr 2011 21:38:19 GMT
Hi all,
The problem:
A Map<key, value> is maintained as a simple Cassandra CF, and there is a
stream of puts/deletes from clients. For newly inserted rows, I need to
update a Solr/Lucene index by polling Cassandra. (I know about
Solandra; I am not asking about that.)

I want to use Cassandra as a classical write-ahead log, but with an extra
twist: deduplication and aggregation of mutating operations.

The idea behind this is a Map<Key, SortedList<timestamp, value>>, where
the list, sorted by timestamp, contains the mutating operations (add(value)
or delete). In order to update the Solr index I need to see which keys have
been modified since the last Solr commit. I do not know how to do this
efficiently with Cassandra.
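To make the model concrete, here is a minimal in-memory sketch in Python. It is not Cassandra code; OperationLog, record and modified_since are hypothetical names used only for illustration.

```python
from collections import defaultdict

ADD, DELETE = "add", "delete"


class OperationLog:
    """Hypothetical in-memory stand-in for Map<Key, SortedList<timestamp, op>>."""

    def __init__(self):
        # key -> list of (timestamp, kind, value), kept sorted by timestamp
        self._ops = defaultdict(list)

    def record(self, key, timestamp, kind, value=None):
        """Append one mutating operation and keep the list ordered by timestamp."""
        ops = self._ops[key]
        ops.append((timestamp, kind, value))
        ops.sort(key=lambda op: op[0])  # stable sort, compares timestamps only

    def modified_since(self, last_commit_ts):
        """Return the keys whose newest operation is later than last_commit_ts."""
        return sorted(
            key
            for key, ops in self._ops.items()
            if ops and ops[-1][0] > last_commit_ts
        )


log = OperationLog()
log.record(1, 10, ADD, "AAA")
log.record(2, 20, ADD, "BBB")
log.record(1, 30, DELETE)
print(log.modified_since(15))
```

Note that modified_since walks over every key, which is exactly the cost I would like to avoid on the real CF.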

After a "commit" to Solr I have to either:

a) remember the last timestamp and scan from there (secondary index on
timestamp? Can Cassandra's native timestamp be used for this?)
or
b) keep two CFs, "dirty" and "clean", and migrate records from dirty to
clean on commit
or
c)  ???

Somehow I do not like a) or b), as I know I do not yet understand Cassandra :(
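For reference, the two-CF option could look roughly like the following in-memory Python sketch. Two plain dicts stand in for the "dirty" and "clean" CFs, deletes are left out for brevity, and DirtyCleanStore and index_fn are hypothetical names, not part of Cassandra or Solr.

```python
class DirtyCleanStore:
    """Hypothetical stand-in for two column families, "dirty" and "clean"."""

    def __init__(self):
        self.dirty = {}  # rows written since the last index commit
        self.clean = {}  # rows already pushed to the index

    def put(self, key, value):
        """Client write: always lands in the dirty set."""
        self.dirty[key] = value

    def commit(self, index_fn):
        """Index a snapshot of the dirty rows, then migrate them to clean."""
        batch = dict(self.dirty)  # snapshot, so later writes are not lost
        for key, value in batch.items():
            index_fn(key, value)
        for key, value in batch.items():
            self.clean[key] = value
            # Only drop the dirty row if it was not overwritten in the meantime.
            if self.dirty.get(key) == value:
                del self.dirty[key]
        return len(batch)


store = DirtyCleanStore()
store.put(1, "AAA")
store.put(2, "BBB")
indexed = []
store.commit(lambda key, value: indexed.append((key, value)))
print(sorted(indexed))
```

Against a real store the migration would not be atomic, so a crash between indexing and the move could cause rows to be indexed twice.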

Any best practices for such a use case?

Also, is there an efficient operation addIfNotAlreadyThere(key...), i.e.
if(!contains(key)) add(key, value) in one network call?

As far as I understand, I need to do the check myself.
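What I mean by checking it myself is roughly the following Python sketch, where a plain dict stands in for the CF and add_if_not_already_there is a hypothetical helper. Against a real store the read and the write would be two separate calls, so two clients could both pass the check before either one writes.

```python
def add_if_not_already_there(store, key, value):
    """Client-side check-then-write; not atomic when the store is remote."""
    if key in store:      # call 1: read
        return False      # key already present, nothing written
    store[key] = value    # call 2: write
    return True


cf = {}
print(add_if_not_already_there(cf, 1, "AAA"))
print(add_if_not_already_there(cf, 1, "DDD"))
```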

As an example:

add(1, AAA)
add(2, BBB)
add(1, CCC) //unconditional
addIfNotAlreadyThere(1, DDD) //noop as key 1 is already there, not deleted
-------------------------------
should result in the following Solr indexing operations:

1, AAA
2, BBB

Another way to think of it is to identify the last add() or last delete()
operation from the CF?
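A small Python sketch of that last idea, assuming each key holds a list of (timestamp, kind, value) tuples and that the operation with the highest timestamp decides what goes to the index. The function names and the "index"/"remove" labels are hypothetical.

```python
def last_operation(ops):
    """Return the tuple with the highest timestamp, or None if ops is empty."""
    return max(ops, key=lambda op: op[0], default=None)


def index_action(ops):
    """Map the last operation of one key to an indexing action."""
    last = last_operation(ops)
    if last is None:
        return None
    _timestamp, kind, value = last
    if kind == "add":
        return ("index", value)
    return ("remove", None)


print(index_action([(10, "add", "X"), (20, "delete", None)]))
```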



Thanks,
eks
