incubator-cassandra-user mailing list archives

From Ben Hood <0x6e6...@gmail.com>
Subject Re: Using the commit log for external synchronization
Date Fri, 21 Sep 2012 11:31:54 GMT
Hi Aaron,

Thanks for your input.

On Fri, Sep 21, 2012 at 9:56 AM, aaron morton <aaron@thelastpickle.com> wrote:
> The commit log is essentially internal implementation. The total size of the
> commit log is restricted, and the multiple files used to represent segments
are recycled. So once all the memtables have been flushed for a segment it may
> be overwritten.
>
> To archive the segments see the conf/commitlog_archiving.properties file.
>
> Large rows will bypass the commit log.
>
A write committed to the commit log may still be considered a failure if CL
> nodes do not succeed.
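
For reference, the archiving hooks Aaron mentions live in conf/commitlog_archiving.properties. A minimal sketch, with purely illustrative paths (`%path`, `%name`, `%from` and `%to` are substituted by Cassandra):

```properties
# Run for each commit log segment once it is no longer live.
archive_command=/bin/ln %path /backup/commitlog/%name

# Run for each archived segment when restoring.
restore_command=/bin/cp -f %from %to

# Directory scanned for restorable segments.
restore_directories=/backup/commitlog
```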

So if I understand you correctly, one shouldn't code against what is
essentially an internal artefact that could change as the Cassandra
code base evolves and, furthermore, may not contain the information an
application expects it to.

> IMHO it's a better design to multiplex the data stream at the application
> level.

That's a fair point, and I could multicast the data at that level. The
reason I was considering querying the commit log is that I would
prefer a state-based synchronization over an event-driven one (which
is what both the app-layer multicast and the AOP solution Brian
suggested would be). I'd rather learn from Cassandra what Cassandra
thinks it holds than trust an event stream from which I can only infer
what Cassandra should theoretically hold. The use case I am looking at
should be reconcilable, so I'm trying to avoid trusting that all of
the events were actually sent correctly, arrived correctly and were
written to the target storage without any bugs. I also want to detect
the scenario where portions of the data written to the target system
get accidentally updated or nuked via a back door.
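
To make the state-based idea concrete, here is a toy sketch of the kind of reconciliation I mean: compare digests of rows between the source and the target store, rather than trusting that every event landed. The function names and dict-based stores are purely illustrative, not anything Cassandra provides:

```python
import hashlib


def row_digest(row: dict) -> str:
    """Stable digest of a row's key/value pairs."""
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(payload.encode()).hexdigest()


def reconcile(source_rows: dict, target_rows: dict):
    """Compare two stores keyed by row id.

    Returns (missing, diverged): ids absent from the target, and ids
    present in both whose contents no longer match the source.
    """
    missing = [k for k in source_rows if k not in target_rows]
    diverged = [
        k for k in source_rows
        if k in target_rows
        and row_digest(source_rows[k]) != row_digest(target_rows[k])
    ]
    return missing, diverged
```

A back-door update on the target shows up in the `diverged` list even if every event was delivered correctly, which is exactly what an event stream alone cannot tell you.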

So, in summary: given that there is no out-of-the-box way of asking
Cassandra "give me all mutations since timestamp X", I would either
have to go for an event-driven approach or reconsider the layout of
the Cassandra store so that I could reconcile it efficiently.
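
One way to reshape the layout along those lines: keep a mutation journal bucketed by time, so that "all mutations since X" becomes a range scan over bucket keys. A toy sketch with plain dicts standing in for a column family; the bucket size and names are illustrative assumptions, not an existing Cassandra feature:

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 3600  # one bucket row per hour of mutations (illustrative)


def bucket_key(ts: float) -> str:
    """Row key of the time bucket containing epoch timestamp ts."""
    bucket_start = int(ts) - int(ts) % BUCKET_SECONDS
    return datetime.fromtimestamp(bucket_start, tz=timezone.utc).strftime("%Y%m%d%H")


def record_mutation(journal: dict, ts: float, mutation) -> None:
    """Append a mutation into its time bucket, alongside the main write."""
    journal.setdefault(bucket_key(ts), []).append(mutation)


def mutations_since(journal: dict, since_ts: float, now_ts: float) -> list:
    """Collect all mutations from the buckets covering [since_ts, now_ts]."""
    out = []
    t = int(since_ts) - int(since_ts) % BUCKET_SECONDS
    while t <= now_ts:
        out.extend(journal.get(bucket_key(t), []))
        t += BUCKET_SECONDS
    return out
```

The cost is a second write per mutation, but the query "everything since timestamp X" stays cheap and bounded by the number of buckets, which is the efficient reconciliation I was after.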

Thanks for your help,

Cheers,

Ben
