cassandra-user mailing list archives

From Fabrice Douchant <fdouch...@gfproducts.ch>
Subject RE: Cassandra datacenters replication advanced usage
Date Tue, 02 Jun 2015 11:41:13 GMT
Hello Marcus, and thank you for your quick reply.

Yes, we thought about that, and indeed it would work. However, we have strict write and read
constraints on the producer and consumer datacenters, respectively, so we would like to keep all
(or most) access "local".

We don't need synchronization between datacenters to be fast; we just need to know when it's
done :-/

Fabrice

From: Marcus Olsson [mailto:marcus.olsson@ericsson.com]
Sent: Tuesday, 2 June 2015 13:29
To: user@cassandra.apache.org
Subject: Re: Cassandra datacenters replication advanced usage

Hi Fabrice,

Have you considered using "each_quorum" instead of "all"?

Each_quorum requires replies from a quorum of replicas in every datacenter.

This could be used either as:
Producer using each_quorum and consumer using local_quorum (better read latencies at the cost of
write latencies)

or

Producer using local_quorum and consumer using each_quorum (better write latencies at the cost
of read latencies)
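To see why both combinations let the consumer datacenter see the producer's writes, here is a minimal Python sketch that just counts replica acknowledgements per datacenter. It assumes a hypothetical replication factor of 3 in each datacenter (the thread does not state one) and the usual quorum of floor(RF/2) + 1. The names `dc1`, `dc2`, `required_acks`, `read_sees_write` and `tolerated_failures` are illustrative only and are not part of any Cassandra driver. A read is guaranteed to overlap a write when, in at least one datacenter, the replicas that acknowledged the write plus the replicas consulted by the read exceed that datacenter's RF. The model ignores hinted handoff, read repair and timing, and it does not check whether your Cassandra version accepts each_quorum for reads (older releases only allowed it for writes).

```python
# Toy model of Cassandra consistency levels, counting replicas per datacenter.
# Assumption (hypothetical): replication factor 3 in both datacenters.
RF = {"dc1": 3, "dc2": 3}


def quorum(rf: int) -> int:
    """Quorum size for rf replicas: floor(rf / 2) + 1."""
    return rf // 2 + 1


def required_acks(level: str, local_dc: str, rf: dict) -> dict:
    """Replicas that must respond in each datacenter for a request coordinated in local_dc."""
    if level == "all":
        return dict(rf)
    if level == "each_quorum":
        return {dc: quorum(n) for dc, n in rf.items()}
    if level == "local_quorum":
        return {dc: quorum(n) if dc == local_dc else 0 for dc, n in rf.items()}
    raise ValueError(f"level not modelled here: {level}")


def read_sees_write(write: dict, read: dict, rf: dict) -> bool:
    """True if, in some datacenter, the replicas that acked the write and the
    replicas consulted by the read must share at least one node (pigeonhole)."""
    return any(write[dc] + read[dc] > rf[dc] for dc in rf)


def tolerated_failures(level: str, local_dc: str, rf: dict) -> dict:
    """Replicas per datacenter that may be down while the request still succeeds."""
    need = required_acks(level, local_dc, rf)
    return {dc: rf[dc] - need[dc] for dc in rf}


# Producer runs in dc1, consumer in dc2.
option_1 = read_sees_write(required_acks("each_quorum", "dc1", RF),
                           required_acks("local_quorum", "dc2", RF), RF)
option_2 = read_sees_write(required_acks("local_quorum", "dc1", RF),
                           required_acks("each_quorum", "dc2", RF), RF)
local_only = read_sees_write(required_acks("local_quorum", "dc1", RF),
                             required_acks("local_quorum", "dc2", RF), RF)
print(option_1, option_2, local_only)               # True True False
print(tolerated_failures("all", "dc1", RF))          # {'dc1': 0, 'dc2': 0}
print(tolerated_failures("each_quorum", "dc1", RF))  # {'dc1': 1, 'dc2': 1}
```

The last two lines also show why writing at "all" hurts failure tolerance: in this model no replica anywhere may be down, whereas each_quorum still succeeds with one replica down in each datacenter.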

BR
Marcus Olsson
On 06/02/2015 01:00 PM, Fabrice Douchant wrote:
Hi everyone.

For a project, we use a Cassandra cluster to get fast reads and writes on a large amount of
generated (column-oriented) data.

Until now, we only had one datacenter for prototyping.

We now plan to split our cluster into two datacenters to meet performance requirements (the data
transfer between the two datacenters is quite slow):

datacenter #1: located near our data producer services; periodically and intensively writes all
data into Cassandra (each write has a "run_id" column in its primary key)
datacenter #2: located near our data consumer services; intensively reads all data produced
by datacenter #1 for a given "run_id".
However, we would like our consumer services to access data only in the datacenter near them
(datacenter #2), and only once all data for a given "run_id" (generated by the producer
services) has been completely replicated from datacenter #1.

My question is: how can we ensure that all data has been replicated to datacenter #2 before
telling the consumer services (near datacenter #2) to start using it?

Our best solutions so far (but still not good enough :-P):

producer services (datacenter #1) write at consistency "all". But this leads to poor tolerance of
network partitions and node failures AND really bad write performance.
producer services (datacenter #1) write at consistency "local_quorum", and a final "run finished"
value could be written at consistency "all". But it seems Cassandra does not guarantee replication
ordering, so that marker could reach datacenter #2 before all of the run's data.
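The ordering problem with the second approach can be reproduced with a small deterministic simulation. This is a hypothetical Python model, not Cassandra code: `CrossDcLink`, `Dc2View`, the `run42/...` row keys and the delivery delays are all made up for illustration. It assumes that each local_quorum write returns once datacenter #1 has acknowledged it while its copy to datacenter #2 travels independently, and that a successful write at "all" means datacenter #2 already holds the marker.

```python
from dataclasses import dataclass, field


@dataclass
class Dc2View:
    """Rows currently held by the replicas in datacenter #2 (hypothetical model)."""
    rows: set = field(default_factory=set)


@dataclass
class CrossDcLink:
    """Toy asynchronous replication from dc1 to dc2.

    Each mutation is delivered after its own delay; nothing orders deliveries
    across different writes, which is the property the thread is worried about.
    """
    pending: list = field(default_factory=list)  # (deliver_at, row) pairs

    def send(self, now: int, row: str, delay: int) -> None:
        self.pending.append((now + delay, row))

    def deliver_until(self, now: int, dc2: Dc2View) -> None:
        still_pending = []
        for deliver_at, row in self.pending:
            if deliver_at <= now:
                dc2.rows.add(row)
            else:
                still_pending.append((deliver_at, row))
        self.pending = still_pending


def simulate() -> tuple:
    link, dc2 = CrossDcLink(), Dc2View()
    # t=0: the producer writes the run's rows at local_quorum. Each call returns
    # once dc1 has acknowledged; the copies to dc2 arrive with arbitrary delays.
    for row, delay in [("run42/row1", 1), ("run42/row2", 5), ("run42/row3", 50)]:
        link.send(now=0, row=row, delay=delay)
    # t=2: the producer writes the marker at "all"; success means every replica,
    # including those in dc2, has acknowledged it.
    now = 2
    link.deliver_until(now, dc2)
    dc2.rows.add("run42/finished")
    # The consumer in dc2 now sees the marker, but only part of the data.
    marker_visible = "run42/finished" in dc2.rows
    data_visible = {r for r in dc2.rows if r != "run42/finished"}
    return marker_visible, data_visible


print(simulate())  # (True, {'run42/row1'})
```

Because nothing orders the delivery of different mutations to datacenter #2, a successful write of the marker at "all" says nothing about rows written earlier at local_quorum; in this run the consumer would see the marker with only one of the three rows present.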
Do you have any suggestions?

Thanks a lot,

Fabrice


