cassandra-user mailing list archives

From Anuj Wadehra <anujw_2...@yahoo.co.in>
Subject Re: RE: Cassandra datacenters replication advanced usage
Date Tue, 02 Jun 2015 11:53:31 GMT
I think you should use local_quorum for writes; the consumer's read consistency can be whatever
the application requires. I don't think synchronous cross-DC reads/writes are a good choice.
 


Knowing when a batch ends is an application-level problem; it has nothing to do with Cassandra.
Maybe you can add a run number and a run count to each record. When the number of rows read for
a run matches the count, the polling consumer knows that the run is fully replicated. I'm not
sure it's the best solution.
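
The run number / run count check could be sketched roughly as below. This is a minimal sketch in plain Python with no Cassandra driver involved: the row layout (`run_no`, `run_count` keys), the `fetch_rows` callable and the polling parameters are all assumptions made for illustration, and in practice `fetch_rows` would be a local-datacenter read in the consumer.

```python
import time
from typing import Callable, Iterable, Mapping


def run_is_complete(rows: Iterable[Mapping], run_no: int) -> bool:
    """True when the rows seen for run_no match the run_count they carry.

    Assumes (hypothetically) that every record stores the total number
    of records in its run under the key "run_count".
    """
    rows_for_run = [r for r in rows if r["run_no"] == run_no]
    if not rows_for_run:
        return False  # nothing replicated yet
    expected = rows_for_run[0]["run_count"]
    return len(rows_for_run) == expected


def wait_for_run(fetch_rows: Callable[[int], Iterable[Mapping]],
                 run_no: int,
                 poll_seconds: float = 1.0,
                 max_polls: int = 60) -> bool:
    """Poll the local datacenter until the run is complete or we give up.

    fetch_rows is a hypothetical callable that returns the rows currently
    readable locally for the given run.
    """
    for _ in range(max_polls):
        if run_is_complete(fetch_rows(run_no), run_no):
            return True
        time.sleep(poll_seconds)
    return False
```

The consumer would call `wait_for_run` with its own read function and only start processing once it returns True.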


Thanks

Anuj Wadehra


From:"Fabrice Douchant" <fdouchant@gfproducts.ch>
Date:Tue, 2 Jun, 2015 at 5:12 pm
Subject:RE: Cassandra datacenters replication advanced usage

Hello Marcus and thank you for your fast reply.

 

Yes, we thought about that, and indeed it would work. However, we have strict write and read
constraints for the producer and consumer datacenters respectively, so we would like to keep
all/most access “local”.

 

We don’t need synchronization between datacenters to be fast, we just need to know when
it’s done :-/

 

Fabrice

 

From: Marcus Olsson [mailto:marcus.olsson@ericsson.com] 
Sent: Tuesday, 2 June 2015 13:29
To: user@cassandra.apache.org
Subject: Re: Cassandra datacenters replication advanced usage

 

Hi Fabrice,

Have you considered using "each_quorum" instead of "all"?

Each_quorum requires replies from a quorum of nodes in every datacenter.

This could be used in either of two ways:
Producer using each_quorum and consumer using local_quorum (better read latencies at the cost of
write latencies)

or

Producer using local_quorum and consumer each_quorum. (better write latencies at the cost
of read latencies)
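
To make the trade-off concrete, here is a small sketch of how many replica acknowledgements each level needs per datacenter. It relies on the usual quorum rule (replication factor divided by two, rounded down, plus one). The datacenter names and the replication factor of 3 in the usage note are assumptions for illustration, not values from this thread.

```python
from typing import Dict


def quorum(replication_factor: int) -> int:
    """Quorum size for one datacenter: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def required_acks(level: str,
                  replication: Dict[str, int],
                  local_dc: str) -> Dict[str, int]:
    """Replica acknowledgements needed in each datacenter for a request.

    replication maps a datacenter name to its replication factor
    (hypothetical names and values are supplied by the caller).
    """
    if level == "LOCAL_QUORUM":
        # Only the coordinator's own datacenter has to answer.
        return {local_dc: quorum(replication[local_dc])}
    if level == "EACH_QUORUM":
        # A quorum must answer in every datacenter.
        return {dc: quorum(rf) for dc, rf in replication.items()}
    if level == "ALL":
        # Every replica everywhere must answer.
        return dict(replication)
    raise ValueError(f"unsupported consistency level: {level}")
```

For example, with `{"dc1": 3, "dc2": 3}` and a coordinator in dc1, local_quorum waits for 2 replicas in dc1 only, each_quorum waits for 2 in dc1 and 2 in dc2, and all waits for all 6.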

BR
Marcus Olsson

On 06/02/2015 01:00 PM, Fabrice Douchant wrote:

Hi everyone.

 

For a project, we use a Cassandra cluster in order to have fast reads/writes on a large amount
of generated (column-oriented) data.

 

Until now, we only had 1 datacenter for prototyping.

 

We now plan to split our cluster into 2 datacenters to meet performance requirements (data
transfer between the two datacenters is quite slow):

 

Datacenter #1: located near our data producer services, which periodically write all data to
Cassandra intensively (each write has a “run_id” column in its primary key).

Datacenter #2: located near our data consumer services, which intensively read all data produced
by datacenter #1 for a given “run_id”.

However, we would like our consumer services to access data only in the datacenter near them
(datacenter #2), and only once all data for a given “run_id” (generated by the producer
services) has been completely replicated from datacenter #1.

 

My question is: how can we ensure that all data has been replicated to datacenter #2 before
telling the consumer services (near datacenter #2) to start using it?

 

Our best solutions so far (but still not good enough :-P):

 

Producer services (datacenter #1) write at consistency “all”. But this leads to poor
tolerance of network partitions AND really bad write performance.

Producer services (datacenter #1) write at consistency “local_quorum”, and a final “run
finished” value is written at consistency “all”. But it seems Cassandra does not
guarantee replication ordering.
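
The ordering concern in the second option can be illustrated with a toy replay. This is a pure simulation under assumed names (`RUN_FINISHED`, `rows_visible_at_marker`), not a model of how Cassandra replicates: it only shows that if mutations may arrive at datacenter #2 in any order, seeing the marker says nothing about how many data rows are already visible.

```python
from typing import List, Optional

RUN_FINISHED = "RUN_FINISHED"  # hypothetical marker written after the data rows


def rows_visible_at_marker(arrival_order: List[str]) -> Optional[int]:
    """Replay mutations in the order they arrive at the remote datacenter.

    Returns how many data rows were visible at the moment the marker
    arrived, or None if the marker never arrived.
    """
    visible = 0
    for mutation in arrival_order:
        if mutation == RUN_FINISHED:
            return visible
        visible += 1
    return None


# The producer wrote three rows and then the marker, but arrival order may differ.
in_order = ["row-0", "row-1", "row-2", RUN_FINISHED]
reordered = ["row-0", RUN_FINISHED, "row-1", "row-2"]
```

With `in_order` the consumer sees all 3 rows when the marker shows up; with `reordered` it sees only 1, so the marker alone is not a safe completion signal.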

Do you have any suggestions?

 

Thanks a lot,

 

Fabrice 

 

 

