couchdb-user mailing list archives

From Jim Klo <jim....@sri.com>
Subject Re: Replicating with two CouchDBs that share the same URL
Date Sat, 11 Aug 2012 15:54:47 GMT

See inline:

Sent from my iPad

On Aug 11, 2012, at 6:38 AM, "Ladislav Thon" <ladicek@gmail.com> wrote:

> Friendly ping? :-)
> 
> LT
> 
> 2012/6/27 Ladislav Thon <ladicek@gmail.com>
> 
>> Hi,
>> 
>> we're using CouchDB (version 1.1.1 currently, but planning to upgrade to
>> 1.2.0) because of its multi-master replication. The replication topology is
>> a simple star -- single central server and a number of clients that
>> replicate both from and to the central server. Writes are (almost) always
>> done on the clients.
>> 
>> Now for high availability, the central server isn't actually a single
>> machine, but two machines (and therefore two couches) whose IP addresses
>> are mapped to the same domain name (DNS round robin). These two couches
>> also replicate with each other. The clients don't know about this, they
>> always replicate from and to https://central.couch:6984/database.
>> 

So your edges are essentially masters, and your central servers are really just slaves, in a
manner of speaking? It seems to me that using RRDNS for your central server is potentially
bad. Replication reads the changes feed and stores the last replicated sequence in a local
'watermark' doc whose doc_id is a computed hash of the replication doc, and a given sequence
number may not refer to the same doc_id on different servers. RRDNS doesn't guarantee you're
always talking to the same server, so your 'clients' are most likely missing docs.

Server 1: seq:doc_id
1:id1 - 2:id2 - 3:id3 - 4:id4 - 5:id5 - 6:id6

Server 2: seq:doc_id
1:id5 - 2:id6 - 3:id4 - 4:id2 - 5:id1 - 6:id3

Consider the possible changes feeds above, which assume all docs are in both servers, but in
a different order. If the first replication hits server 1 and gets only seq 1 - 3, it gets
docids 1 - 3. If, due to RRDNS, the next replication hits server 2, it will start at seq 4,
because the domain name (and so the watermark) is the same. It would then try replicating
the same docs again, possibly with conflicts, and docids 4 - 6 would be masked from
replication!
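
The scenario above can be replayed as a small simulation. This is only a minimal sketch of the
reasoning in this message, not of the real replicator: the `pull` and `simulate` helpers are
hypothetical, the feeds are the two example feeds listed above, and it assumes the client keeps
a single watermark per URL and that nothing validates that watermark against the server
actually reached.

```python
# Hypothetical changes feeds from the example above: (seq, doc_id) pairs.
SERVER_1 = [(1, "id1"), (2, "id2"), (3, "id3"), (4, "id4"), (5, "id5"), (6, "id6")]
SERVER_2 = [(1, "id5"), (2, "id6"), (3, "id4"), (4, "id2"), (5, "id1"), (6, "id3")]


def pull(feed, since, limit=None):
    """Return (doc_ids, new_watermark) for all changes with seq > since."""
    changes = [(seq, doc_id) for seq, doc_id in feed if seq > since]
    if limit is not None:
        changes = changes[:limit]
    doc_ids = [doc_id for _, doc_id in changes]
    new_watermark = changes[-1][0] if changes else since
    return doc_ids, new_watermark


def simulate():
    """Replicate once from each server while sharing one watermark."""
    client_docs = set()
    watermark = 0  # assumption: one watermark, because both servers share one URL

    # First run lands on server 1 and stops after three changes.
    docs, watermark = pull(SERVER_1, watermark, limit=3)
    client_docs.update(docs)

    # Second run lands on server 2 but resumes from server 1's sequence.
    docs, watermark = pull(SERVER_2, watermark)
    client_docs.update(docs)

    all_docs = {doc_id for _, doc_id in SERVER_1}
    return sorted(client_docs), sorted(all_docs - client_docs)


if __name__ == "__main__":
    replicated, never_seen = simulate()
    print("replicated:", replicated)
    print("never seen:", never_seen)
```

Under those assumptions the second run fetches id2, id1 and id3 again, and id4, id5 and id6
never reach the client.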

>> This might not be the best architecture for HA and we would be able to
>> change it, but I'd still love to get an answer to this question: is CouchDB
>> able to cope with this?

It can't because of RRDNS. 

>> How does it know that it replicates with the same
>> couch it replicated with before (so that it only has to replay changes) and
>> how does it recognize that it replicates with a different couch than before
>> (and has to copy the whole database)?
>> 

It doesn't know it's replicating with a different DB AFAIK.


>> I know that it was already proposed several times to add an UUID to
>> CouchDB server/database, which would solve this issue, and I also know that
>> it's very easy to end up with duplicates, which renders universally
>> unique identifiers ... not so *unique* (i.e. useless).
>> 

I don't know the status of this, but I've not seen replication work right between multiple
servers behind the same domain name.

I don't know how many servers you have in total, but you could just have every server
replicate to the others in the cluster. I don't know how well this scales ultimately, but
we've not needed RRDNS to make it work.
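
As a rough illustration of "every server replicates to the others", the sketch below builds one
replication document per ordered pair of servers. The host names and the `_id` naming scheme
are hypothetical, and the `mesh_replication_docs` helper is not part of CouchDB; `source`,
`target` and `continuous` are the usual fields of a document in the _replicator database. Each
generated document would still have to be saved into the _replicator database of one couch.

```python
import json
from itertools import permutations


def mesh_replication_docs(server_urls, db_name):
    """Build one replication doc for every ordered (source, target) pair."""
    docs = []
    for index, (source, target) in enumerate(permutations(server_urls, 2)):
        docs.append({
            "_id": f"mesh-{index}",  # hypothetical naming scheme
            "source": f"{source}/{db_name}",
            "target": f"{target}/{db_name}",
            "continuous": True,
        })
    return docs


if __name__ == "__main__":
    # Hypothetical unique host names, one per couch (no shared RRDNS name).
    servers = ["https://central-a.couch:6984", "https://central-b.couch:6984"]
    for doc in mesh_replication_docs(servers, "database"):
        print(json.dumps(doc))
```

The number of documents grows as n * (n - 1) for n servers, which is one reason a full mesh may
not scale to many nodes.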

Also, I'm assuming any app you use is pinned to a specific server or is using session affinity;
otherwise your app will see inconsistent behavior too.

>> ---
>> 
>> Also, I have a question about replication monitoring. Are there some best
>> practices for monitoring whether the replication is working? I can of
>> course read the corresponding document in the _replicator database and look
>> at the _replication_state field, but this will only tell me that the
>> replication is *running* -- and I want to know that it's actually
>> *working*. For now, we are using a pretty naive approach: 1. Every 10 minutes,
>> write a document with current date and time to the central couch. 2.
>> Periodically check on all clients (we have them under control) that the
>> document isn't too old. Is there a better approach?
>> 

Your approach to checking consistency is probably about the simplest one could use. I do think
your RRDNS setup could cause occasional issues. I'd suggest having each client (edge)
replicate with all central servers using unique server names, and having the central servers
replicate with each other (though that shouldn't be necessary if the clients are replicating
to both).
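
The heartbeat check described in the question could look roughly like this on a client. It is
a minimal sketch only: the `written_at` field name, the ISO 8601 timestamp format and the
30 minute threshold (three missed 10 minute heartbeats) are assumptions, and fetching the
document from the local couch is left out.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_is_fresh(heartbeat_doc, now, max_age=timedelta(minutes=30)):
    """Return True if the replicated heartbeat doc is at most max_age old."""
    # Assumption: the central couch writes an ISO 8601 timestamp with a UTC offset.
    written_at = datetime.fromisoformat(heartbeat_doc["written_at"])
    return now - written_at <= max_age


if __name__ == "__main__":
    # Hypothetical heartbeat document as it would appear on a client.
    doc = {"_id": "heartbeat", "written_at": "2012-08-11T15:00:00+00:00"}
    now = datetime(2012, 8, 11, 15, 20, tzinfo=timezone.utc)
    print("replication looks alive:", heartbeat_is_fresh(doc, now))
```

A threshold of several heartbeat intervals avoids alarms from a single delayed replication run.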

>> Thanks for your opinions!
>> 
>> LT
>> 
