directory-dev mailing list archives

From Emmanuel Lecharny <elecha...@gmail.com>
Subject Replication heads up
Date Mon, 08 Aug 2011 08:56:44 GMT
Hi guys,

We found the reason why the replication tests are failing randomly.
Let me explain:

- the consumer stays connected to the provider until it gets
disconnected. A connection can last for days or weeks.
- the provider pushes modifications to the consumer directly if the
consumer is connected
- if the consumer is disconnected, the modifications are stored in a
queue, and the content of this queue is sent to the consumer when it
reconnects

That being said, we have one corner case where the provider 'thinks' that
the consumer is connected when it no longer is: the message is sent to
the disconnected consumer, and we don't push it to the queue, so it is lost.

A better idea is to push *all* the modifications to the queue, no
matter what. Then a thread will process this queue and send its contents
to the consumer, unless the consumer isn't connected. In any case, we
*don't* delete messages from the queue when they are sent. Never.
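
To make the idea concrete, here is a minimal sketch of such a queue. It
is not ApacheDS code: the class name ReplicationLog, the method names and
the use of plain strings for modifications are all assumptions made for
illustration. Every modification is appended, and a cursor remembers what
still has to be sent, so a failed send loses nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, not the ApacheDS implementation.
class ReplicationLog {
    // Every modification is appended here, connected consumer or not.
    private final List<String> entries = new ArrayList<>();

    // Index of the next entry that still has to reach the consumer.
    private int nextToSend = 0;

    synchronized void append(String modification) {
        entries.add(modification);
    }

    // Sends pending entries in order and stops at the first failed send.
    // The 'send' callback returns false when the consumer is unreachable.
    // Entries are never removed here, only the cursor moves.
    synchronized int drain(Predicate<String> send) {
        int sent = 0;
        while (nextToSend < entries.size() && send.test(entries.get(nextToSend))) {
            nextToSend++;
            sent++;
        }
        return sent;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

The sender thread would call drain() in a loop. If the consumer turns out
to be disconnected, the cursor simply stays where it was and the same
entries are retried after the reconnection.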

That raises a question: what do we do in the long term? The queue will
grow and never shrink. In fact this is quite simple: we truncate the
queue at a defined interval (say once a day, or once a week).
Every modification older than the interval is simply deleted from the queue.
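
A rough sketch of the time-based truncation, under the assumption that
each queued modification carries the time it was logged and that entries
are appended in time order. TruncatingLog, Entry and the millisecond
timestamps are hypothetical names and types chosen for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not the ApacheDS implementation.
class TruncatingLog {
    // A modification together with the time (in milliseconds) it was logged.
    static final class Entry {
        final long timestamp;
        final String modification;

        Entry(long timestamp, String modification) {
            this.timestamp = timestamp;
            this.modification = modification;
        }
    }

    // Oldest entry first; assumes entries arrive in timestamp order.
    private final Deque<Entry> entries = new ArrayDeque<>();

    void append(long timestamp, String modification) {
        entries.addLast(new Entry(timestamp, modification));
    }

    // Deletes every entry older than maxAgeMillis and returns how many
    // entries were deleted.
    int truncate(long now, long maxAgeMillis) {
        int removed = 0;
        while (!entries.isEmpty() && now - entries.peekFirst().timestamp > maxAgeMillis) {
            entries.removeFirst();
            removed++;
        }
        return removed;
    }

    int size() {
        return entries.size();
    }
}
```

A scheduled task could call truncate(System.currentTimeMillis(), interval)
once per interval.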

What if a consumer is not able to reconnect within this period of time ? 
Simple :
- the consumer sends the lastEntryCSN it received, and if it's older 
than what's in the queue, then we do a full replication.
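
The decision on reconnection could look like the sketch below. Real CSNs
are structured values, so representing them as plain longs that grow over
time is a simplifying assumption, and ResyncDecision, Mode and choose are
hypothetical names. The sketch also assumes the queue is not empty.

```java
// Hypothetical sketch, not the ApacheDS implementation.
class ResyncDecision {
    enum Mode { INCREMENTAL, FULL }

    // consumerLastCsn: the lastEntryCSN sent by the reconnecting consumer.
    // oldestQueuedCsn: the CSN of the oldest entry still in the queue.
    // CSNs are modelled as longs here, which is a simplification.
    static Mode choose(long consumerLastCsn, long oldestQueuedCsn) {
        // Older than anything we kept: some modifications may already
        // have been truncated, so only a full replication is safe.
        if (consumerLastCsn < oldestQueuedCsn) {
            return Mode.FULL;
        }
        // Otherwise the queue still holds everything the consumer missed.
        return Mode.INCREMENTAL;
    }
}
```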

It may seem costly, but it's unlikely that a consumer gets disconnected
for a long period of time. All in all, it's as if we had just added a
brand new consumer, with nothing in it.

One option would be to ask the consumer to send a periodic message to
the provider informing it that it's up to date. It could be a daily
unbind/bind for instance. The unbind will kill the pending persistent
search we established between the provider and the consumer, so that a
new one can be established. As the consumer will send a new request, with
the lastEntryCSN, we will be able to truncate the provider queue, so it
won't grow forever.
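
As an illustration of truncating on the consumer's lastEntryCSN rather
than on age, here is a small sketch for a single consumer. It assumes,
for simplicity, that CSNs are longs kept in ascending order in the queue.
AckTruncation and truncateAcked are hypothetical names.

```java
import java.util.Deque;

// Hypothetical sketch, not the ApacheDS implementation.
class AckTruncation {
    // Drops every queued CSN that the consumer has already received,
    // i.e. every CSN <= ackedCsn. Assumes the queue is in ascending
    // order, oldest first. Returns the number of entries dropped.
    static int truncateAcked(Deque<Long> queuedCsns, long ackedCsn) {
        int removed = 0;
        while (!queuedCsns.isEmpty() && queuedCsns.peekFirst() <= ackedCsn) {
            queuedCsns.removeFirst();
            removed++;
        }
        return removed;
    }
}
```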

Kiran and I will probably work on this idea this week. I'm
confident that it can be working by the end of the week, or even earlier.

Stay tuned !

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

