lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <>
Subject Re: Avoid losing data on ZK connection-loss/session-timeout
Date Thu, 30 Aug 2012 20:01:01 GMT
> Ad 2) When the "behind" node has reconnected and become leader and the one
> with the latest updates does not come back live right away, isnt the new
> leader (which is behind) allowed to start handling update-requests. If yes,
> then it will be possible that both shards have documents/updates that the
> other one doesnt, and it is possible to come up with scenarios where there
> is no good algorithm for generating the "correct" merged union of the data
> in both shards. So what to do when the other shard (which used to have a
> later version than the current leader) comes live?
> 3) Believe there is nothing solid to do!
> How to avoid that? I was thinking about keeping the latest version for every
> slice in ZK, so that a "behind" shard will know if it has the latest version
> of a slice, and therefore if it is allowed to take the role as leader. Of
> course the writing of this "latest version" to ZK and the writing of the
> corresponding update in leaders transaction-log would have to be atomic
> (like the A in ACID) as much as possible. And it would be nice if writing of
> the update in replica transaction-log would also be atomic with the
> leader-writing and the ZK writing, in order to increase the chance that a
> replica is actually allowed to take over the leader role if the leader dies
> (or both dies and replica comes back first, and "old" leader comes back
> minutes later). But all that is just an idea on top of my head.
> Do you already have a solution implemented or a solution on the drawing
> board or how do you/we prevent such a problem? As far as I understand "the
> drill" during leader-election/recovery (whether its peer-sync or
> file-copy-replication) from the little code-reading I have done and from
> what you explain, there is not a current solution. But I might be wrong?

FWIW, I have added some logic so that we will wait to initiate leader
election until all the nodes are back or n amount of time goes by. I'd
like to make that configurable.

To back up though, no, I don't currently think any of this is a
problem currently anyway.

The way you would get updates on different nodes is that the leader
goes down in the middle of the update. If that happens, you don't know
if the update succeeded or not from the client. We don't return
success until every replica has the update. So the sync up stage
simply ensures that the shard stays consistent. Either an update that
you didn't know the success of will make it into the sync, or it
won't. Either way, since the client could not get a response, is fine
in my opinion. When you don't get a response, it's an open question

Once we relax requiring the update gets to all replicas, then this may
be something I was more concerned about.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message