zookeeper-user mailing list archives

From Alexander Shraer <shra...@gmail.com>
Subject Re: Detecting data loss
Date Thu, 06 Jun 2013 14:46:11 GMT
A server doesn't know which of the operations in its log were
committed, and so can't say whether truncating the log is something
that resulted from normal operation or from some failure.

Truncating the log is often normal - suppose that A is the leader and
receives 10 operations from some client to propose, but loses
leadership because of temporary network problems. Then B talks with C
and becomes the leader, and neither B nor C knows of the 10 new
operations, which is OK. Then A reconnects to B and C, and at this
point A's log gets truncated to match that of B and C.
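
The scenario above can be sketched roughly as follows. This is only an
illustration, not ZooKeeper's actual sync code: logs are modelled as plain
lists of integer operation ids, and the function name truncate_to_match is
made up for this example.

```python
def truncate_to_match(follower_log, leader_log):
    """Keep the longest common prefix with the leader; return (kept, dropped)."""
    common = 0
    for mine, theirs in zip(follower_log, leader_log):
        if mine != theirs:
            break
        common += 1
    return follower_log[:common], follower_log[common:]


# A logged ids 1..5 while everyone agreed, then 10 more proposals (6..15)
# that never reached B or C before A lost leadership.
log_a = list(range(1, 16))
# B, the new leader, only has ids 1..5 (and so does C).
log_b = list(range(1, 6))

new_log_a, dropped = truncate_to_match(log_a, log_b)
print(new_log_a)     # [1, 2, 3, 4, 5]
print(len(dropped))  # 10
```

None of the 10 dropped operations was ever acknowledged by a quorum, so
nothing committed is lost here. From A's local point of view, though, this
looks the same as a truncation caused by a failure.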

We could detect a problem if we stored the last committed operation
id on disk, since truncating beyond that id would obviously be wrong. But
that would not always work, because a server may miss the last few commit
messages (it's a quorum protocol). It may be better than nothing.
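
A minimal sketch of that check, assuming each server persists the id of the
last operation it knows to be committed. As far as this thread goes,
ZooKeeper does not do this today; the names below (last_committed_id,
detects_loss, TruncationBelowCommit) are hypothetical.

```python
class TruncationBelowCommit(Exception):
    """Raised when a truncation would discard a known-committed operation."""


def detects_loss(last_committed_id, truncate_to_id):
    """True if truncating to truncate_to_id drops an op known to be committed."""
    return truncate_to_id < last_committed_id


def check_truncation(last_committed_id, truncate_to_id):
    """Refuse a truncation that goes below the persisted commit id."""
    if detects_loss(last_committed_id, truncate_to_id):
        raise TruncationBelowCommit(
            f"truncate to {truncate_to_id} is below committed id {last_committed_id}"
        )


# Case 1: the server knows id 8 was committed and is told to truncate to 5.
print(detects_loss(last_committed_id=8, truncate_to_id=5))  # True

# Case 2: the quorum committed up to id 10, but this server missed the last
# two commit messages, so its persisted value is still 8. A wrongful
# truncation to 8 drops committed ops 9 and 10 and goes unnoticed.
print(detects_loss(last_committed_id=8, truncate_to_id=8))  # False
```

The second case is the limitation mentioned above: the check can only be as
good as the server's own, possibly stale, view of what was committed.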


On Thu, May 30, 2013 at 2:57 PM, Dave Katz <dkatz@dkatz.org> wrote:
> Are there any hooks by which the Zookeeper server can signal that it has lost data?
> It seems at least theoretically possible that when a server is reconciling its state with
> other servers that it could detect history truncation and signal it (even as crudely as throwing
> an exception).  This would provide a mechanism with which an elastic system could do last-ditch
> recovery when things fell apart.
> Thanks,
> --Dave
