zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: ZK recovery questions
Date Mon, 19 Jul 2010 00:25:24 GMT
On Sun, Jul 18, 2010 at 3:34 PM, Ashwin Jayaprakash <
ashwin.jayaprakash@gmail.com> wrote:

>   - If 1 out of 3 servers crashes and the log files are unrecoverable, how
>   do we provision a replacement server?

Just start it and it will download a snapshot from the other servers.
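
The reason this needs no special handling is that a 3-server ensemble only needs a majority (2 servers) to keep running, so the two survivors stay in service while the replacement catches up. A minimal sketch of that arithmetic in Python; the function names are made up for illustration and are not part of ZooKeeper:

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size: int, live_servers: int) -> bool:
    """True if enough servers are alive to form a majority."""
    return live_servers >= quorum_size(ensemble_size)


if __name__ == "__main__":
    # 3 servers, 1 crashed: the remaining 2 still form a quorum.
    print(has_quorum(3, 2))
    # 3 servers, 2 crashed: the last one cannot serve on its own.
    print(has_quorum(3, 1))
```

By the same arithmetic, a 5-server ensemble tolerates two failures.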

>    - If the server log is recoverable but provisioning takes a long time,
>   then what happens if the old log file is far behind the current state?

If a server is very far behind, it will download a snapshot as if it knows
nothing.  This rarely takes long.
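
Roughly, the leader compares the last transaction id (zxid) the returning server has seen against the range of recent transactions it still has on hand. The sketch below is a simplified model of that decision, not ZooKeeper's actual code; the function and parameter names are hypothetical:

```python
def choose_sync(follower_zxid: int, min_committed: int, max_committed: int) -> str:
    """Pick how a leader might bring a rejoining server up to date.

    min_committed..max_committed stands for the range of recent
    transactions the leader still has readily available (assumed model).
    """
    if follower_zxid > max_committed:
        # The follower holds transactions the leader never committed.
        return "TRUNC"
    if follower_zxid >= min_committed:
        # Close enough: send only the transactions it is missing.
        return "DIFF"
    # Too far behind, or a blank server: send a full snapshot.
    return "SNAP"


if __name__ == "__main__":
    print(choose_sync(0, 100, 200))    # blank replacement server
    print(choose_sync(150, 100, 200))  # briefly disconnected server
```

A freshly provisioned server and a server with a very old log both fall into the snapshot case, which is why the two situations end up looking the same.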

>      - If there was a temporary glitch (n/w or GC) and the replica to which
>      the client is connected breaks away from the quorum does the client get
>      notified? Does it stop processing client requests? Does it rejoin the
>      cluster without manual intervention?

Failures like this are normally invisible to the client.
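
What happens underneath is that a server which has lost contact with the quorum stops serving clients, and the client library moves the session to another server from its connect string. Here is a toy model of that failover, assuming a fixed try-in-order policy (the real client library makes its own choice of which server to try next); all names are made up for illustration:

```python
from typing import Dict, List, Optional


def pick_server(connect_string: List[str],
                in_quorum: Dict[str, bool],
                current: str) -> Optional[str]:
    """Return the server a client should talk to next.

    in_quorum maps each server to whether it is currently part of the
    quorum. Returns None when no listed server can serve requests.
    """
    if in_quorum.get(current, False):
        return current  # nothing to do, the current server is healthy
    for host in connect_string:
        if host != current and in_quorum.get(host, False):
            return host  # fail over to the first healthy alternative
    return None


if __name__ == "__main__":
    servers = ["zk1:2181", "zk2:2181", "zk3:2181"]
    state = {"zk1:2181": False, "zk2:2181": True, "zk3:2181": True}
    print(pick_server(servers, state, "zk1:2181"))
```

As long as the reconnect happens within the session timeout, the session and its watches carry over, which is why the application usually does not notice.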

>   - Do the servers really have to run with file based persistence? I saw
>   that someone wanted this in-memory mode for unit testing (ZK 694
>   <https://issues.apache.org/jira/browse/ZOOKEEPER-694>) but there are
>   cases where only a transient ZK service is needed. Most enterprise
>   systems have replicated Databases anyway. So, the fear of data loss is
>   minimal. If ZK logs are the only means of recovery, then this might be
>   harder to implement

ZK is not a replacement for your database and it is really, really nice to
be able to stop it and start it again.  Disk persistence helps with this.

>   promising. Plain ZK API is a bit overwhelming :)

In practice, it is really pretty simple.  Try it out.
