zookeeper-dev mailing list archives

From "Brian Nixon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-3056) Fails to load database with missing snapshot file but valid transaction log file
Date Fri, 08 Jun 2018 23:18:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506687#comment-16506687 ]

Brian Nixon commented on ZOOKEEPER-3056:

[~mmerli] That's a very reasonable concern, and ideally I'd have all upgrades be seamless in
exactly the way you describe. Gating the validation behind a property is undesirable only
because it adds yet another config option.

[~hanm] I think the signal file is a very workable approach and pretty straightforward to
implement. The first intervention that I scoped out (create a snapshot.0) was inspired by
yours, as it simplifies the sequence of "signal file", then "database load with trust in the
transaction log", then "create snapshot, delete signal file". It's a trade-off between admin
time and server-side code complexity, for sure.

In order of decreasing seamlessness (that is, increasing admin time):
 * property flag snapshot validation (default off)
 * property flag snapshot validation (default on)
 * signal file
 * admin script to create a snapshot.0 file in the snapshot directory
 * upgrade notes to create a snapshot.0 file in the snapshot directory
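To make the options above concrete, here is a rough sketch of the load-time decision they imply, written in Python for brevity rather than as a proposed patch. The function name, the return values, the signal file name and its location in the snapshot directory are all hypothetical; only the "snapshot." and "log." file name prefixes reflect how ZooKeeper names its files.

```python
from pathlib import Path

# Hypothetical signal file name; neither the name nor its location is settled.
SIGNAL_FILE_NAME = "trust-txnlog"


def _has_prefixed_file(directory, prefix):
    """Return True if the directory contains at least one file named prefix*."""
    return any(p.is_file() and p.name.startswith(prefix)
               for p in Path(directory).iterdir())


def decide_load(snap_dir, log_dir, validate_snapshots=True):
    """Decide how to treat a data directory at startup.

    Returns one of:
      "load"              - a snapshot exists, or the server is brand new
      "load_trusting_log" - no snapshot, but the transaction log is trusted
      "refuse"            - no snapshot and nothing vouches for the log
    """
    has_snapshot = _has_prefixed_file(snap_dir, "snapshot.")
    has_txn_log = _has_prefixed_file(log_dir, "log.")

    # Normal restart, or an empty data directory: nothing to validate.
    if has_snapshot or not has_txn_log:
        return "load"

    # Property flag option: validation switched off entirely.
    if not validate_snapshots:
        return "load_trusting_log"

    # Signal file option: an admin has vouched for the transaction log.
    # The caller would then take a snapshot and delete the signal file.
    if (Path(snap_dir) / SIGNAL_FILE_NAME).exists():
        return "load_trusting_log"

    return "refuse"
```

With the snapshot.0 options, the admin-created file makes the first branch succeed, so the server needs no extra logic at all. This sketch only checks that such a file exists, not that its contents form a valid snapshot.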

For the use cases that we maintain, it's far more likely that being unable to load a snapshot
indicates corruption or machine malfeasance than a legitimate database, so I'd like to round out
that impression with more information from the community. Is a snapshot-less db expected/unremarkable
under some reasonable workloads, or is it something worth (politely) discouraging? I do believe
ZOOKEEPER-2325 is a good feature and it would be a shame to turn it off by default.

> Fails to load database with missing snapshot file but valid transaction log file
> --------------------------------------------------------------------------------
>                 Key: ZOOKEEPER-3056
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3056
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.3, 3.5.4
>            Reporter: Michael Han
>            Priority: Critical
> [An issue|https://lists.apache.org/thread.html/cc17af6ef05d42318f74148f1a704f16934d1253f1472cccc1a93b4b@%3Cdev.zookeeper.apache.org%3E]
> was reported when a user failed to upgrade from 3.4.10 to 3.5.4 with a missing snapshot file.
> The code that complains about the missing snapshot file is [here|https://github.com/apache/zookeeper/blob/release-3.5.4/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L206],
> which was introduced as part of ZOOKEEPER-2325.
> With this check, ZK will not load the db without a snapshot file, even if the transaction
> log files are present and valid. This could be a problem when restoring a ZK instance that
> does not have a snapshot file but has a sound state (e.g. it crashed before being able to
> take its first snapshot because a large snapCount parameter was configured).
