hadoop-zookeeper-dev mailing list archives

From "Vishal K (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ZOOKEEPER-872) Small fixes to PurgeTxnLog
Date Mon, 01 Nov 2010 20:12:26 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927127#action_12927127
] 

Vishal K commented on ZOOKEEPER-872:
------------------------------------

Hi Pat,

The current code prints to stdout. We have an RMI service that has the ZK server embedded in it.
We do this so that we can run/start/stop ZK across platforms without having to write
platform-specific scripts. In this server, we start a thread that periodically calls PurgeTxnLog.purge().
As you pointed out, we should have a -q flag that directs output to the log instead of stdout, to
satisfy both approaches. I will make that change.
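
For illustration, the periodic purge thread described above could be sketched roughly as follows. This is a minimal sketch, not our actual service code: PurgeTask and PeriodicPurger are hypothetical names, and the task is a stand-in that a real server might implement by wrapping the PurgeTxnLog.purge() call.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the purge call (e.g. a wrapper around PurgeTxnLog.purge()). */
interface PurgeTask {
    void purge() throws Exception;
}

/** Hypothetical helper that runs a purge task periodically on a daemon thread. */
final class PeriodicPurger {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "purge-scheduler");
                t.setDaemon(true); // do not keep the JVM alive just for purging
                return t;
            });
    private final AtomicInteger failures = new AtomicInteger();

    /** Runs the task immediately, then again after each period elapses. */
    void start(PurgeTask task, long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                task.purge();
            } catch (Exception e) {
                // An uncaught exception would cancel all future runs, so count it instead.
                failures.incrementAndGet();
            }
        }, 0, period, unit);
    }

    int failureCount() {
        return failures.get();
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

The failure counter is only there to show where a logger call would go when output is directed to the log instead of stdout.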

We chose number 2 here because we think having only one backup will be enough. It is not clear
to us under what conditions the additional backup will be useful.

Backups are useful in the following scenarios (correct me if I am wrong):
1. The current ZooKeeper transaction log and/or snapshot is corrupted, but the past snapshots
and transaction logs are ok. Corruption can mean either disk file corruption or corruption
of transaction entries in the log. We store ZooKeeper data on mirrored disks.
2. The application itself made errors that require reverting to an older version.

For the first point, having one additional backup would suffice. The second point is really
tricky. I am not sure how the application can decide which snapshot to revert to. I think
in most cases it will be trial and error. It is not clear to me how to estimate the number
of backups needed. Also, it is not clear how one would go back in time. I looked at the
LogFormatter utility, and it does not help much in undoing the erroneous transactions
for case 2 above. In general, I think it is good to require users to keep a minimum of one
backup.

Related question: Is there a hash on the log files (or internal tree structures) that can tell
the ZooKeeper server whether the logs are corrupted? If yes, the ZooKeeper server can verify the
hash during startup and take some action based on that. For example, make sure that it never
becomes a leader until it gets the correct snapshot from the existing leader (otherwise it
may end up corrupting other servers' logs). "Corrupted" here refers to the case where the file
is readable, but one or more transactions in the log are bad.
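
To make the question concrete, a per-record check of this kind could look roughly like the sketch below. It is only an illustration of the idea: the RecordChecksum class is hypothetical, the choice of Adler-32 (java.util.zip.Adler32) is an assumption, and the record layout is not taken from the ZooKeeper on-disk format.

```java
import java.util.zip.Adler32;

/** Hypothetical helper: checksum one serialized transaction record and verify it later. */
final class RecordChecksum {
    /** Computes an Adler-32 checksum over the bytes of one record. */
    static long checksum(byte[] record) {
        Adler32 adler = new Adler32();
        adler.update(record, 0, record.length);
        return adler.getValue();
    }

    /** Returns true if the checksum stored alongside the record still matches its bytes. */
    static boolean isIntact(byte[] record, long storedChecksum) {
        return checksum(record) == storedChecksum;
    }
}
```

A server doing this at startup would call isIntact() for each record it replays and could refuse to become leader when any check fails.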

I am not sure if there is a test for this. If I remember correctly, there is a bug that causes
the purge() function to leave behind one additional log file. Please refer to my question above
about findNRecentSnapshots(). I can add a test or modify the purge utility once we have concluded
this discussion.
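
As a reference point for that discussion, the retention rule (keep the N most recent snapshots, purge the rest) can be sketched as below. This is not the findNRecentSnapshots() implementation: RetentionSketch and its methods are hypothetical, and the sketch assumes snapshot files are named with a hexadecimal zxid suffix, as in snapshot.<zxid>.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a "keep the N most recent snapshots" rule. */
final class RetentionSketch {
    /** Parses the hex zxid after the last '.' in a name such as "snapshot.1f". */
    static long zxidOf(String name) {
        return Long.parseLong(name.substring(name.lastIndexOf('.') + 1), 16);
    }

    /** Returns the snapshot names to delete when the `keep` most recent ones are retained. */
    static List<String> snapshotsToPurge(List<String> snapshots, int keep) {
        if (keep < 1) {
            throw new IllegalArgumentException("keep must be at least 1");
        }
        List<String> sorted = new ArrayList<>(snapshots);
        // Newest first: a larger zxid means a more recent snapshot.
        sorted.sort(Comparator.<String>comparingLong(RetentionSketch::zxidOf).reversed());
        if (sorted.size() <= keep) {
            return new ArrayList<>();
        }
        return new ArrayList<>(sorted.subList(keep, sorted.size()));
    }
}
```

With keep = 2 this retains the current snapshot plus one backup, which is the minimum argued for above. Which transaction log files can safely go along with the purged snapshots is deliberately left out of the sketch.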

> Small fixes to PurgeTxnLog 
> ---------------------------
>
>                 Key: ZOOKEEPER-872
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-872
>             Project: Zookeeper
>          Issue Type: Bug
>    Affects Versions: 3.3.1
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Minor
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-872
>
>
> PurgeTxnLog forces us to have at least 2 backups (by having count >= 3). Also, it prints
> to stdout instead of using Logger.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

