Mailing-List: contact dev-help@zookeeper.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@zookeeper.apache.org
Date: Wed, 6 Jul 2011 02:58:16 +0000 (UTC)
From: "Kurt Young (JIRA)" <jira@apache.org>
To: dev@zookeeper.apache.org
Message-ID: 
 <1290103517.2510.1309921096651.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Created] (ZOOKEEPER-1118) Inconsistent data after server
 crashes several times
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Inconsistent data after server crashes several times
----------------------------------------------------

                 Key: ZOOKEEPER-1118
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1118
             Project: ZooKeeper
          Issue Type: Bug
          Components: quorum
    Affects Versions: 3.3.2
         Environment: Redhat RHEL5
            Reporter: Kurt Young
            Priority: Critical


I think there is a bug in the way a Follower syncs its data with the Leader.
Assume some operations were committed while one server was down. When that server restarts, it receives a NEWLEADER packet that includes the leader's last zxid, and the server sets its own lastProcessZxid to the leader's.
{code:title=Follower.java|borderStyle=solid}
void followLeader() throws InterruptedException {
    fzk.registerJMX(new FollowerBean(this, zk), self.jmxLocalPeerBean);
    try {
        InetSocketAddress addr = findLeader();
        try {
            connectToLeader(addr);
            long newLeaderZxid = registerWithLeader(Leader.FOLLOWERINFO);  // get the last zxid from leader
            //check to see if the leader zxid is lower than ours
            //this should never happen but is just a safety check
            long lastLoggedZxid = self.getLastLoggedZxid();
            if ((newLeaderZxid >> 32L) < (lastLoggedZxid >> 32L)) {
                LOG.fatal("Leader epoch " + Long.toHexString(newLeaderZxid >> 32L)
                        + " is less than our epoch " + Long.toHexString(lastLoggedZxid >> 32L));
                throw new IOException("Error: Epoch of leader is lower");
            }
            syncWithLeader(newLeaderZxid);   // set its own lastProcessZxid to leader's last zxid
{code}
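
For readers unfamiliar with the epoch check in the excerpt above: a zxid is a 64-bit value whose high 32 bits hold the leader epoch, which is why the code shifts right by 32 before comparing. The sketch below only illustrates that arithmetic. ZxidParts and its method names are hypothetical helpers made up for this illustration and are not part of ZooKeeper; the low-32-bit counter is included for completeness and does not appear in the excerpt.

```java
// Illustrative sketch only. ZxidParts is a hypothetical helper class,
// not a ZooKeeper class; it mirrors the ">> 32L" arithmetic in followLeader().
final class ZxidParts {
    private ZxidParts() {}

    /** High 32 bits of a zxid: the leader epoch. */
    static long epoch(long zxid) {
        return zxid >> 32L;
    }

    /** Low 32 bits of a zxid: the counter within that epoch. */
    static long counter(long zxid) {
        return zxid & 0xffffffffL;
    }

    /** Same comparison as the safety check: is the leader's epoch lower than ours? */
    static boolean leaderEpochIsLower(long newLeaderZxid, long lastLoggedZxid) {
        return epoch(newLeaderZxid) < epoch(lastLoggedZxid);
    }

    public static void main(String[] args) {
        long newLeaderZxid = 0x0000000500000007L;  // made-up value: epoch 5, counter 7
        long lastLoggedZxid = 0x0000000500000003L; // made-up value: epoch 5, counter 3
        System.out.println("leader epoch   = " + Long.toHexString(epoch(newLeaderZxid)));
        System.out.println("leader counter = " + Long.toHexString(counter(newLeaderZxid)));
        System.out.println("leader lower?  = " + leaderEpochIsLower(newLeaderZxid, lastLoggedZxid));
    }
}
```

Note that the check compares epochs only, so a follower that is behind within the same epoch passes it and goes on to syncWithLeader.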

Next, the server receives a series of COMMIT packets that bring its data in sync with the leader. The leader then sends an UPTODATE packet, which tells the server to take a snapshot.
{code:title=Follower.java|borderStyle=solid}
protected void processPacket(QuorumPacket qp) throws IOException{
    switch (qp.getType()) {
    case Leader.PING:
        ping(qp);
        break;
    case Leader.PROPOSAL:
        TxnHeader hdr = new TxnHeader();
        BinaryInputArchive ia = BinaryInputArchive
        .getArchive(new ByteArrayInputStream(qp.getData()));
        Record txn = SerializeUtils.deserializeTxn(ia, hdr);
        if (hdr.getZxid() != lastQueued + 1) {
            LOG.warn("Got zxid 0x"
                    + Long.toHexString(hdr.getZxid())
                    + " expected 0x"
                    + Long.toHexString(lastQueued + 1));
        }
        lastQueued = hdr.getZxid();
        fzk.logRequest(hdr, txn);
        break;
    case Leader.COMMIT:
        fzk.commit(qp.getZxid());
        break;
    case Leader.UPTODATE:
        fzk.takeSnapshot();
        self.cnxnFactory.setZooKeeperServer(fzk);
        break;
    case Leader.REVALIDATE:
        revalidate(qp);
        break;
    case Leader.SYNC:
        fzk.sync();
        break;
    }
}
{code}
Notice how differently the Follower treats COMMIT and UPTODATE packets. When it receives a COMMIT packet, the follower hands it to a processor to deal with. When it receives an UPTODATE packet, it takes a snapshot immediately. So the server may take the snapshot before it has committed all the operations it missed. If the server then crashes again and recovers, it restores its data from that snapshot, so its data is now inconsistent with the leader's even though its last zxid is the same.
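
To make the ordering problem concrete, here is a toy model of the sequence described above. It is a sketch under stated assumptions, not ZooKeeper code: SnapshotRaceSketch, ToyFollower, Snapshot, and every method on them are hypothetical names invented for this illustration, the processor is reduced to a plain queue, and the zxid values are made up. Draining the queue before the snapshot is shown only as a contrast case, not as the fix that should be adopted.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;

// Toy model only. All names here are hypothetical; none are ZooKeeper classes.
final class SnapshotRaceSketch {
    private SnapshotRaceSketch() {}

    /** What ends up on disk: the zxid the snapshot claims plus the data it holds. */
    static final class Snapshot {
        final long zxid;
        final Map<Long, String> data;

        Snapshot(long zxid, Map<Long, String> data) {
            this.zxid = zxid;
            this.data = data;
        }
    }

    static final class ToyFollower {
        final Map<Long, String> data = new TreeMap<>();
        final Queue<Long> pendingCommits = new ArrayDeque<>();
        long lastProcessedZxid;

        /** NEWLEADER: adopt the leader's last zxid right away. */
        void syncWithLeader(long leaderLastZxid) {
            lastProcessedZxid = leaderLastZxid;
        }

        /** COMMIT: only queued for the processor, not applied yet. */
        void commit(long zxid) {
            pendingCommits.add(zxid);
        }

        /** Stand-in for the processor eventually applying queued commits. */
        void drainCommits() {
            while (!pendingCommits.isEmpty()) {
                long zxid = pendingCommits.remove();
                data.put(zxid, "txn-" + zxid);
            }
        }

        /** UPTODATE: snapshot whatever is applied at this instant. */
        Snapshot takeSnapshot() {
            return new Snapshot(lastProcessedZxid, new TreeMap<>(data));
        }
    }

    /** Replays the sequence from the report, optionally draining commits first. */
    static Snapshot run(boolean drainBeforeSnapshot) {
        ToyFollower follower = new ToyFollower();
        follower.data.put(1L, "txn-1");  // state the server had before it went down
        follower.syncWithLeader(3L);     // leader is at zxid 3
        follower.commit(2L);             // transactions missed while down
        follower.commit(3L);
        if (drainBeforeSnapshot) {
            follower.drainCommits();
        }
        return follower.takeSnapshot();
    }

    public static void main(String[] args) {
        Snapshot racy = run(false);
        Snapshot ordered = run(true);
        System.out.println("racy:    zxid=" + racy.zxid + " entries=" + racy.data.keySet());
        System.out.println("ordered: zxid=" + ordered.zxid + " entries=" + ordered.data.keySet());
    }
}
```

In the racy run the snapshot carries the leader's zxid but is missing the two queued transactions, which is the state a server would restore after a second crash.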

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira