zookeeper-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5
Date Mon, 29 Jul 2019 21:58:57 GMT
OK, then let me verify tomorrow whether a snapshot file is indeed there. If
it is missing, I wonder why. There was no crash or anything similar, and
3.4.14 works without issue, though of course it could have loaded the data
from the log files. But then I wonder why it does not create a snapshot.
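
For that check, here is a minimal sketch in Python. The helper name
summarize_data_dir is made up for illustration. It assumes that snapshots
(snapshot.*) and transaction logs (log.*) both sit under <dataDir>/version-2,
as the /zookeeper/version-2/log.b34 paths in the log output suggest; if a
separate dataLogDir is configured, the logs would be elsewhere.

```python
from pathlib import Path


def summarize_data_dir(data_dir):
    """Return (snapshots, txn_logs), each a sorted list of (file name, size in bytes).

    Assumption: both file types live in <data_dir>/version-2.
    """
    version_dir = Path(data_dir) / "version-2"
    snapshots = sorted(
        (p.name, p.stat().st_size) for p in version_dir.glob("snapshot.*")
    )
    txn_logs = sorted(
        (p.name, p.stat().st_size) for p in version_dir.glob("log.*")
    )
    return snapshots, txn_logs
```

Calling summarize_data_dir("/zookeeper") and getting an empty first list
together with a non-empty second list would match the "No snapshot found,
but there are log entries" situation.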

On Mon, Jul 29, 2019 at 11:45 PM Michael Han <hanm@apache.org> wrote:

> >> I just wonder why it does not find a valid snapshot.
>
> If there are local snapshot files and the files are valid, then it's a bug
> that the server fails to load them.
>
> >> Is it because the format changed in 3.5.5 compared to 3.4.14?
>
> Not that I am aware of. There are some format changes (added compression
> support) in master branch, but that's not shipped with 3.5.5.
>
>
>
> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke <jornfranke@gmail.com> wrote:
>
> > ok, then it affects basically all standalone nodes? This is fine, even
> > though it means some extra work (for uncritical lab environments).
> > I am not sure it is ZOOKEEPER-2325 (but I don't know the full history
> > behind it). The logs are fine (it works in 3.4.14 without issues, even
> > after downgrading back). There is no issue with disk space and there are
> > no 0-byte files. I just wonder why it does not find a valid snapshot. Is
> > it because the format changed in 3.5.5 compared to 3.4.14?
> >
> > On Mon, Jul 29, 2019 at 11:25 PM Michael Han <hanm@apache.org> wrote:
> >
> > > >> java.io.IOException: No snapshot found, but there are log entries.
> > > >> Something is broken!
> > >
> > > This is expected behavior introduced in ZOOKEEPER-2325. We don't want to
> > > end up with a potentially inconsistent state across the ensemble when
> > > recovering from an empty snapshot.
> > >
> > > To continue the upgrade, just delete all txn log files and let the node
> > > sync the snapshot from the quorum.
> > >
> > >
> > > On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli <eolivelli@gmail.com>
> > > wrote:
> > >
> > > > On Mon, Jul 29, 2019, 22:32 Jörn Franke <jornfranke@gmail.com> wrote:
> > > >
> > > > > It also seems that 3.5.5 does not attempt to read all of the log
> > > > > files (I still have to confirm this), but the two it reads exist,
> > > > > it has access to them, and they are much larger than 0 bytes.
> > > > >
> > > >
> > > > We should have the stack trace of the EOFException.
> > > >
> > > > Does anyone on this list have a better idea?
> > > >
> > > > Enrico
> > > >
> > > >
> > > > > On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <jornfranke@gmail.com> wrote:
> > > > >
> > > > > > (of course I do not run them at the same time)
> > > > > >
> > > > > > On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <jornfranke@gmail.com> wrote:
> > > > > >
> > > > > >> Thank you for the quick reply. They read from the same disk paths
> > > > > >> and have the same access rights (in fact, the RHEL service
> > > > > >> executes them as the same specific user).
> > > > > >>
> > > > > >> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
> > > > > >>
> > > > > >>> On Mon, Jul 29, 2019, 21:50 Jörn Franke <jornfranke@gmail.com> wrote:
> > > > > >>>
> > > > > >>> > Hi,
> > > > > >>> >
> > > > > >>> > I tried to migrate a lab environment from ZooKeeper 3.4.14 (used
> > > > > >>> > for Solr) to 3.5.5 and encountered an issue. It is ZooKeeper in
> > > > > >>> > standalone mode (other environments have a proper ensemble). I
> > > > > >>> > increased jute.maxbuffer beyond the default (but not excessively);
> > > > > >>> > this was working perfectly fine in 3.4.14.
> > > > > >>> >
> > > > > >>> > For the migration I basically reuse the same config files, except
> > > > > >>> > that I whitelist some commands (later I am also interested in
> > > > > >>> > adding SSL).
> > > > > >>> >
> > > > > >>> > I get the following error message when starting ZooKeeper with
> > > > > >>> > 3.5.5 (basically, I just changed the symbolic link "zookeeper" to
> > > > > >>> > point to the 3.5.5 directory instead of the 3.4.14 directory):
> > > > > >>> > 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b34
> > > > > >>> > 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b34
> > > > > >>> > 2019-07-29 15:16:25,222 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b34
> > > > > >>> > 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b72
> > > > > >>> > 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b72
> > > > > >>> > 2019-07-29 15:16:25,224 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b72
> > > > > >>> > 2019-07-29 15:16:25,224 [myid:] - ERROR [main:ZooKeeperServerMain@83] - Unexpected exception, exiting abnormally
> > > > > >>> > java.io.IOException: No snapshot found, but there are log entries. Something is broken!
> > > > > >>> >         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
> > > > > >>> >         at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
> > > > > >>> >         at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
> > > > > >>> >         at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
> > > > > >>> >         at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
> > > > > >>> >         at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
> > > > > >>> >         at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
> > > > > >>> >         at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
> > > > > >>> >         at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
> > > > > >>> >         at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
> > > > > >>> >         at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> > > > > >>> >
> > > > > >>> > Strangely enough, if I switch back to 3.4.14 the issue is resolved
> > > > > >>> > and ZooKeeper works normally. However, I would like to leverage
> > > > > >>> > the new version 3.5.5.
> > > > > >>> >
> > > > > >>> > There are no 0-byte files. Plenty of disk space is available.
> > > > > >>> >
> > > > > >>>
> > > > > >>>
> > > > > >>> Can you compare these logs with the logs of 3.4.x? Are they reading
> > > > > >>> from the same disk paths?
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> > Any idea beyond erasing the data dir (I would try to avoid that; I
> > > > > >>> > can reconstruct it, but still)? I will also try in the other
> > > > > >>> > environments, including one with an ensemble, but I would like to
> > > > > >>> > know beforehand what the issue could be.
> > > > > >>> >
> > > > > >>> > Not sure if it is relevant, but:
> > > > > >>> > Activated Kerberos Authentication and Kerberos SSL for clients and
> > > > > >>> > quorum.
> > > > > >>>
> > > > > >>> Quorum? In standalone mode there is no 'quorum' auth
> > > > > >>>
> > > > > >>> Enrico
> > > > > >>>
