zookeeper-user mailing list archives

From Enrico Olivelli <eolive...@gmail.com>
Subject Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5
Date Tue, 30 Jul 2019 06:06:01 GMT
On Mon, 29 Jul 2019 at 23:59, Jörn Franke <jornfranke@gmail.com> wrote:

> OK, then let me verify tomorrow whether a snapshot file is indeed there. If
> it is missing, I wonder why. There was no crash or anything similar, and
> 3.4.14 works without issue, though of course it could have loaded the data
> from the log files. But then I wonder why it does not create a snapshot.
>



I remember now that another user, I think Sijie, reported a similar
problem some months ago: it is not possible to upgrade from 3.4 to 3.5
if no snapshot is present.
IIRC the fix was to force the creation of at least one snapshot file and
then upgrade.
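
A minimal sketch of a pre-upgrade check, assuming the layout seen in the logs
in this thread (transaction logs named log.<zxid> under <dataDir>/version-2)
and the usual snapshot.<zxid> naming for snapshot files. The function name
upgrade_precheck and the keys it returns are made up for illustration; this is
not a ZooKeeper tool, and it only looks at file names, not at whether the
files are valid.

```python
from pathlib import Path


def upgrade_precheck(data_dir: str) -> dict:
    """List snapshot.* and log.* files under <data_dir>/version-2.

    Hypothetical helper: it flags the situation described in this thread,
    where transaction logs exist but no snapshot file does.
    """
    version_dir = Path(data_dir) / "version-2"
    # A missing directory simply yields empty lists.
    snapshots = sorted(p.name for p in version_dir.glob("snapshot.*") if p.is_file())
    txn_logs = sorted(p.name for p in version_dir.glob("log.*") if p.is_file())
    return {
        "snapshots": snapshots,
        "txn_logs": txn_logs,
        # Logs without any snapshot is the case that made 3.5.5 refuse to start here.
        "at_risk": bool(txn_logs) and not snapshots,
    }
```

Calling upgrade_precheck("/zookeeper") on the data dir from the report above
would show whether at least one snapshot file exists before switching the
symbolic link to 3.5.5.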

Enrico


>
> On Mon, Jul 29, 2019 at 11:45 PM Michael Han <hanm@apache.org> wrote:
>
> > >> I just wonder why it does not find a valid snapshot.
> >
> > If there are local snapshot files and the files are valid, then it's a bug
> > that the server fails to load them.
> >
> > >> Is it because the format changed in 3.5.5 compared to 3.4.14?
> >
> > Not that I am aware of. There are some format changes (added compression
> > support) in the master branch, but they are not shipped with 3.5.5.
> >
> >
> >
> > > On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke <jornfranke@gmail.com> wrote:
> > >
> > > > OK, so it affects basically all standalone nodes? This is fine, even
> > > > though it means some extra work (for uncritical lab environments).
> > > > I am not sure it is ZOOKEEPER-2325 (but I don't know the full history
> > > > behind it). The logs are fine (it works in 3.4.14 without issues, even
> > > > after downgrading back). There is no issue with disk space and there are
> > > > no 0-byte files. I just wonder why it does not find a valid snapshot. Is
> > > > it because the format changed in 3.5.5 compared to 3.4.14?
> > >
> > > On Mon, Jul 29, 2019 at 11:25 PM Michael Han <hanm@apache.org> wrote:
> > >
> > > > > >> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
> > > > >
> > > > > This is expected behavior introduced in ZOOKEEPER-2325. We don't want to
> > > > > end up with a potentially inconsistent state across the ensemble when
> > > > > recovering from an empty snapshot.
> > > > >
> > > > > To continue the upgrade, just delete all txn log files and let the node
> > > > > sync the snapshot from the quorum.
> > > >
> > > >
> > > > > On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
> > > > >
> > > > > > On Mon, 29 Jul 2019 at 22:32, Jörn Franke <jornfranke@gmail.com> wrote:
> > > > >
> > > > > > > It also seems that 3.5.5 does not attempt to read all of the log
> > > > > > > files (I still have to confirm this), but the two it reads exist,
> > > > > > > it has access to them, and they are much larger than 0 bytes.
> > > > > > >
> > > > > >
> > > > > > We should have the stack trace of the EOFException.
> > > > > >
> > > > > > Does anyone on this list have a better idea?
> > > > >
> > > > > Enrico
> > > > >
> > > > >
> > > > > > > On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <jornfranke@gmail.com> wrote:
> > > > > > >
> > > > > > > > (Of course I do not run them at the same time.)
> > > > > > > >
> > > > > > > > On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <jornfranke@gmail.com> wrote:
> > > > > > > >
> > > > > > > >> Thank you for the quick reply. They read from the same disk paths and
> > > > > > > >> have the same access rights (in fact the RHEL service executes them as
> > > > > > > >> the same specific user).
> > > > > > > >>
> > > > > > > >> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >>> On Mon, 29 Jul 2019 at 21:50, Jörn Franke <jornfranke@gmail.com> wrote:
> > > > > > >>>
> > > > > > > >>> > Hi,
> > > > > > > >>> >
> > > > > > > >>> > I tried to migrate a lab environment from ZooKeeper 3.4.14 (used for
> > > > > > > >>> > Solr) to 3.5.5 and encountered an issue. It is ZooKeeper in standalone
> > > > > > > >>> > mode (other environments have a proper ensemble). I increased
> > > > > > > >>> > jute.maxbuffer beyond the default (but not excessively); this was
> > > > > > > >>> > working perfectly fine in 3.4.14.
> > > > > > > >>> >
> > > > > > > >>> > Basically, I reuse the same config files for the migration, except that
> > > > > > > >>> > I whitelist some commands (later I am also interested in adding SSL).
> > > > > > > >>> >
> > > > > > > >>> > I get the following error message when starting ZooKeeper with 3.5.5
> > > > > > > >>> > (basically, I just changed the symbolic link from zookeeper to point to
> > > > > > > >>> > the 3.5.5 directory instead of the 3.4.14 directory):
> > > > > > >>> > 2019-07-29 15:16:25,217 [myid:] - DEBUG
> > > > > > >>> > [main:FileTxnLog$FileTxnIterator@655]
> > > > > > >>> > - Created new input stream /zookeeper/version-2/log.b34
> > > > > > >>> > 2019-07-29 15:16:25,217 [myid:] - DEBUG
> > > > > > >>> > [main:FileTxnLog$FileTxnIterator@658]
> > > > > > >>> > - Created new input archive /zookeeper/version-2/log.b34
> > > > > > >>> > 2019-07-29 15:16:25,222 [myid:] - DEBUG
> > > > > > >>> > [main:FileTxnLog$FileTxnIterator@696]
> > > > > > >>> > - EOF exception java.io.EOFException: Failed
to read
> > > > > > >>> > /zookeeper/version-2/log.b34
> > > > > > >>> > 2019-07-29 15:16:25,223 [myid:] - DEBUG
> > > > > > >>> > [main:FileTxnLog$FileTxnIterator@655]
> > > > > > >>> > - Created new input stream /zookeeper/version-2/log.b72
> > > > > > >>> > 2019-07-29 15:16:25,223 [myid:] - DEBUG
> > > > > > >>> > [main:FileTxnLog$FileTxnIterator@658]
> > > > > > >>> > - Created new input archive /zookeeper/version-2/log.b72
> > > > > > >>> > 2019-07-29 15:16:25,224 [myid:] - DEBUG
> > > > > > >>> > [main:FileTxnLog$FileTxnIterator@696]
> > > > > > >>> > - EOF exception java.io.EOFException: Failed
to read
> > > > > > >>> > /zookeeper/version-2/log.b72
> > > > > > >>> > 2019-07-29 15:16:25,224 [myid:] - ERROR
> > > > > [main:ZooKeeperServerMain@83
> > > > > > ]
> > > > > > >>> -
> > > > > > >>> > Unexpected exception, exiting abnormally
> > > > > > >>> > java.io.IOException: No snapshot found, but
there are log
> > > > entries.
> > > > > > >>> > Something is broken!
> > > > > > >>> >         at
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
> > > > > > >>> >         at
> > > > > > >>> >
> > > > > > >>>
> > > > > >
> > > >
> > org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
> > > > > > >>> >         at
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
> > > > > > >>> >         at
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
> > > > > > >>> >         at
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
> > > > > > >>> >         at
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
> > > > > > >>> >         at
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
> > > > > > >>> >         at
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
> > > > > > >>> >         at
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
> > > > > > >>> >         at
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
> > > > > > >>> >         at
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> > > > > > >>> >
> > > > > > > >>> > Strangely enough, if I switch back to 3.4.14 the issue is resolved and
> > > > > > > >>> > ZooKeeper works normally. However, I would like to leverage the new
> > > > > > > >>> > version 3.5.5.
> > > > > > > >>> >
> > > > > > > >>> > There are no 0-byte files. Plenty of disk space is available.
> > > > > > > >>> >
> > > > > > > >>>
> > > > > > > >>> Can you compare these logs with the logs of 3.4.x? Are they reading from
> > > > > > > >>> the same disk paths?
> > > > > > > >>>
> > > > > > > >>> > Any idea beyond erasing the data dir (I would try to avoid it; I can
> > > > > > > >>> > reconstruct it, but still)? I will also try in the other environments,
> > > > > > > >>> > including one with an ensemble, but I would like to know beforehand
> > > > > > > >>> > what the issue could be.
> > > > > > > >>> >
> > > > > > > >>> > Not sure if it is relevant, but:
> > > > > > > >>> > Activated Kerberos authentication and Kerberos SSL for clients and
> > > > > > > >>> > quorum.
> > > > > > > >>> >
> > > > > > > >>>
> > > > > > > >>> Quorum? In standalone mode there is no 'quorum' auth.
> > > > > > > >>>
> > > > > > > >>> Enrico
> > > > > > >>>
> > > > > > >>> >
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
