zookeeper-user mailing list archives

From Koen De Groote <koen.degro...@limecraft.com>
Subject Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5
Date Tue, 13 Aug 2019 13:42:54 GMT
I would also like to know if this is possible.

From going over the GitHub page, it seems there is a JMX method to force
the creation of a snapshot. Yet the Docker image is configured such that
a port will never be assigned to the JMX process.

Is there any way to bypass this?

On Tue, Jul 30, 2019 at 8:51 AM Jörn Franke <jornfranke@gmail.com> wrote:

> Thanks. Is it possible to force ZooKeeper to create a snapshot? I will
> check; I think the snapshot count is set to 1 in the cfg.
>
> > On 30.07.2019 at 08:06, Enrico Olivelli <eolivelli@gmail.com> wrote:
> >
> > On Mon, Jul 29, 2019 at 23:59, Jörn Franke <jornfranke@gmail.com> wrote:
> >
> >> OK, then let me verify tomorrow whether a snapshot file is indeed there.
> >> If it is missing, then I wonder why. There was no crash or anything
> >> similar, and 3.4.14 works without issue, though of course it could have
> >> loaded the data from the log files. However, then I wonder why it does
> >> not create one.
> >>
> >
> >
> >
> > I remember now that some other user, I think Sijie, reported a similar
> > problem some months ago: it is not possible to upgrade from 3.4 to 3.5
> > if no snapshot is present.
> > IIRC the fix was to force the creation of at least one snapshot file and
> > then upgrade.
> >
> > Enrico
> >
> >
> >>
> >> On Mon, Jul 29, 2019 at 11:45 PM Michael Han <hanm@apache.org> wrote:
> >>
> >>>>> I just wonder why it does not find a valid snapshot.
> >>>
> >>> If there are local snapshot files and the files are valid, then it's a
> >>> bug that the server fails to load them.
> >>>
> >>>>> Is it because the format changed in 3.5.5 compared to 3.4.14?
> >>>
> >>> Not that I am aware of. There are some format changes (added compression
> >>> support) in the master branch, but those are not shipped with 3.5.5.
> >>>
> >>>
> >>>
> >>> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke <jornfranke@gmail.com> wrote:
> >>>
> >>>> OK, then it basically affects all standalone nodes? This is fine, even
> >>>> though it means some extra work (for uncritical lab environments).
> >>>> I am not sure it is ZOOKEEPER-2325 (but I don't know the full history
> >>>> behind it). The logs are fine (it works in 3.4.14 without issues, even
> >>>> after downgrading back). There is no issue with disk space and there are
> >>>> no 0-byte files. I just wonder why it does not find a valid snapshot. Is
> >>>> it because the format changed in 3.5.5 compared to 3.4.14?
> >>>>
> >>>> On Mon, Jul 29, 2019 at 11:25 PM Michael Han <hanm@apache.org> wrote:
> >>>>
> >>>>>>> java.io.IOException: No snapshot found, but there are log entries.
> >>>>>>> Something is broken!
> >>>>>
> >>>>> This is expected behavior introduced in ZOOKEEPER-2325. We don't want to
> >>>>> end up with potentially inconsistent state across the ensemble when
> >>>>> recovering from an empty snapshot.
> >>>>>
> >>>>> To continue the upgrade, just delete all txn log files and let the node
> >>>>> sync the snapshot from the quorum.
> >>>>>
> >>>>>
> >>>>> On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
> >>>>>
> >>>>>> On Mon, Jul 29, 2019 at 22:32, Jörn Franke <jornfranke@gmail.com> wrote:
> >>>>>>
> >>>>>>> It also seems that 3.5.5 does not attempt to read all of the log files
> >>>>>>> (I still have to confirm this), but the two it reads exist, it has
> >>>>>>> access to them, and they are much larger than 0 bytes.
> >>>>>>>
> >>>>>>
> >>>>>> We should have the stack trace of the EOFException.
> >>>>>>
> >>>>>> Does anyone on this list have a better idea?
> >>>>>>
> >>>>>> Enrico
> >>>>>>
> >>>>>>
> >>>>>>> On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <jornfranke@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> (Of course I do not run them at the same time.)
> >>>>>>>>
> >>>>>>>> On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <jornfranke@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Thank you for the quick reply. They read from the same disk paths
> >>>>>>>>> and have the same access rights (in fact the RHEL service executes
> >>>>>>>>> them as the same specific user).
> >>>>>>>>>
> >>>>>>>>> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> On Mon, Jul 29, 2019 at 21:50, Jörn Franke <jornfranke@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> I tried to migrate a lab environment from ZooKeeper 3.4.14 (used for
> >>>>>>>>>>> Solr) to 3.5.5 and encountered an issue. It is ZooKeeper in standalone
> >>>>>>>>>>> mode (other environments have a proper ensemble). I increased
> >>>>>>>>>>> jute.maxbuffer beyond the default (but not excessively); this was
> >>>>>>>>>>> working perfectly fine in 3.4.14.
> >>>>>>>>>>>
> >>>>>>>>>>> For the migration I basically reuse the same config files, except that
> >>>>>>>>>>> I whitelist some commands (later I am also interested in adding SSL).
> >>>>>>>>>>>
> >>>>>>>>>>> I get the following error message when starting ZooKeeper with 3.5.5
> >>>>>>>>>>> (basically, I just changed the symbolic link from zookeeper to point
> >>>>>>>>>>> to the 3.5.5 directory instead of the 3.4.14 directory):
> >>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b34
> >>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b34
> >>>>>>>>>>> 2019-07-29 15:16:25,222 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b34
> >>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b72
> >>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b72
> >>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b72
> >>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - ERROR [main:ZooKeeperServerMain@83] - Unexpected exception, exiting abnormally
> >>>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
> >>>>>>>>>>>        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
> >>>>>>>>>>>        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
> >>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
> >>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
> >>>>>>>>>>>        at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
> >>>>>>>>>>>        at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
> >>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
> >>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
> >>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
> >>>>>>>>>>>        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
> >>>>>>>>>>>        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> >>>>>>>>>>>
> >>>>>>>>>>> Strangely enough, if I switch back to 3.4.14 the issue is resolved and
> >>>>>>>>>>> ZooKeeper works normally. However, I would like to leverage the new
> >>>>>>>>>>> version 3.5.5.
> >>>>>>>>>>>
> >>>>>>>>>>> There are no 0-byte files. Plenty of disk space is available.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Can you compare these logs with the logs of 3.4.x? Are they reading
> >>>>>>>>>> from the same disk paths?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> Any idea beyond erasing the data dir (I would try to avoid it; I can
> >>>>>>>>>>> reconstruct it, but still)? I will also try in the other environments,
> >>>>>>>>>>> including an environment with an ensemble, but I would like to know
> >>>>>>>>>>> beforehand what the issue could be.
> >>>>>>>>>>>
> >>>>>>>>>>> Not sure if it is relevant, but:
> >>>>>>>>>>> Activated Kerberos Authentication and Kerberos SSL for clients and
> >>>>>>>>>>> quorum.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Quorum? In standalone mode there is no 'quorum' auth.
> >>>>>>>>>>
> >>>>>>>>>> Enrico
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
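
The condition discussed in this thread (transaction logs present, no snapshot
file) can be checked before switching versions. The following is only a rough
sketch, not part of ZooKeeper: the function names are hypothetical, and it
assumes that snapshots and transaction logs live in the same version-2
directory and follow the usual snapshot.<zxid> and log.<zxid> naming, as the
log.b34 and log.b72 paths quoted above suggest. It looks at file names only and
does not validate file contents, so it cannot tell whether an existing
snapshot is actually loadable.

```python
from pathlib import Path


def classify_data_dir(version_dir):
    """Return (snapshots, txn_logs): file names found directly in version_dir.

    Assumption: files are named snapshot.<zxid> and log.<zxid>.
    """
    root = Path(version_dir)
    snapshots = sorted(p.name for p in root.glob("snapshot.*") if p.is_file())
    txn_logs = sorted(p.name for p in root.glob("log.*") if p.is_file())
    return snapshots, txn_logs


def has_logs_but_no_snapshot(version_dir):
    """True when txn log files exist but no snapshot file does.

    This is the situation the 'No snapshot found, but there are log
    entries' message in this thread describes. Name-based check only.
    """
    snapshots, txn_logs = classify_data_dir(version_dir)
    return bool(txn_logs) and not snapshots
```

For the setup in the original report, this would be called as
has_logs_but_no_snapshot("/zookeeper/version-2"). If dataLogDir is configured
separately from dataDir, the two globs would have to run against different
directories.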
