zookeeper-user mailing list archives

From Enrico Olivelli <eolive...@gmail.com>
Subject Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5
Date Tue, 13 Aug 2019 20:59:13 GMT
On Tue, 13 Aug 2019, 15:43 Koen De Groote <koen.degroote@limecraft.com> wrote:

> I would also like to know if this is possible.
>
> From going over the GitHub page, it seems there is a JMX method to force
> the creation of a snapshot. Yet the Docker image is configured such that
> a port will never be assigned to the JMX process.
>

Can't you modify your Docker image to expose the JMX API? I am not
a Docker expert, but it should be possible.

Enrico


> Is there any way to bypass this?
>
> On Tue, Jul 30, 2019 at 8:51 AM Jörn Franke <jornfranke@gmail.com> wrote:
>
> > Thanks. Is it possible to force ZooKeeper to create a snapshot? I will
> > check; I think the snapshot count is set to 1 in the cfg.
> >
> > > On 30.07.2019 at 08:06, Enrico Olivelli <eolivelli@gmail.com> wrote:
> > >
> > > On Mon, 29 Jul 2019 at 23:59, Jörn Franke <jornfranke@gmail.com> wrote:
> > >
> > >> ok, then let me verify tomorrow if a snapshot file is indeed there. If
> > >> it is missing then I wonder why it was missing. There was no crash or
> > >> whatever and 3.4.14 works without issue, but of course it could have
> > >> loaded them from the log files. However, then I wonder why it does not
> > >> create one.
> > >>
> > >
> > >
> > >
> > > I remember now that some other user, I think Sijie, reported a similar
> > > problem some months ago: it is not possible to upgrade from 3.4 to 3.5
> > > if no snapshot is present.
> > > IIRC the fix was to force the creation of at least one snapshot file
> > > and then upgrade.
> > >
> > > Enrico
> > >
> > >
> > >>
> > >> On Mon, Jul 29, 2019 at 11:45 PM Michael Han <hanm@apache.org> wrote:
> > >>
> > >>>>> I just wonder why it does not find a valid snapshot.
> > >>>
> > >>> If there are local snapshot files and the files are valid, then it's
> > >>> a bug that the server fails to load them.
> > >>>
> > >>>>> Is it because the format changed in 3.5.5 compared to 3.4.14?
> > >>>
> > >>> Not that I am aware of. There are some format changes (added compression
> > >>> support) in the master branch, but that's not shipped with 3.5.5.
> > >>>
> > >>>
> > >>>
> > >>> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke <jornfranke@gmail.com> wrote:
> > >>>
> > >>>> ok, then it affects basically all standalone nodes? This is fine,
> > >>>> although it means some extra work (for uncritical lab environments).
> > >>>> I am not sure it is ZOOKEEPER-2325 (but I don't know the full history
> > >>>> behind it). The logs are fine (it works in 3.4.14 without issues, even
> > >>>> after downgrading back). There is no issue with disk space and there
> > >>>> are no 0 byte files. I just wonder why it does not find a valid
> > >>>> snapshot. Is it because the format changed in 3.5.5 compared to 3.4.14?
> > >>>>
> > >>>> On Mon, Jul 29, 2019 at 11:25 PM Michael Han <hanm@apache.org> wrote:
> > >>>>
> > >>>>>>> java.io.IOException: No snapshot found, but there are log entries.
> > >>>>>>> Something is broken!
> > >>>>>
> > >>>>> This is expected behavior introduced in ZOOKEEPER-2325. We don't want
> > >>>>> to end up with potentially inconsistent state across the ensemble when
> > >>>>> recovering from an empty snapshot.
> > >>>>>
> > >>>>> To continue the upgrade, just delete all txn log files and let the
> > >>>>> node sync the snapshot from the quorum.
> > >>>>>
> > >>>>>
> > >>>>> On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
> > >>>>>
> > >>>>>> On Mon, 29 Jul 2019, 22:32 Jörn Franke <jornfranke@gmail.com> wrote:
> > >>>>>>
> > >>>>>>> It also seems that 3.5.5 does not attempt to read all of the
> > >>>>>>> logfiles (I still have to confirm), but the two it reads exist, it
> > >>>>>>> has access, and they are much larger than 0 bytes.
> > >>>>>>>
> > >>>>>>
> > >>>>>> We should have the stack trace of the EOFException.
> > >>>>>>
> > >>>>>> Does anyone on this list have a better idea?
> > >>>>>>
> > >>>>>> Enrico
> > >>>>>>
> > >>>>>>
> > >>>>>>> On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <jornfranke@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> (of course I do not run them at the same time)
> > >>>>>>>>
> > >>>>>>>> On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <jornfranke@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> Thank you for the quick reply. They read from the same disk paths
> > >>>>>>>>> and have the same access rights (in fact the RHEL service
> > >>>>>>>>> executes them as the same specific user).
> > >>>>>>>>>
> > >>>>>>>>> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> On Mon, 29 Jul 2019, 21:50 Jörn Franke <jornfranke@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Hi,
> > >>>>>>>>>>>
> > >>>>>>>>>>> I tried to migrate a lab environment from ZooKeeper 3.4.14 (used for
> > >>>>>>>>>>> Solr) to 3.5.5 and encountered an issue. It is ZooKeeper in standalone
> > >>>>>>>>>>> mode (other environments have a proper ensemble). I increased
> > >>>>>>>>>>> jute.maxbuffer beyond the default (but not excessively) - this was
> > >>>>>>>>>>> working perfectly fine in 3.4.14.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Basically I reuse the same config files for the migration, except that
> > >>>>>>>>>>> I whitelist some commands (later I am also interested in adding SSL).
> > >>>>>>>>>>>
> > >>>>>>>>>>> I get the following error message when starting ZooKeeper with 3.5.5
> > >>>>>>>>>>> (basically, I just changed the symbolic link from zookeeper to point
> > >>>>>>>>>>> to the 3.5.5 directory instead of the 3.4.14 directory):
> > >>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b34
> > >>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b34
> > >>>>>>>>>>> 2019-07-29 15:16:25,222 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b34
> > >>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b72
> > >>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b72
> > >>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b72
> > >>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - ERROR [main:ZooKeeperServerMain@83] - Unexpected exception, exiting abnormally
> > >>>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
> > >>>>>>>>>>>        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
> > >>>>>>>>>>>        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
> > >>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
> > >>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
> > >>>>>>>>>>>        at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
> > >>>>>>>>>>>        at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
> > >>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
> > >>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
> > >>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
> > >>>>>>>>>>>        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
> > >>>>>>>>>>>        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> > >>>>>>>>>>>
> > >>>>>>>>>>> Strangely enough, if I switch back to 3.4.14 the issue is resolved and
> > >>>>>>>>>>> ZooKeeper works normally. However, I would like to leverage the new
> > >>>>>>>>>>> version 3.5.5.
> > >>>>>>>>>>>
> > >>>>>>>>>>> There are no 0-byte files. Plenty of disk space is available.
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Can you compare these logs with the logs of 3.4.x? Are they reading
> > >>>>>>>>>> from the same disk paths?
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>> Any idea beyond erasing the data dir (I would try to avoid it; I can
> > >>>>>>>>>>> reconstruct it, but still)? I will also try in the other environments,
> > >>>>>>>>>>> including one with an ensemble, but I would like to know beforehand
> > >>>>>>>>>>> what the issue could be.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Not sure if it is relevant, but:
> > >>>>>>>>>>> Activated Kerberos Authentication and Kerberos SSL for clients and
> > >>>>>>>>>>> quorum.
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Quorum? In standalone mode there is no 'quorum' auth.
> > >>>>>>>>>>
> > >>>>>>>>>> Enrico
> > >>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
>
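
For anyone who wants to check before upgrading whether a data directory is in the state discussed in this thread (transaction logs present, but no snapshot file), here is a minimal sketch. It assumes snapshots and transaction logs both live under <dataDir>/version-2 and are named snapshot.<zxid> and log.<zxid>, as the log.b34 / log.b72 paths above suggest. If dataLogDir is configured separately, the two kinds of file would be in different directories and this check would need adapting. The function names are made up for illustration and are not part of ZooKeeper.

```python
from pathlib import Path


def inspect_data_dir(data_dir):
    """Return (snapshots, txn_logs): sorted file names under <data_dir>/version-2.

    Assumes the default layout where snapshots and transaction logs share
    one directory; returns two empty lists if version-2 does not exist.
    """
    version_dir = Path(data_dir) / "version-2"
    if not version_dir.is_dir():
        return [], []
    names = sorted(p.name for p in version_dir.iterdir() if p.is_file())
    snapshots = [n for n in names if n.startswith("snapshot.")]
    txn_logs = [n for n in names if n.startswith("log.")]
    return snapshots, txn_logs


def logs_without_snapshot(data_dir):
    """True if there are transaction logs but no snapshot file at all."""
    snapshots, txn_logs = inspect_data_dir(data_dir)
    return bool(txn_logs) and not snapshots


if __name__ == "__main__":
    import sys

    # Usage: python check_zk_datadir.py /zookeeper
    target = sys.argv[1] if len(sys.argv) > 1 else "/zookeeper"
    snaps, logs = inspect_data_dir(target)
    print("snapshots:", snaps)
    print("txn logs: ", logs)
    if logs_without_snapshot(target):
        print("Logs but no snapshot: this matches the situation described in this thread.")
```

This only looks at file names; it does not validate that a snapshot is readable, so a present but corrupt snapshot would not be flagged.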
