zookeeper-user mailing list archives

From Andor Molnar <an...@apache.org>
Subject Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5
Date Wed, 14 Aug 2019 09:23:54 GMT
After some digging, it turned out that this is an outstanding issue in the 3.4 -> 3.5 upgrade.
I’ve found the following e-mail thread about it:
https://markmail.org/thread/rbhzbro6nszypwwp

…and an open Jira:
https://issues.apache.org/jira/browse/ZOOKEEPER-3056

Unfortunately, a patch is not available yet, but essentially the solution is to force ZooKeeper
to create a snapshot file somehow. Sorry, the Admin interface is not available in 3.4; it was my
mistake to recommend it.
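
To check whether a data directory is in the state that triggers this error (transaction logs present, but no snapshot file), something like the following Python sketch might help. It assumes that snapshots and transaction logs live in the same "version-2" directory and follow the usual "snapshot.<zxid>" / "log.<zxid>" naming; the function names are made up for illustration and are not part of ZooKeeper.

```python
from pathlib import Path


def classify_data_dir(version_dir):
    """Return (snapshots, txn_logs): sorted file names found in a version-2 directory."""
    names = sorted(p.name for p in Path(version_dir).iterdir() if p.is_file())
    # Assumption: snapshot files are named "snapshot.<zxid>", txn logs "log.<zxid>".
    snapshots = [n for n in names if n.startswith("snapshot.")]
    txn_logs = [n for n in names if n.startswith("log.")]
    return snapshots, txn_logs


def is_affected(version_dir):
    """True when transaction logs exist but there is no snapshot file at all."""
    snapshots, txn_logs = classify_data_dir(version_dir)
    return bool(txn_logs) and not snapshots
```

For the setup in the log output quoted further down, the call would be `is_affected("/zookeeper/version-2")`. This only looks at file names; it does not validate the contents of the files.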

In the last Jira comment there’s a workaround:
To perform an upgrade (3.4 -> 3.5):
	• download the "snapshot.0" file attached
	• copy it to the versioned directory (e.g. "version-2") within your data directory (parameter "dataDir" in your config - this is the directory containing the "myid" file for a peer)
	• restart the peer
	• upgrade the peer (this can be combined with the above step if you like)
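
The copy step above could be scripted roughly as follows. This is only a sketch under assumptions: the "snapshot.0" file has already been downloaded from the Jira issue, the peer is stopped, and the helper name `install_snapshot` is hypothetical. It refuses to do anything if a snapshot file is already present, since in that case the workaround should not be needed.

```python
import shutil
from pathlib import Path


def install_snapshot(snapshot_file, data_dir, version_dir_name="version-2"):
    """Copy snapshot_file into <data_dir>/<version_dir_name>.

    Returns the path of the copied file, or None if a snapshot already exists.
    """
    target_dir = Path(data_dir) / version_dir_name
    if not target_dir.is_dir():
        raise FileNotFoundError(f"{target_dir} does not exist")
    # Leave the directory untouched when any snapshot.* file is already there.
    if any(target_dir.glob("snapshot.*")):
        return None
    target = target_dir / Path(snapshot_file).name
    shutil.copy2(snapshot_file, target)
    return target
```

Example: `install_snapshot("snapshot.0", "/zookeeper")`, where "/zookeeper" stands for the value of "dataDir". The copied file should be readable by the user that runs ZooKeeper, so the script is best run as that user.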

Would you please give it a try?

Andor




> On 2019. Aug 14., at 10:44, Andor Molnar <andor@apache.org> wrote:
> 
> Hi Jorn,
> 
> Thanks for reaching out to us, this is a very important exercise to make sure the upgrade
> path works as expected.
> 
> - Please do an `ls -al` in your data dir to make sure you have valid snapshot files.
> - It would also be useful to expose the Admin port (8080/tcp by default) and check the
> output of `lastSnapshotCommand`.
> 
> Regards,
> Andor
> 
> 
> 
> 
> 
>> On 2019. Aug 14., at 7:13, Jörn Franke <jornfranke@gmail.com> wrote:
>> 
>> For me the issue occurred only in standalone mode. With the ensemble I simply cleared
>> the data directory and it received the zookeeper data from the quorum.
>> 
>>> On 13.08.2019 at 15:42, Koen De Groote <koen.degroote@limecraft.com> wrote:
>>> 
>>> I would also like to know if this is possible.
>>> 
>>> From going over the GitHub page, it seems there is a JMX method to force
>>> the creation of a snapshot. Yet the Docker image is configured such that
>>> a port will never be assigned to the JMX process.
>>> 
>>> Is there any way to bypass this?
>>> 
>>>> On Tue, Jul 30, 2019 at 8:51 AM Jörn Franke <jornfranke@gmail.com> wrote:
>>>> 
>>>> Thanks. Is it possible to force ZooKeeper to create a snapshot? I will
>>>> check; I think the snapshot count is set to 1 in the cfg.
>>>> 
>>>>> On 30.07.2019 at 08:06, Enrico Olivelli <eolivelli@gmail.com> wrote:
>>>>> 
>>>>> On Mon, Jul 29, 2019 at 23:59, Jörn Franke <jornfranke@gmail.com> wrote:
>>>>> 
>>>>>> ok, then let me verify tomorrow if a snapshot file is indeed there. If it
>>>>>> is missing then I wonder why it was missing. There was no crash or whatever
>>>>>> and 3.4.14 works without issue, but of course it could have loaded them
>>>>>> from the log files. However, then I wonder why it does not create one.
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> I remember now that some other user, I think Sijie, reported a similar
>>>>> problem some months ago: it is not possible to upgrade from 3.4 to 3.5
>>>>> if no snapshot is present.
>>>>> IIRC the fix was to force the creation of at least one snapshot file and
>>>>> then upgrade.
>>>>> 
>>>>> Enrico
>>>>> 
>>>>> 
>>>>>> 
>>>>>> On Mon, Jul 29, 2019 at 11:45 PM Michael Han <hanm@apache.org> wrote:
>>>>>> 
>>>>>>>>> I just wonder why it does not find a valid snapshot.
>>>>>>> 
>>>>>>> If there are local snapshot files and the files are valid, then it's a
>>>>>>> bug that the server fails to load them.
>>>>>>> 
>>>>>>>>> Is it because the format changed in 3.5.5 compared to 3.4.14?
>>>>>>> 
>>>>>>> Not that I am aware of. There are some format changes (added compression
>>>>>>> support) in the master branch, but that's not shipped with 3.5.5.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>> 
>>>>>>>> ok, then it affects basically all standalone nodes? This is fine, although
>>>>>>>> it means some extra work (for uncritical lab environments).
>>>>>>>> I am not sure it is ZOOKEEPER-2325 (but I don't know the full history
>>>>>>>> behind it). The logs are fine (it works in 3.4.14 without issues, even after
>>>>>>>> downgrading back). There is no issue with disk space and there are no 0
>>>>>>>> byte files. I just wonder why it does not find a valid snapshot. Is it
>>>>>>>> because the format changed in 3.5.5 compared to 3.4.14?
>>>>>>>> 
>>>>>>>> On Mon, Jul 29, 2019 at 11:25 PM Michael Han <hanm@apache.org> wrote:
>>>>>>>> 
>>>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries.
>>>>>>>>>>> Something is broken!
>>>>>>>>> 
>>>>>>>>> This is expected behavior introduced in ZOOKEEPER-2325. We don't want to
>>>>>>>>> end up with a potentially inconsistent state across the ensemble when
>>>>>>>>> recovering from an empty snapshot.
>>>>>>>>> 
>>>>>>>>> To continue the upgrade, just delete all txn log files and let the node sync
>>>>>>>>> the snapshot from the quorum.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli <eolivelli@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> On Mon, Jul 29, 2019 at 22:32, Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> It also seems that 3.5.5 does not attempt to read all of the logfiles (I
>>>>>>>>>>> still have to confirm), but the two it reads exist, it has access and they
>>>>>>>>>>> are much larger than 0 bytes
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> We should have the stack trace of the EOFException.
>>>>>>>>>> 
>>>>>>>>>> Does anyone on this list have a better idea?
>>>>>>>>>> 
>>>>>>>>>> Enrico
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <jornfranke@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> (of course I do not run them at the same time)
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <jornfranke@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> thank you for the quick reply. They read from the same disk paths and
>>>>>>>>>>>>> have the same access rights (in fact the RHEL service executes them as
>>>>>>>>>>>>> the same specific user).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli <eolivelli@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Mon, Jul 29, 2019 at 21:50, Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I tried to migrate a lab environment from ZooKeeper 3.4.14 (used for Solr)
>>>>>>>>>>>>>>> to 3.5.5 and encountered an issue. It is ZooKeeper in standalone mode
>>>>>>>>>>>>>>> (other environments have a proper ensemble). I increased jute.maxbuffer
>>>>>>>>>>>>>>> beyond the default (but not excessively) - this was working perfectly fine
>>>>>>>>>>>>>>> in 3.4.14.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Basically I reuse the same config files for the migration, except that I
>>>>>>>>>>>>>>> whitelist some commands (later I am also interested in adding SSL).
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I get the following error message when starting ZooKeeper with 3.5.5
>>>>>>>>>>>>>>> (basically, I just changed the symbolic link from zookeeper to point to
>>>>>>>>>>>>>>> 3.5.5 instead of the 3.4.14 directory):
>>>>>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b34
>>>>>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b34
>>>>>>>>>>>>>>> 2019-07-29 15:16:25,222 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b34
>>>>>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b72
>>>>>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b72
>>>>>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b72
>>>>>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - ERROR [main:ZooKeeperServerMain@83] - Unexpected exception, exiting abnormally
>>>>>>>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>>>>>>>>>>>>>>>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
>>>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>>>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
>>>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
>>>>>>>>>>>>>>>     at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
>>>>>>>>>>>>>>>     at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
>>>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
>>>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
>>>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
>>>>>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
>>>>>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Strangely enough, if I switch back to 3.4.14 the issue is resolved and
>>>>>>>>>>>>>>> ZooKeeper works normally. However, I would like to leverage the new version
>>>>>>>>>>>>>>> 3.5.5.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> There are no 0-byte files. Plenty of disk space is available.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Can you compare these logs with the logs of 3.4.x? Are they reading from
>>>>>>>>>>>>>> the same disk paths?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Any idea beyond erasing the data dir (I would try to avoid it, I can
>>>>>>>>>>>>>>> reconstruct it, but still)? I will also try in the other environments and
>>>>>>>>>>>>>>> also with an environment with an ensemble, but I would like to know
>>>>>>>>>>>>>>> beforehand what the issue could be.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Not sure if it is relevant, but:
>>>>>>>>>>>>>>> Activated Kerberos Authentication and Kerberos SSL for clients and quorum.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Quorum? In standalone mode there is no 'quorum' auth
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Enrico
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
> 

