zookeeper-user mailing list archives

From Andor Molnar <an...@apache.org>
Subject Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5
Date Wed, 14 Aug 2019 08:44:26 GMT
Hi Jorn,

Thanks for reaching out to us; this is a very important exercise for making sure the upgrade
path works as expected.

- Please do an `ls -al` in your data dir to make sure you have valid snapshot files.
- It would also be useful to expose the Admin port (8080/tcp by default) and check the output
  of `lastSnapshotCommand`.
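
For the first check, a small script can summarize the `version-2` subdirectory of the data dir. This is a minimal, hypothetical helper (`inspect_data_dir` and `DataDirReport` are not part of ZooKeeper), assuming the 3.4/3.5 naming scheme `snapshot.<hex zxid>` and `log.<hex zxid>` (the `log.b34` and `log.b72` files in Jörn's output follow it). It only looks at file names and sizes; it cannot tell whether a snapshot is actually loadable, which ZooKeeper itself decides on startup.

```python
import os
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (zxid parsed from the hex suffix, file name, size in bytes)
Entry = Tuple[int, str, int]


@dataclass
class DataDirReport:
    snapshots: List[Entry] = field(default_factory=list)
    logs: List[Entry] = field(default_factory=list)
    empty_files: List[str] = field(default_factory=list)

    @property
    def logs_without_snapshot(self) -> bool:
        # The situation behind "No snapshot found, but there are log
        # entries": transaction logs exist but no snapshot file does.
        return bool(self.logs) and not self.snapshots


def _parse_zxid(name: str, prefix: str) -> Optional[int]:
    """Parse the hex zxid suffix, e.g. 'log.b34' -> 0xb34; None otherwise."""
    if not name.startswith(prefix):
        return None
    try:
        return int(name[len(prefix):], 16)
    except ValueError:
        # Assumption: files with any other suffix are simply ignored here.
        return None


def inspect_data_dir(version_dir: str) -> DataDirReport:
    """Classify files in a ZooKeeper 'version-2' directory by name and size."""
    report = DataDirReport()
    for name in sorted(os.listdir(version_dir)):
        path = os.path.join(version_dir, name)
        if not os.path.isfile(path):
            continue
        size = os.path.getsize(path)
        if size == 0:
            report.empty_files.append(name)
        snap_zxid = _parse_zxid(name, "snapshot.")
        log_zxid = _parse_zxid(name, "log.")
        if snap_zxid is not None:
            report.snapshots.append((snap_zxid, name, size))
        elif log_zxid is not None:
            report.logs.append((log_zxid, name, size))
    # Order by zxid rather than by file name.
    report.snapshots.sort()
    report.logs.sort()
    return report
```

For example, `inspect_data_dir("/zookeeper/version-2").logs_without_snapshot` returning `True` would be consistent with the startup error reported in this thread.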

Regards,
Andor

> On 2019. Aug 14., at 7:13, Jörn Franke <jornfranke@gmail.com> wrote:
> 
> For me the issue occurred only in standalone mode. With the ensemble, I simply cleared
> the data directory and it received the ZooKeeper data from the quorum.
> 
>> On 13.08.2019 at 15:42, Koen De Groote <koen.degroote@limecraft.com> wrote:
>> 
>> I would also like to know if this is possible.
>> 
>> From going over the GitHub page, it seems there is a JMX method to force
>> the creation of a snapshot. Yet the Docker image is configured in such a way
>> that a port will never be assigned to the JMX process.
>> 
>> Is there any way to bypass this?
>> 
>>> On Tue, Jul 30, 2019 at 8:51 AM Jörn Franke <jornfranke@gmail.com> wrote:
>>> 
>>> Thanks. Is it possible to force ZooKeeper to create a snapshot? I will
>>> check; I think the snapshot count is set to 1 in the cfg.
>>> 
>>>> On 30.07.2019 at 08:06, Enrico Olivelli <eolivelli@gmail.com> wrote:
>>>> 
>>>> On Mon, 29 Jul 2019 at 23:59, Jörn Franke <jornfranke@gmail.com> wrote:
>>>> 
>>>>> OK, then let me verify tomorrow whether a snapshot file is indeed there. If
>>>>> it is missing, then I wonder why it was missing. There was no crash or
>>>>> anything similar, and 3.4.14 works without issue, but of course it could have
>>>>> loaded the data from the log files. In that case, though, I wonder why it
>>>>> does not create one.
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> I remember now that another user, I think Sijie, reported a similar
>>>> problem some months ago: it is not possible to upgrade from 3.4 to 3.5
>>>> if no snapshot is present.
>>>> IIRC the fix was to force the creation of at least one snapshot file and
>>>> then upgrade.
>>>> 
>>>> Enrico
>>>> 
>>>> 
>>>>> 
>>>>> On Mon, Jul 29, 2019 at 11:45 PM Michael Han <hanm@apache.org> wrote:
>>>>> 
>>>>>>>> I just wonder why it does not find a valid snapshot.
>>>>>> 
>>>>>> If there are local snapshot files and the files are valid, then it's a
>>>>>> bug that the server fails to load them.
>>>>>> 
>>>>>>>> Is it because the format changed in 3.5.5 compared to 3.4.14?
>>>>>> 
>>>>>> Not that I am aware of. There are some format changes (added compression
>>>>>> support) in the master branch, but those are not shipped with 3.5.5.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>> 
>>>>>>> OK, so it basically affects all standalone nodes? That is fine, although
>>>>>>> it means some extra work (for non-critical lab environments).
>>>>>>> I am not sure it is ZOOKEEPER-2325 (but I don't know the full history
>>>>>>> behind it). The logs are fine (it works in 3.4.14 without issues, even
>>>>>>> after downgrading back). There is no issue with disk space and there are
>>>>>>> no 0-byte files. I just wonder why it does not find a valid snapshot. Is it
>>>>>>> because the format changed in 3.5.5 compared to 3.4.14?
>>>>>>> 
>>>>>>> On Mon, Jul 29, 2019 at 11:25 PM Michael Han <hanm@apache.org> wrote:
>>>>>>> 
>>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries.
>>>>>>>>>> Something is broken!
>>>>>>>> 
>>>>>>>> This is expected behavior introduced in ZOOKEEPER-2325. We don't want
>>>>>>>> to end up with a potentially inconsistent state across the ensemble when
>>>>>>>> recovering from an empty snapshot.
>>>>>>>> 
>>>>>>>> To continue the upgrade, just delete all txn log files and let the node
>>>>>>>> sync the snapshot from the quorum.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> On Mon, 29 Jul 2019, 22:32, Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> It also seems that 3.5.5 does not attempt to read all of the log files
>>>>>>>>>> (I still have to confirm this), but the two it reads exist, it has
>>>>>>>>>> access to them, and they are much larger than 0 bytes.
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> We should have the stack trace of the EOFException.
>>>>>>>>> 
>>>>>>>>> Does anyone on this list have a better idea?
>>>>>>>>> 
>>>>>>>>> Enrico
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> (Of course I do not run them at the same time.)
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Thank you for the quick reply. They read from the same disk paths and
>>>>>>>>>>>> have the same access rights (in fact, the RHEL service runs them as
>>>>>>>>>>>> the same specific user).
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, 29 Jul 2019, 21:50, Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I tried to migrate a lab environment from ZooKeeper 3.4.14 (used for
>>>>>>>>>>>>>> Solr) to 3.5.5 and encountered an issue. It is ZooKeeper in standalone
>>>>>>>>>>>>>> mode (other environments have a proper ensemble). I increased
>>>>>>>>>>>>>> jute.maxbuffer beyond the default (but not excessively); this was
>>>>>>>>>>>>>> working perfectly fine in 3.4.14.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> For the migration, I basically reuse the same config files, except that I
>>>>>>>>>>>>>> whitelist some commands (later I am also interested in adding SSL).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I get the following error message when starting ZooKeeper with 3.5.5
>>>>>>>>>>>>>> (basically, I just changed the zookeeper symbolic link to point to the
>>>>>>>>>>>>>> 3.5.5 directory instead of the 3.4.14 directory):
>>>>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b34
>>>>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b34
>>>>>>>>>>>>>> 2019-07-29 15:16:25,222 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b34
>>>>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b72
>>>>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b72
>>>>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b72
>>>>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - ERROR [main:ZooKeeperServerMain@83] - Unexpected exception, exiting abnormally
>>>>>>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Strangely enough, if I switch back to 3.4.14, the issue is resolved and
>>>>>>>>>>>>>> ZooKeeper works normally. However, I would like to leverage the new
>>>>>>>>>>>>>> version 3.5.5.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> There are no 0-byte files. Plenty of disk space is available.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Can you compare these logs with the logs of 3.4.x? Are they reading from
>>>>>>>>>>>>> the same disk paths?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Any ideas beyond erasing the data dir? (I would like to avoid that; I can
>>>>>>>>>>>>>> reconstruct it, but still.) I will also try the other environments,
>>>>>>>>>>>>>> including one with an ensemble, but I would like to know beforehand
>>>>>>>>>>>>>> what the issue could be.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Not sure if it is relevant, but I activated Kerberos authentication and
>>>>>>>>>>>>>> Kerberos SSL for clients and quorum.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Quorum? In standalone mode there is no 'quorum' auth.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Enrico
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 

