Hi Jorn,
Thanks for reaching out to us. This is a very important exercise to make sure the upgrade
path works as expected.
- Please do an `ls -al` in your data dir to make sure you have valid snapshot files.
- It would also be useful to expose the Admin port (8080/tcp by default) and check the output
of `lastSnapshotCommand`.
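
For the first check, here is a minimal sketch in Python, assuming the default layout in which
snapshots and transaction logs sit side by side under `<dataDir>/version-2` as `snapshot.<zxid>`
and `log.<zxid>` (a separate `dataLogDir` is not handled). The function names are made up for
illustration, and a non-empty file is only a rough proxy for a valid snapshot:

```python
from pathlib import Path


def inspect_data_dir(data_dir):
    """Return (snapshots, txn_logs) found under <data_dir>/version-2.

    Each entry is a (file name, size in bytes) tuple, sorted by file name.
    """
    version_dir = Path(data_dir) / "version-2"
    snapshots, txn_logs = [], []
    if not version_dir.is_dir():
        return snapshots, txn_logs
    for entry in sorted(version_dir.iterdir()):
        if not entry.is_file():
            continue
        item = (entry.name, entry.stat().st_size)
        if entry.name.startswith("snapshot."):
            snapshots.append(item)
        elif entry.name.startswith("log."):
            txn_logs.append(item)
    return snapshots, txn_logs


def upgrade_looks_safe(data_dir):
    """Return False when txn logs exist but no non-empty snapshot file does.

    This is a directory-level approximation of the situation that makes 3.5.x
    fail with "No snapshot found, but there are log entries"; it does not
    parse or validate the files themselves.
    """
    snapshots, txn_logs = inspect_data_dir(data_dir)
    has_snapshot = any(size > 0 for _, size in snapshots)
    return has_snapshot or not txn_logs
```

Calling `upgrade_looks_safe("/zookeeper")` (the path is taken from your log output, adjust as
needed) before switching the symlink should tell you whether the snapshot is really missing.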
Regards,
Andor
> On 2019. Aug 14., at 7:13, Jörn Franke <jornfranke@gmail.com> wrote:
>
> For me the issue occurred only in standalone mode. With the ensemble I simply cleared
> the data directory and it received the ZooKeeper data from the quorum.
>
>> On 13.08.2019 at 15:42, Koen De Groote <koen.degroote@limecraft.com> wrote:
>>
>> I would also like to know if this is possible.
>>
>> From going over the GitHub page, it seems there is a JMX method to force
>> the creation of a snapshot. Yet the Docker image is configured such that
>> a port will never be assigned to the JMX process.
>>
>> Is there any way to bypass this?
>>
>>> On Tue, Jul 30, 2019 at 8:51 AM Jörn Franke <jornfranke@gmail.com> wrote:
>>>
>>> Thanks. Is it possible to force ZooKeeper to create a snapshot? I will
>>> check; I think the snapshot count is set to 1 in the cfg.
>>>
>>>> On 30.07.2019 at 08:06, Enrico Olivelli <eolivelli@gmail.com> wrote:
>>>>
>>>> On Mon, Jul 29, 2019 at 23:59, Jörn Franke <jornfranke@gmail.com> wrote:
>>>>
>>>>> ok, then let me verify tomorrow if a snapshot file is indeed there. If it
>>>>> is missing then I wonder why it was missing. There was no crash or whatever
>>>>> and 3.4.14 works without issue, but of course it could have loaded them
>>>>> from the log files. However, then I wonder why it does not create one.
>>>>>
>>>>
>>>>
>>>>
>>>> I remember now that some other user, I think Sijie, reported a similar
>>>> problem some months ago: that it is not possible to upgrade from 3.4 to 3.5
>>>> if no snapshot is present.
>>>> IIRC the fix was to force the creation of at least one snapshot file and
>>>> then upgrade.
>>>>
>>>> Enrico
>>>>
>>>>
>>>>>
>>>>> On Mon, Jul 29, 2019 at 11:45 PM Michael Han <hanm@apache.org> wrote:
>>>>>
>>>>>>>> I just wonder why it does not find a valid snapshot.
>>>>>>
>>>>>> If there are local snapshot files and the files are valid, then it's a bug
>>>>>> that the server fails to load them.
>>>>>>
>>>>>>>> Is it because the format changed in 3.5.5 compared to 3.4.14?
>>>>>>
>>>>>> Not that I am aware of. There are some format changes (added compression
>>>>>> support) in the master branch, but that's not shipped with 3.5.5.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>
>>>>>>> ok, then it affects basically all standalone nodes? This is fine, even
>>>>>>> though it means some extra work (for uncritical lab environments).
>>>>>>> I am not sure it is ZOOKEEPER-2325 (but I don't know the full history
>>>>>>> behind it). The logs are fine (it works in 3.4.14 without issues, even after
>>>>>>> downgrading back). There is no issue with disk space and there are no
>>>>>>> 0-byte files. I just wonder why it does not find a valid snapshot. Is it
>>>>>>> because the format changed in 3.5.5 compared to 3.4.14?
>>>>>>>
>>>>>>> On Mon, Jul 29, 2019 at 11:25 PM Michael Han <hanm@apache.org> wrote:
>>>>>>>
>>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries.
>>>>>>>>>> Something is broken!
>>>>>>>>
>>>>>>>> This is expected behavior introduced in ZOOKEEPER-2325. We don't want to
>>>>>>>> end up with a potentially inconsistent state across the ensemble when
>>>>>>>> recovering from an empty snapshot.
>>>>>>>>
>>>>>>>> To continue the upgrade, just delete all txn log files and let the node sync
>>>>>>>> the snapshot from the quorum.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> On Mon, Jul 29, 2019 at 22:32, Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> It also seems that 3.5.5 does not attempt to read all of the logfiles (I
>>>>>>>>>> still have to confirm), but the two it reads exist, it has access and they
>>>>>>>>>> are much more than 0 bytes
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We should have the stack trace of the EOFException.
>>>>>>>>>
>>>>>>>>> Does anyone on this list have a better idea?
>>>>>>>>>
>>>>>>>>> Enrico
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> (of course I do not run them at the same time)
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank you for the quick reply. They read from the same disk paths and
>>>>>>>>>>>> have the same access rights (in fact the RHEL service executes them as the
>>>>>>>>>>>> same specific user).
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jul 29, 2019 at 21:50, Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I tried to migrate a lab environment from ZooKeeper 3.4.14 (used for Solr)
>>>>>>>>>>>>>> to 3.5.5 and encountered an issue. It is ZooKeeper in standalone mode
>>>>>>>>>>>>>> (other environments have a proper ensemble). I increased jute.maxbuffer
>>>>>>>>>>>>>> beyond the default (but not excessively) - this was working perfectly fine
>>>>>>>>>>>>>> in 3.4.14.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Basically I reuse the same config files for the migration, except that I
>>>>>>>>>>>>>> whitelist some commands (later I am also interested in adding SSL).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I get the following error message when starting ZooKeeper with 3.5.5
>>>>>>>>>>>>>> (basically, I just changed the symbolic link from zookeeper to point to
>>>>>>>>>>>>>> the 3.5.5 instead of the 3.4.14 directory):
>>>>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b34
>>>>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b34
>>>>>>>>>>>>>> 2019-07-29 15:16:25,222 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b34
>>>>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b72
>>>>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b72
>>>>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b72
>>>>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - ERROR [main:ZooKeeperServerMain@83] - Unexpected exception, exiting abnormally
>>>>>>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>>>>>>>>>>>>>>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
>>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
>>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
>>>>>>>>>>>>>>     at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
>>>>>>>>>>>>>>     at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
>>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
>>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
>>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
>>>>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
>>>>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Strangely enough, if I switch back to 3.4.14 the issue is resolved and
>>>>>>>>>>>>>> ZooKeeper works normally. However, I would like to leverage the new version
>>>>>>>>>>>>>> 3.5.5.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There are no 0-byte files. Plenty of disk space is available.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you compare these logs with the logs of 3.4.x? Are they reading from
>>>>>>>>>>>>> the same disk paths?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any idea beyond erasing the data dir (I would try to avoid it; I can
>>>>>>>>>>>>>> reconstruct it, but still)? I will also try in the other environments, and
>>>>>>>>>>>>>> also in an environment with an ensemble, but I would like to know
>>>>>>>>>>>>>> beforehand what the issue could be.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Not sure if it is relevant, but:
>>>>>>>>>>>>>> Activated Kerberos Authentication and Kerberos SSL for clients and quorum.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Quorum? In standalone mode there is no 'quorum' auth
>>>>>>>>>>>>>
>>>>>>>>>>>>> Enrico
>>>>>>>>>>>>>