zookeeper-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5
Date Wed, 14 Aug 2019 05:13:01 GMT
For me the issue occurred only in standalone mode. With the ensemble, I simply cleared the data
directory and the node received the ZooKeeper data from the quorum.
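
A minimal sketch of a pre-upgrade check for the condition discussed in this thread: transaction logs present but no snapshot file in the data directory. It assumes the usual `snapshot.<zxid>` and `log.<zxid>` file names under `version-2`. The function names are illustrative and not part of ZooKeeper, and the check only looks at file names, not at whether a snapshot is valid.

```python
from pathlib import Path


def list_data_files(version_dir):
    """Return (snapshot_names, txn_log_names) found directly in version_dir."""
    files = [p.name for p in Path(version_dir).iterdir() if p.is_file()]
    snapshots = sorted(n for n in files if n.startswith("snapshot."))
    txn_logs = sorted(n for n in files if n.startswith("log."))
    return snapshots, txn_logs


def logs_without_snapshot(version_dir):
    """True if txn logs exist but no snapshot file does.

    This is the situation the 3.5.5 error message describes
    ("No snapshot found, but there are log entries").
    """
    snapshots, txn_logs = list_data_files(version_dir)
    return bool(txn_logs) and not snapshots


# Example call (the path is taken from the log output quoted below):
#   logs_without_snapshot("/zookeeper/version-2")
```

If the function returns True for a standalone node, the replies quoted below suggest creating at least one snapshot under 3.4.x before switching to 3.5.5.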

> On 13.08.2019 at 15:42, Koen De Groote <koen.degroote@limecraft.com> wrote:
> 
> I would also like to know if this is possible.
> 
> From going over the GitHub page, it seems there is a JMX method to force
> the creation of a snapshot. Yet the Docker image is configured such that
> a port will never be assigned to the JMX process.
> 
> Is there any way to bypass this?
> 
>> On Tue, Jul 30, 2019 at 8:51 AM Jörn Franke <jornfranke@gmail.com> wrote:
>> 
>> Thanks. Is it possible to force ZooKeeper to create a snapshot? I will
>> check; I think the snapshot count is set to 1 in the cfg.
>> 
>>> On 30.07.2019 at 08:06, Enrico Olivelli <eolivelli@gmail.com> wrote:
>>> 
>>> On Mon, Jul 29, 2019 at 23:59, Jörn Franke <jornfranke@gmail.com> wrote:
>>> 
>>>> ok, then let me verify tomorrow if a snapshot file is indeed there. If it
>>>> is missing then I wonder why it was missing. There was no crash or whatever
>>>> and 3.4.14 works without issue, but of course it could have loaded them
>>>> from the log files. However, then I wonder why it does not create one.
>>>> 
>>> 
>>> 
>>> 
>>> I remember now that some other user, I think Sijie, reported a similar
>>> problem some months ago: it is not possible to upgrade from 3.4 to 3.5
>>> if no snapshot is present.
>>> IIRC, the fix was to force the creation of at least one snapshot file and
>>> then upgrade.
>>> 
>>> Enrico
>>> 
>>> 
>>>> 
>>>> On Mon, Jul 29, 2019 at 11:45 PM Michael Han <hanm@apache.org> wrote:
>>>> 
>>>>>>> I just wonder why it does not find a valid snapshot.
>>>>> 
>>>>> If there are local snapshot files and the files are valid, then it's a bug
>>>>> that the server fails to load them.
>>>>> 
>>>>>>> Is it because the format changed in 3.5.5 compared to 3.4.14?
>>>>> 
>>>>> Not that I am aware of. There are some format changes (added compression
>>>>> support) in the master branch, but that's not shipped with 3.5.5.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke <jornfranke@gmail.com> wrote:
>>>>> 
>>>>>> ok, then it affects basically all standalone nodes? This is fine, although
>>>>>> it means some extra work (for uncritical lab environments).
>>>>>> I am not sure it is ZOOKEEPER-2325 (but I don't know the full history
>>>>>> behind it). The logs are fine (it works in 3.4.14 without issues, even after
>>>>>> downgrading back). There is no issue with disk space and there are no
>>>>>> 0-byte files. I just wonder why it does not find a valid snapshot. Is it
>>>>>> because the format changed in 3.5.5 compared to 3.4.14?
>>>>>> 
>>>>>> On Mon, Jul 29, 2019 at 11:25 PM Michael Han <hanm@apache.org> wrote:
>>>>>> 
>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries.
>>>>>>>>> Something is broken!
>>>>>>> 
>>>>>>> This is expected behavior introduced in ZOOKEEPER-2325. We don't want to
>>>>>>> end up with a potentially inconsistent state across the ensemble when
>>>>>>> recovering from an empty snapshot.
>>>>>>> 
>>>>>>> To continue the upgrade, just delete all txn log files and let the node sync
>>>>>>> the snapshot from the quorum.
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
>>>>>>> 
>>>>>>>> On Mon, Jul 29, 2019 at 22:32, Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> It also seems that 3.5.5 does not attempt to read all of the logfiles (I
>>>>>>>>> still have to confirm), but the two it reads exist, it has access, and they
>>>>>>>>> are much larger than 0 bytes.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> We should have the stack trace of the EOFException.
>>>>>>>> 
>>>>>>>> Does anyone on this list have a better idea?
>>>>>>>> 
>>>>>>>> Enrico
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> (of course I do not run them at the same time)
>>>>>>>>>> 
>>>>>>>>>> On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> thank you for the quick reply. They read from the same disk paths and
>>>>>>>>>>> have the same access rights (in fact the RHEL service executes them as the
>>>>>>>>>>> same specific user).
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Jul 29, 2019 at 21:50, Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I tried to migrate a lab environment from ZooKeeper 3.4.14 (used for Solr)
>>>>>>>>>>>>> to 3.5.5 and encountered an issue. It is ZooKeeper in standalone mode
>>>>>>>>>>>>> (other environments have a proper ensemble). I increased jute.maxbuffer
>>>>>>>>>>>>> beyond the default (but not excessively) - this was working perfectly fine
>>>>>>>>>>>>> in 3.4.14.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> For the migration I basically reuse the same config files, except that I
>>>>>>>>>>>>> whitelist some commands (later I am also interested in adding SSL).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I get the following error message when starting ZooKeeper with 3.5.5
>>>>>>>>>>>>> (basically, I just changed the symbolic link from zookeeper to point to
>>>>>>>>>>>>> the 3.5.5 directory instead of the 3.4.14 directory):
>>>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b34
>>>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b34
>>>>>>>>>>>>> 2019-07-29 15:16:25,222 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b34
>>>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b72
>>>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b72
>>>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b72
>>>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - ERROR [main:ZooKeeperServerMain@83] - Unexpected exception, exiting abnormally
>>>>>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>>>>>>>>>>>>>       at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
>>>>>>>>>>>>>       at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>>>>>>>>>>>>>       at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
>>>>>>>>>>>>>       at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
>>>>>>>>>>>>>       at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
>>>>>>>>>>>>>       at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
>>>>>>>>>>>>>       at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
>>>>>>>>>>>>>       at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
>>>>>>>>>>>>>       at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
>>>>>>>>>>>>>       at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
>>>>>>>>>>>>>       at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Strangely enough, if I switch back to 3.4.14 the issue is resolved and
>>>>>>>>>>>>> Zookeeper works normally. However, I would like to leverage the new version
>>>>>>>>>>>>> 3.5.5.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> There are no 0-byte files. Plenty of disk space is available.
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Can you compare these logs with the logs of 3.4.x? Are they reading from the
>>>>>>>>>>>> same disk paths?
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> Any idea beyond erasing the data dir (I would try to avoid it; I can
>>>>>>>>>>>>> reconstruct it, but still)? I will also try in the other environments,
>>>>>>>>>>>>> including one with an ensemble, but I would like to know beforehand
>>>>>>>>>>>>> what the issue could be.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Not sure if it is relevant, but:
>>>>>>>>>>>>> Activated Kerberos Authentication and Kerberos SSL for clients and quorum.
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Quorum? In standalone mode there is no 'quorum' auth
>>>>>>>>>>>> 
>>>>>>>>>>>> Enrico
>>>>>>>>>>>> 
