From: Jörn Franke
To: user@zookeeper.apache.org
Date: Wed, 14 Aug 2019 07:13:01 +0200
Subject: Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5
Message-Id: <40AA721E-C28A-459B-9B1F-682FF2D3D123@gmail.com>

For me the issue occurred only in standalone mode. With the ensemble I simply cleared the data directory and it received the ZooKeeper data from the quorum.

> On 13.08.2019 at 15:42, Koen De Groote wrote:
>
> I would also like to know if this is possible.
>
> From going over the GitHub page, it seems there is a JMX method to force the creation of a snapshot. Yet the Docker image is configured such that a port will never be assigned to the JMX process.
>
> Is there any way to bypass this?
>
>> On Tue, Jul 30, 2019 at 8:51 AM Jörn Franke wrote:
>>
>> Thanks. Is it possible to force ZooKeeper to create a snapshot? I will check; I think the snapshot count is set to 1 in the cfg.
>>
>>> On 30.07.2019 at 08:06, Enrico Olivelli wrote:
>>>
>>> On Mon, 29 Jul 2019 at 23:59, Jörn Franke <jornfranke@gmail.com> wrote:
>>>
>>>> OK, then let me verify tomorrow whether a snapshot file is indeed there. If it is missing, then I wonder why. There was no crash or anything similar, and 3.4.14 works without issue, but of course it could have loaded the data from the log files. However, then I wonder why it does not create one.
>>>>
>>> I remember now that another user, I think Sijie, reported a similar problem some months ago: it is not possible to upgrade from 3.4 to 3.5 if no snapshot is present.
>>> IIRC the fix was to force the creation of at least one snapshot file and then upgrade.
>>>
>>> Enrico
>>>
>>>> On Mon, Jul 29, 2019 at 11:45 PM Michael Han wrote:
>>>>
>>>>>>> I just wonder why it does not find a valid snapshot.
>>>>>
>>>>> If there are local snapshot files and the files are valid, then it's a bug that the server fails to load them.
>>>>>
>>>>>>> Is it because the format changed in 3.5.5 compared to 3.4.14?
>>>>>
>>>>> Not that I am aware of. There are some format changes (added compression support) in the master branch, but that's not shipped with 3.5.5.
>>>>>
>>>>> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke wrote:
>>>>>
>>>>>> OK, so it affects basically all standalone nodes? This is fine, although it means some extra work (for non-critical lab environments).
>>>>>> I am not sure it is ZOOKEEPER-2325 (but I don't know the full history behind it). The logs are fine (it works in 3.4.14 without issues, even after downgrading back). There is no issue with disk space and there are no 0-byte files. I just wonder why it does not find a valid snapshot. Is it because the format changed in 3.5.5 compared to 3.4.14?
>>>>>>
>>>>>> On Mon, Jul 29, 2019 at 11:25 PM Michael Han wrote:
>>>>>>
>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>>>>>>>
>>>>>>> This is expected behavior introduced in ZOOKEEPER-2325. We don't want to end up with a potentially inconsistent state across the ensemble when recovering from an empty snapshot.
>>>>>>>
>>>>>>> To continue the upgrade, just delete all txn log files and let the node sync the snapshot from the quorum.
>>>>>>>
>>>>>>> On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli wrote:
>>>>>>>
>>>>>>>> On Mon, 29 Jul 2019, 22:32, Jörn Franke wrote:
>>>>>>>>
>>>>>>>>> It also seems that 3.5.5 does not attempt to read all of the log files (I still have to confirm this), but the two it reads exist, it has access to them, and they are much larger than 0 bytes.
>>>>>>>>>
>>>>>>>>
>>>>>>>> We should have the stack trace of the EOFException.
>>>>>>>>
>>>>>>>> Does anyone on this list have a better idea?
>>>>>>>>
>>>>>>>> Enrico
>>>>>>>>
>>>>>>>>> On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> (Of course I do not run them at the same time.)
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thank you for the quick reply. They read from the same disk paths and have the same access rights (in fact the RHEL service executes them as the same specific user).
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 29 Jul 2019, 21:50, Jörn Franke wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I tried to migrate a lab environment from ZooKeeper 3.4.14 (used for Solr) to 3.5.5 and encountered an issue. It is ZooKeeper in standalone mode (other environments have a proper ensemble).
>>>>>>>>>>>>> I increased jute.maxbuffer beyond the default (but not excessively); this was working perfectly fine in 3.4.14.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Basically I reuse the same config files for the migration, except that I whitelist some commands (later I am also interested in adding SSL).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I get the following error message when starting ZooKeeper with 3.5.5 (basically, I just changed the symbolic link "zookeeper" to point to the 3.5.5 directory instead of the 3.4.14 directory):
>>>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b34
>>>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b34
>>>>>>>>>>>>> 2019-07-29 15:16:25,222 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b34
>>>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b72
>>>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b72
>>>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b72
>>>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - ERROR [main:ZooKeeperServerMain@83] - Unexpected exception, exiting abnormally
>>>>>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>>>>>>>>>>>>>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
>>>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Strangely enough, if I switch back to 3.4.14 the issue is resolved and ZooKeeper works normally. However, I would like to leverage the new version 3.5.5.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are no 0-byte files. Plenty of disk space is available.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Can you compare these logs with the logs of 3.4.x? Are they reading from the same disk paths?
>>>>>>>>>>>>
>>>>>>>>>>>>> Any idea beyond erasing the data dir (I would try to avoid it; I can reconstruct it, but still)? I will also try in the other environments, including one with an ensemble, but I would like to know beforehand what the issue could be.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Not sure if it is relevant, but:
>>>>>>>>>>>>> Activated Kerberos Authentication and Kerberos SSL for clients and quorum.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Quorum? In standalone mode there is no 'quorum' auth.
>>>>>>>>>>>>
>>>>>>>>>>>> Enrico
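
For anyone who wants to check a data directory before switching to 3.5.5: the error in this thread is reported when there are transaction logs but no snapshot. Below is a minimal Python sketch that looks for exactly that state on disk. It is not ZooKeeper code and does not reproduce ZooKeeper's own validation; the function names are made up for illustration, and it assumes snapshots are named snapshot.* and sit next to the log.* files under version-2 (only the log.* naming is visible in the logs above). It does not check whether a snapshot file is actually readable.

```python
from pathlib import Path


def summarize_data_dir(data_dir):
    """Return (snapshot file names, txn log file names) found in <data_dir>/version-2.

    Assumption: snapshots match 'snapshot.*' and transaction logs match 'log.*'.
    """
    version_dir = Path(data_dir) / "version-2"
    snapshots = sorted(p.name for p in version_dir.glob("snapshot.*") if p.is_file())
    txn_logs = sorted(p.name for p in version_dir.glob("log.*") if p.is_file())
    return snapshots, txn_logs


def upgrade_would_fail(data_dir):
    """True when txn logs exist but no snapshot file does (the state described in this thread)."""
    snapshots, txn_logs = summarize_data_dir(data_dir)
    return bool(txn_logs) and not snapshots
```

With the layout from the original report, upgrade_would_fail("/zookeeper") would be the call to make on the stopped 3.4.14 node. If it returns True, the options mentioned above apply: get at least one snapshot written before upgrading or, in an ensemble, let the node resync from the quorum.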