From: Enrico Olivelli
Date: Tue, 13 Aug 2019 22:59:13 +0200
Subject: Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5
To: user@zookeeper.apache.org

On Tue, 13 Aug 2019, 15:43 Koen De Groote wrote:

> I would also like to know if this is possible.
>
> From going over the GitHub page, it seems there is a JMX method to force
> the creation of a snapshot. Yet the Docker image is configured such that
> a port will never be assigned to the JMX process.
>

Can't you modify your Docker image in order to expose the JMX API?
I am not a Docker expert, but it should be possible.

Enrico

> Is there any way to bypass this?
>
> On Tue, Jul 30, 2019 at 8:51 AM Jörn Franke wrote:
>
> > Thanks. Is it possible to force Zookeeper to create a snapshot? I will
> > check; I think the snapshot count is set to 1 in the cfg.
> >
> > > On 30.07.2019 at 08:06, Enrico Olivelli wrote:
> > >
> > > On Mon, 29 Jul 2019 at 23:59, Jörn Franke <jornfranke@gmail.com> wrote:
> > >
> > >> OK, then let me verify tomorrow whether a snapshot file is indeed there.
> > >> If it is missing, then I wonder why. There was no crash or anything of
> > >> the kind, and 3.4.14 works without issue, but of course it could have
> > >> loaded the data from the log files. However, then I wonder why it does
> > >> not create one.
> > >>
> > >
> > > I remember now that some other user, I think Sijie, reported a similar
> > > problem some months ago: it is not possible to upgrade from 3.4 to 3.5
> > > if no snapshot is present.
> > > IIRC the fix was to force the creation of at least one snapshot file and
> > > then upgrade.
> > >
> > > Enrico
> > >
> > >> On Mon, Jul 29, 2019 at 11:45 PM Michael Han wrote:
> > >>
> > >>>>> I just wonder why it does not find a valid snapshot.
> > >>>
> > >>> If there are local snapshot files and the files are valid, then it's a
> > >>> bug that the server fails to load them.
> > >>>
> > >>>>> Is it because the format changed in 3.5.5 compared to 3.4.14?
> > >>>
> > >>> Not that I am aware of. There are some format changes (added compression
> > >>> support) in the master branch, but that has not shipped with 3.5.5.
> > >>>
> > >>> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke wrote:
> > >>>
> > >>>> OK, then it affects basically all standalone nodes? This is fine, even
> > >>>> though it means some extra work (for uncritical lab environments).
> > >>>> I am not sure it is ZOOKEEPER-2325 (but I don't know the full history
> > >>>> behind it). The logs are fine (it works in 3.4.14 without issues, even
> > >>>> after downgrading back). There is no issue with disk space and there
> > >>>> are no 0-byte files. I just wonder why it does not find a valid
> > >>>> snapshot. Is it because the format changed in 3.5.5 compared to 3.4.14?
> > >>>>
> > >>>> On Mon, Jul 29, 2019 at 11:25 PM Michael Han wrote:
> > >>>>
> > >>>>>>> java.io.IOException: No snapshot found, but there are log entries.
> > >>>>>>> Something is broken!
> > >>>>>
> > >>>>> This is expected behavior introduced in ZOOKEEPER-2325. We don't want
> > >>>>> to end up with potentially inconsistent state across the ensemble when
> > >>>>> recovering from an empty snapshot.
> > >>>>>
> > >>>>> To continue the upgrade, just delete all txn log files and let the node
> > >>>>> sync the snapshot from the quorum.
> > >>>>>
> > >>>>> On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli <eolivelli@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> On Mon, 29 Jul 2019, 22:32 Jörn Franke wrote:
> > >>>>>>
> > >>>>>>> It also seems that 3.5.5 does not attempt to read all of the log
> > >>>>>>> files (I still have to confirm), but the two it reads exist, it has
> > >>>>>>> access to them, and they are much larger than 0 bytes.
> > >>>>>>>
> > >>>>>>
> > >>>>>> We should have the stack trace of the EOFException.
> > >>>>>>
> > >>>>>> Does anyone on this list have a better idea?
> > >>>>>>
> > >>>>>> Enrico
> > >>>>>>
> > >>>>>>> On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <jornfranke@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> (of course I do not run them at the same time)
> > >>>>>>>>
> > >>>>>>>> On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <jornfranke@gmail.com>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Thank you for the quick reply. They read from the same disk paths
> > >>>>>>>>> and have the same access rights (in fact the RHEL service executes
> > >>>>>>>>> them as the same specific user).
> > >>>>>>>>>
> > >>>>>>>>> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli
> > >>>>>>>>> <eolivelli@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> On Mon, 29 Jul 2019, 21:50 Jörn Franke wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Hi,
> > >>>>>>>>>>>
> > >>>>>>>>>>> I tried to migrate a lab environment from Zookeeper 3.4.14 (used
> > >>>>>>>>>>> for Solr) to 3.5.5 and encountered an issue. It is ZooKeeper in
> > >>>>>>>>>>> standalone mode (other environments have a proper ensemble).
> > >>>>>>>>>>> I increased jute.maxbuffer beyond the default (but not
> > >>>>>>>>>>> excessively); this was working perfectly fine in 3.4.14.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Basically, I reuse the same config files for the migration,
> > >>>>>>>>>>> except that I whitelist some commands (later I am also
> > >>>>>>>>>>> interested in adding SSL).
> > >>>>>>>>>>>
> > >>>>>>>>>>> I get the following error message when starting Zookeeper with
> > >>>>>>>>>>> 3.5.5 (basically, I just changed the symbolic link named
> > >>>>>>>>>>> zookeeper to point to the 3.5.5 directory instead of the 3.4.14
> > >>>>>>>>>>> directory):
> > >>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b34
> > >>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b34
> > >>>>>>>>>>> 2019-07-29 15:16:25,222 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b34
> > >>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b72
> > >>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b72
> > >>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b72
> > >>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - ERROR [main:ZooKeeperServerMain@83] - Unexpected exception, exiting abnormally
> > >>>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
> > >>>>>>>>>>> at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
> > >>>>>>>>>>> at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
> > >>>>>>>>>>> at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
> > >>>>>>>>>>> at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
> > >>>>>>>>>>> at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
> > >>>>>>>>>>> at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
> > >>>>>>>>>>> at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
> > >>>>>>>>>>> at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
> > >>>>>>>>>>> at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
> > >>>>>>>>>>> at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
> > >>>>>>>>>>> at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> > >>>>>>>>>>>
> > >>>>>>>>>>> Strangely enough, if I switch back to 3.4.14 the issue is
> > >>>>>>>>>>> resolved and Zookeeper works normally. However, I would like to
> > >>>>>>>>>>> leverage the new version 3.5.5.
> > >>>>>>>>>>>
> > >>>>>>>>>>> There are no 0-byte files. Plenty of disk space is available.
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Can you compare these logs with the logs of 3.4.x? Are they
> > >>>>>>>>>> reading from the same disk paths?
> > >>>>>>>>>>
> > >>>>>>>>>>> Any idea beyond erasing the data dir (I would try to avoid it; I
> > >>>>>>>>>>> can reconstruct it, but still)? I will also try in the other
> > >>>>>>>>>>> environments, including one with an ensemble, but I would like
> > >>>>>>>>>>> to know beforehand what the issue could be.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Not sure if it is relevant, but:
> > >>>>>>>>>>> Activated Kerberos Authentication and Kerberos SSL for clients
> > >>>>>>>>>>> and quorum.
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Quorum? In standalone mode there is no 'quorum' auth.
> > >>>>>>>>>>
> > >>>>>>>>>> Enrico
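
A minimal sketch, not part of the original thread, of a pre-upgrade check for the condition discussed here: transaction logs are present but no snapshot file exists. It assumes that snapshots and transaction logs share a single version-2 directory under the data directory (as the /zookeeper/version-2/log.b34 paths in the log output suggest) and that snapshot files are named snapshot.<zxid>. The function name classify_data_dir and its return values are hypothetical, and the check only looks at file names; it does not validate file contents.

```python
import sys
from pathlib import Path


def classify_data_dir(data_dir):
    """Classify a ZooKeeper data directory by the files it contains.

    Returns one of:
      "ok"                    - at least one snapshot.* file exists
      "logs-without-snapshot" - log.* files exist but no snapshot.* file
      "empty"                 - neither kind of file (or no version-2 dir)

    Assumption: snapshots and txn logs both live in <data_dir>/version-2.
    """
    version_dir = Path(data_dir) / "version-2"
    if not version_dir.is_dir():
        return "empty"

    # Only file names are inspected; contents are not parsed or validated.
    has_snapshot = any(p.is_file() for p in version_dir.glob("snapshot.*"))
    has_txn_log = any(p.is_file() for p in version_dir.glob("log.*"))

    if has_snapshot:
        return "ok"
    return "logs-without-snapshot" if has_txn_log else "empty"


if __name__ == "__main__":
    # Default path taken from the log output quoted in the thread.
    target = sys.argv[1] if len(sys.argv) > 1 else "/zookeeper"
    print(target, "->", classify_data_dir(target))
```

A result of "logs-without-snapshot" corresponds to the startup error quoted above; in that case the thread suggests forcing the creation of at least one snapshot on 3.4.x before switching to 3.5.5.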