Subject: Re: Ephemeral znodes not getting removed
From: John Lindwall
To: user@zookeeper.apache.org, Patrick Hunt
Date: Mon, 5 Aug 2019 13:15:56 -0700

Thanks for the response!

My direct access to this zk cluster is limited. I'll see about getting a copy of the logs to examine. I'll also try to coordinate your experiment of creating a znode on each server in turn and checking the cluster-wide view of that data. If we see a situation where the "global view" is inconsistent, what would be the next step?

I did receive output from each cluster node containing the results of these 4-letter words: dump, cons, mntr, and stat. For one of the ephemerals in question, we could see a record of it in the "dump" output of one of the 3 cluster nodes (the leader) but not in the dump output of the other 2 nodes. Weirdly, the session id associated with that ephemeral znode does not appear in the "cons" output of any of the cluster members. So this appears to be an ephemeral that has survived the termination of its associated zk session (!?)

Thanks for any advice or feedback,
John

Patrick Hunt wrote on 8/2/19 9:38 AM:
> The jira you ref'd is the only one that comes to mind. In terms of
> troubleshooting - try connecting a client to each of the servers in turn
> and see if it's a situation where they have a different view of the world
> wrt those znodes.
> You might also have the client create separate znodes on
> each server and ensure that they are consistent. The logs are also
> typically a good source of information - check against the session id.
>
> Patrick
>
> On Wed, Jul 31, 2019 at 5:54 PM John Lindwall wrote:
>
>> ZooKeeper 3.4.6-1569965
>>
>> In our environment we seem to have a situation where ephemeral znodes
>> are not getting removed after the zookeeper session has been
>> terminated. We can see examples of znodes that were created 3-4 days
>> ago that still exist, though the zk sessions bound to those znodes
>> should no longer exist.
>>
>> Note that we've had this cluster running for about 4 years and have not
>> seen this problem until recently.
>>
>> 1. Are there any known issues affecting our zookeeper version that
>> could cause this behavior?
>> 2. Is it possible our servers are simply in a "bad state" and a simple
>> reboot might clean things up?
>> 3. Any tips on diagnosing this?
>>
>> We found this issue from 2011, but it seems to have been fixed in our
>> branch:
>>
>> https://issues.apache.org/jira/browse/ZOOKEEPER-1208
>>
>> Thanks,
>> John Lindwall
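A minimal sketch of the cross-check John describes, in Python: list the ephemeral znodes in the leader's `dump` reply (the 3.4 administrator's guide notes that `dump` only works on the leader) and flag any whose session id appears in none of the servers' `cons` replies. The parsing assumes the 3.4.x line shapes (a `Sessions with Ephemerals (N):` line, then `0x<sid>:` headers each followed by indented paths, and `sid=0x...` inside each `cons` entry); all function names are hypothetical and the patterns may need adjusting for other versions. A session missing from every `cons` reply is a lead, not proof of expiry, since a client can be briefly disconnected while its session is still within its timeout. The last helper covers Patrick's suggestion of comparing what each server sees: it takes one set of paths per server, however those were collected, and reports which servers are missing which paths.

```python
import re

# Assumed ZooKeeper 3.4.x four-letter-word output shapes; adjust if needed.
SESSION_HEADER = re.compile(r"^(0x[0-9a-fA-F]+):$")  # e.g. "0x16c5a8d4f2e0001:"
CONS_SID = re.compile(r"sid=(0x[0-9a-fA-F]+)")       # e.g. "...,sid=0x16c5...,"


def ephemerals_by_session(dump_text):
    """Map session id (int) -> ephemeral paths, parsed from a `dump` reply."""
    result = {}
    in_ephemerals = False  # only parse after the "Sessions with Ephemerals" line
    current = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith("Sessions with Ephemerals"):
            in_ephemerals = True
            continue
        if not in_ephemerals or not line:
            continue
        header = SESSION_HEADER.match(line)
        if header:
            current = int(header.group(1), 16)
            result.setdefault(current, [])
        elif line.startswith("/") and current is not None:
            result[current].append(line)
    return result


def live_sessions(cons_texts):
    """Collect every session id (int) listed in the `cons` replies of all servers."""
    sids = set()
    for text in cons_texts:
        for match in CONS_SID.finditer(text):
            sids.add(int(match.group(1), 16))
    return sids


def orphaned_ephemerals(dump_text, cons_texts):
    """Ephemeral paths whose owning session is connected to no server."""
    live = live_sessions(cons_texts)
    return {
        hex(sid): paths
        for sid, paths in ephemerals_by_session(dump_text).items()
        if sid not in live
    }


def inconsistent_paths(views):
    """Given {server: set_of_paths}, map each path not seen everywhere
    to the sorted list of servers that are missing it."""
    all_paths = set().union(*views.values())
    return {
        path: sorted(s for s, paths in views.items() if path not in paths)
        for path in all_paths
        if any(path not in paths for paths in views.values())
    }
```

The replies can be captured with, for example, `echo dump | nc <leader-host> 2181` and `echo cons | nc <host> 2181` against each server (assuming the default client port), then passed as `orphaned_ephemerals(dump_text, [cons1, cons2, cons3])`. Any session id it reports is a natural starting point for the log search Patrick mentions.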