From issues-return-179-archive-asf-public=cust-asf.ponee.io@zookeeper.apache.org Sun Jun 23 18:32:01 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 8DD7C180679 for ; Sun, 23 Jun 2019 20:32:01 +0200 (CEST) Received: (qmail 4073 invoked by uid 500); 23 Jun 2019 18:32:01 -0000 Mailing-List: contact issues-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@zookeeper.apache.org Delivered-To: mailing list issues@zookeeper.apache.org Received: (qmail 4044 invoked by uid 99); 23 Jun 2019 18:32:01 -0000 Received: from mailrelay1-us-west.apache.org (HELO mailrelay1-us-west.apache.org) (209.188.14.139) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 23 Jun 2019 18:32:00 +0000 Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 53F15E0103 for ; Sun, 23 Jun 2019 18:32:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 1A51E25447 for ; Sun, 23 Jun 2019 18:32:00 +0000 (UTC) Date: Sun, 23 Jun 2019 18:32:00 +0000 (UTC) From: "Alexander A. Strelets (JIRA)" To: issues@zookeeper.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (ZOOKEEPER-2894) Memory and completions leak on zookeeper_close. MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/ZOOKEEPER-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander A. Strelets updated ZOOKEEPER-2894: --------------------------------------------- Description: ZooKeeper C Client *+single thread+* build First of all, ZooKeeper C Client design allows calling _zookeeper_close()_ in two ways: a) from a ZooKeeper callback handler (completion or watcher) which in turn is called through _zookeeper_process()_ b) and from other places -- i.e., when the call-stack does not pass through any of zookeeper mechanics prior to enter into mentioned _zookeeper_close()_ The issue described here below is +specific only to the case (b)+. So, it's Ok with the case (a). When _zookeeper_close()_ is called in the (b) way, the following happens: 1. +If there are requests waiting for responses in _zh.sent_requests_ queue+, they all are removed from this queue and each of them is "completed" with personal fake response having status ZCLOSING. Such fake responses are put into _zh.completions_to_process_ queue. It's Ok 2. But then, _zh.completions_to_process_ queue is left unhandled. +Neither completion callbacks are called, nor dynamic memory allocated for fake responses is freed+ 3. Different structures within _zh_ are dismissed and finally _zh_ is freed This is illustrated on the screenshot attached to this ticket: you may see that the next instruction to execute will be _free(zh)_ while _zh.completions_to_process_ queue is not empty (see the "Variables" tab to the right). Alternatively, the same situation but in the case (a) is handled properly -- i.e., all completion callback handlers are truly called with ZCLOSING and the memory is freed, both for subcases (a.1) when there is a failure like connection-timeout, connection-closed, etc., or (a.2) there is not failure. The reason is that any callback handler (completion or watcher) in the case (a) is called from the _process_completions()_ function which runs in the loop until _zh.completions_to_process_ queue gets empty. So, this function guarantees this queue to be completely processed even if new completions occur during reaction on previously queued completions. *Consequently:* 1. At least there is definitely the +memory leak+ in the case (b) -- all the fake responses put into _zh.completions_to_process_ queue are lost after _free(zh)_ 2. And it looks like a great misbehavior not to call completions on sent requests in the case (b) while they are called with ZCLOSING in the case (a) -- so, I think it's not "by design" but a +completions leak+ +To reproduce the case (b) do the following:+ - open ZooKeeper session, connect to a server, receive and process connected-watch, etc. - then somewhere +from the main events loop+ call for example _zoo_acreate()_ with valid arguments -- it shall return ZOK - then, +immediately after it returned+, call _zookeeper_close()_ - note that completion callback handler for _zoo_acreate()_ *will not be called* +To reproduce the case (a) do the following:+ - the same as above, open ZooKeeper session, connect to a server, receive and process connected-watch, etc. - the same as above, somewhere from the main events loop call for example _zoo_acreate()_ with valid arguments -- it shall return ZOK - but now don't call _zookeeper_close()_ immediately -- wait for completion callback on the commenced request - when _zoo_acreate()_ completes, +from within its completion callback handler+, call another _zoo_acreate()_ and immediately after it returned call _zookeeper_close()_ - note that completion callback handler for the second _zoo_acreate()_ *will be called with ZCLOSING, unlike the case (b) described above* *To fix this I propose:* Just call _process_completions()_ from _destroy(zhandle_t *zh)_ as it is done in _handle_error(zhandle_t *zh,int rc)_. This is a proposed fix: https://github.com/apache/zookeeper/pull/363 [upd] There are another tickets with about the same problem: ZOOKEEPER-1493, ZOOKEEPER-2073 (the "same" just according to their titles). However, as I can see, the corresponding patches were applied on the branch 3.4.10, but the effect still persists -- so, this ticket does not duplicate the mentioned two. was: ZooKeeper C Client *+single thread+* build First of all, ZooKeeper C Client design allows calling _zookeeper_close()_ in two ways: a) from a ZooKeeper callback handler (completion or watcher) which in turn is called through _zookeeper_process()_ b) and from other places -- i.e., when the call-stack does not pass through any of zookeeper mechanics prior to enter into mentioned _zookeeper_close()_ The issue described here below is +specific only to the case (b)+. So, it's Ok with the case (a). When _zookeeper_close()_ is called in the (b) way, the following happens: 1. +If there are requests waiting for responses in _zh.sent_requests_ queue+, they all are removed from this queue and each of them is "completed" with personal fake response having status ZCLOSING. Such fake responses are put into _zh.completions_to_process_ queue. It's Ok 2. But then, _zh.completions_to_process_ queue is left unhandled. +Neither completion callbacks are called, nor dynamic memory allocated for fake responses is freed+ 3. Different structures within _zh_ are dismissed and finally _zh_ is freed This is illustrated on the screenshot attached to this ticket: you may see that the next instruction to execute will be _free(zh)_ while _zh.completions_to_process_ queue is not empty (see the "Variables" tab to the right). Alternatively, the same situation but in the case (a) is handled properly -- i.e., all completion callback handlers are truly called with ZCLOSING and the memory is freed, both for subcases (a.1) when there is a failure like connection-timeout, connection-closed, etc., or (a.2) there is not failure. The reason is that any callback handler (completion or watcher) in the case (a) is called from the _process_completions()_ function which runs in the loop until _zh.completions_to_process_ queue gets empty. So, this function guarantees this queue to be completely processed even if new completions occur during reaction on previously queued completions. *Consequently:* 1. At least there is definitely the +memory leak+ in the case (b) -- all the fake responses put into _zh.completions_to_process_ queue are lost after _free(zh)_ 2. And it looks like a great misbehavior not to call completions on sent requests in the case (b) while they are called with ZCLOSING in the case (a) -- so, I think it's not "by design" but a +completions leak+ +To reproduce the case (b) do the following:+ - open ZooKeeper session, connect to a server, receive and process connected-watch, etc. - then somewhere +from the main events loop+ call for example _zoo_acreate()_ with valid arguments -- it shall return ZOK - then, +immediately after it returned+, call _zookeeper_close()_ - note that completion callback handler for _zoo_acreate()_ *will not be called* +To reproduce the case (a) do the following:+ - the same as above, open ZooKeeper session, connect to a server, receive and process connected-watch, etc. - the same as above, somewhere from the main events loop call for example _zoo_acreate()_ with valid arguments -- it shall return ZOK - but now don't call _zookeeper_close()_ immediately -- wait for completion callback on the commenced request - when _zoo_acreate()_ completes, +from within its completion callback handler+, call another _zoo_acreate()_ and immediately after it returned call _zookeeper_close()_ - note that completion callback handler for the second _zoo_acreate()_ *will be called with ZCLOSING, unlike the case (b) described above* To fix this I propose calling to _process_completions()_ from _destroy(zhandle_t *zh)_ as it is done in _handle_error(zhandle_t *zh,int rc)_. This is a proposed fix: https://github.com/apache/zookeeper/pull/363 [upd] There are another tickets with about the same problem: ZOOKEEPER-1493, ZOOKEEPER-2073 (the "same" just according to their titles). However, as I can see, the corresponding patches were applied on the branch 3.4.10, but the effect still persists -- so, this ticket does not duplicate the mentioned two. > Memory and completions leak on zookeeper_close. > ----------------------------------------------- > > Key: ZOOKEEPER-2894 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2894 > Project: ZooKeeper > Issue Type: Bug > Components: c client > Affects Versions: 3.4.10 > Environment: Linux ubuntu 4.4.0-87-generic > gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609 > https://github.com/apache/zookeeper.git > branch-3.4 > Reporter: Alexander A. Strelets > Assignee: Alexander A. Strelets > Priority: Critical > Labels: easyfix, pull-request-available > Fix For: 3.4.10 > > Attachments: zk-will-free-zh-and-lose-completions.png > > Time Spent: 1h 20m > Remaining Estimate: 0h > > ZooKeeper C Client *+single thread+* build > First of all, ZooKeeper C Client design allows calling _zookeeper_close()_ in two ways: > a) from a ZooKeeper callback handler (completion or watcher) which in turn is called through _zookeeper_process()_ > b) and from other places -- i.e., when the call-stack does not pass through any of zookeeper mechanics prior to enter into mentioned _zookeeper_close()_ > The issue described here below is +specific only to the case (b)+. So, it's Ok with the case (a). > When _zookeeper_close()_ is called in the (b) way, the following happens: > 1. +If there are requests waiting for responses in _zh.sent_requests_ queue+, they all are removed from this queue and each of them is "completed" with personal fake response having status ZCLOSING. Such fake responses are put into _zh.completions_to_process_ queue. It's Ok > 2. But then, _zh.completions_to_process_ queue is left unhandled. +Neither completion callbacks are called, nor dynamic memory allocated for fake responses is freed+ > 3. Different structures within _zh_ are dismissed and finally _zh_ is freed > This is illustrated on the screenshot attached to this ticket: you may see that the next instruction to execute will be _free(zh)_ while _zh.completions_to_process_ queue is not empty (see the "Variables" tab to the right). > Alternatively, the same situation but in the case (a) is handled properly -- i.e., all completion callback handlers are truly called with ZCLOSING and the memory is freed, both for subcases (a.1) when there is a failure like connection-timeout, connection-closed, etc., or (a.2) there is not failure. The reason is that any callback handler (completion or watcher) in the case (a) is called from the _process_completions()_ function which runs in the loop until _zh.completions_to_process_ queue gets empty. So, this function guarantees this queue to be completely processed even if new completions occur during reaction on previously queued completions. > *Consequently:* > 1. At least there is definitely the +memory leak+ in the case (b) -- all the fake responses put into _zh.completions_to_process_ queue are lost after _free(zh)_ > 2. And it looks like a great misbehavior not to call completions on sent requests in the case (b) while they are called with ZCLOSING in the case (a) -- so, I think it's not "by design" but a +completions leak+ > +To reproduce the case (b) do the following:+ > - open ZooKeeper session, connect to a server, receive and process connected-watch, etc. > - then somewhere +from the main events loop+ call for example _zoo_acreate()_ with valid arguments -- it shall return ZOK > - then, +immediately after it returned+, call _zookeeper_close()_ > - note that completion callback handler for _zoo_acreate()_ *will not be called* > +To reproduce the case (a) do the following:+ > - the same as above, open ZooKeeper session, connect to a server, receive and process connected-watch, etc. > - the same as above, somewhere from the main events loop call for example _zoo_acreate()_ with valid arguments -- it shall return ZOK > - but now don't call _zookeeper_close()_ immediately -- wait for completion callback on the commenced request > - when _zoo_acreate()_ completes, +from within its completion callback handler+, call another _zoo_acreate()_ and immediately after it returned call _zookeeper_close()_ > - note that completion callback handler for the second _zoo_acreate()_ *will be called with ZCLOSING, unlike the case (b) described above* > *To fix this I propose:* > Just call _process_completions()_ from _destroy(zhandle_t *zh)_ as it is done in _handle_error(zhandle_t *zh,int rc)_. > This is a proposed fix: https://github.com/apache/zookeeper/pull/363 > [upd] > There are another tickets with about the same problem: ZOOKEEPER-1493, ZOOKEEPER-2073 (the "same" just according to their titles). > However, as I can see, the corresponding patches were applied on the branch 3.4.10, but the effect still persists -- so, this ticket does not duplicate the mentioned two. -- This message was sent by Atlassian JIRA (v7.6.3#76005)