zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "lawrence114 (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ZOOKEEPER-1720) Race in zookeeper_close() leads to hang
Date Tue, 17 Oct 2017 12:15:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207555#comment-16207555
] 

lawrence114 edited comment on ZOOKEEPER-1720 at 10/17/17 12:14 PM:
-------------------------------------------------------------------

hi Kevin,
I have used the version zookeeper-3.5.1-alpha, and I have met the same issue recently, 
I want know is there any progress about this problem in your side?


was (Author: lawrence):
hi Kevin,
I have met the same issue recently, I want know is there any progress about this problem in
your side?

> Race in zookeeper_close() leads to hang
> ---------------------------------------
>
>                 Key: ZOOKEEPER-1720
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1720
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: c client
>    Affects Versions: 3.5.0
>         Environment: Ubuntu 12.04.1
>            Reporter: Kevin Jamieson
>
> Using ZK 3.5.4, zookeeper_close() occasionally hangs with a backtrace of the form:
> {noformat}
> #0  0x00002b255fab489c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00002b255fab26b0 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
> #2  0x00002b2560568ced in unlock_completion_list (l=0x13f5430) at src/mt_adaptor.c:69
> #3  0x00002b256055b9ec in free_completions (zh=0x13f5270, callCompletion=1, reason=-116)
at src/zookeeper.c:1521
> #4  0x00002b256055d3bc in zookeeper_close (zh=0x13f5270) at src/zookeeper.c:2954
> {noformat}
> At which point the zhandle_t struct appears to have already been freed, as it contains
garbage:
> {noformat}
> (gdb) p zh->sent_requests.cond
> $19 = {
>   __data = {
>     __lock = 2, 
>     __futex = 0, 
>     __total_seq = 18446744073709551615, 
>     __wakeup_seq = 0, 
>     __woken_seq = 0, 
>     __mutex = 0x0, 
>     __nwaiters = 0, 
>     __broadcast_seq = 0
>   }, 
>   __size = "\002\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377", '\000'
<repeats 31 times>, 
>   __align = 2
> }
> {noformat}
> There appears to be a race condition in the following code:
> {noformat}
> int api_epilog(zhandle_t *zh,int rc)
> {
>     if(inc_ref_counter(zh,-1)==0 && zh->close_requested!=0)
>         zookeeper_close(zh);
>     return rc;
> }
> int zookeeper_close(zhandle_t *zh)
> {
>     int rc=ZOK;
>     if (zh==0)
>         return ZBADARGUMENTS;
>     zh->close_requested=1;
>     if (inc_ref_counter(zh,1)>1) {
> {noformat}
> As api_epilog() may free zh in between zookeeper_close() setting zh->close_requested=1
and incrementing the reference count.
> The following patch should fix the problem:
> {noformat}
> diff --git a/src/c/src/zookeeper.c b/src/c/src/zookeeper.c
> index 6943243..61a263a 100644
> --- a/src/c/src/zookeeper.c
> +++ b/src/c/src/zookeeper.c
> @@ -1051,6 +1051,7 @@ zhandle_t *zookeeper_init(const char *host, watcher_fn watcher,
>          goto abort;
>      }
>  
> +    api_prolog(zh);
>      return zh;
>  abort:
>      errnosave=errno;
> @@ -2889,7 +2890,7 @@ int zookeeper_close(zhandle_t *zh)
>          return ZBADARGUMENTS;
>  
>      zh->close_requested=1;
> -    if (inc_ref_counter(zh,1)>1) {
> +    if (inc_ref_counter(zh,0)>1) {
>          /* We have incremented the ref counter to prevent the
>           * completions from calling zookeeper_close before we have
>           * completed the adaptor_finish call below. */
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message