zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jiafu Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2802) Zookeeper C client hang @wait_sync_completion
Date Mon, 28 Aug 2017 11:09:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143651#comment-16143651

Jiafu Jiang commented on ZOOKEEPER-2802:

[~yihao]I have the same PB, have you find the solution?

> Zookeeper C client hang @wait_sync_completion
> ---------------------------------------------
>                 Key: ZOOKEEPER-2802
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2802
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: c client
>    Affects Versions: 3.4.6
>         Environment: DISTRIB_DESCRIPTION="Ubuntu 14.04.2 LTS"
>            Reporter: yihao yang
>            Priority: Critical
>         Attachments: zookeeper.out.2017.05.31-10.06.23
> I was using zookeeper 3.4.6 c client to access one zookeeper server in a VM. The VM environment
is not stable and I get a lot of EXPIRED_SESSION_STATE events. I will create another session
to ZK when I get an expired event. I also have a read/write lock to protect session read (get/list/...
on zk) and write(connect, close, reconnect zhandle).
> The problem is the session got an EXPIRED_SESSION_STATE event and when it tried to hold
the write lock and  reconnect the session, it found there is a thread was holding the read
lock (which was operating sync list on zk). See the stack below:
> GDBStack:
> Thread 7 (Thread 0x7f838a43a700 (LWP 62845)):
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1 0x0000000000636033 in  wait_sync_completion (sc=sc@entry=0x7f8344000af0) at src/mt_adaptor.c:85
> #2 0x0000000000633248 in zoo_wget_children2_ (zh=<optimized out>, path=0x7f83440677a8
"/dict/objects/__services/RLS-GSE/_static_nodes", watcher=0x0, watcherCtx=0x13e6310, strings=0x7f838a4397b0,
stat=0x7f838a4398d0) at src/zookeeper.c:3630
> #3 0x000000000045e6ff in ZooKeeperContext::getChildren (this=0x13e6310, path=..., children=children@entry=0x7f838a439890,
stat=stat@entry=0x7f838a4398d0) at zookeeper_context.cpp:xxx
> This sync list didn't return a ZINVALIDSTAT but hung. Anyone know the problem?

This message was sent by Atlassian JIRA

View raw message