zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Patrick Hunt (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (ZOOKEEPER-2443) Invalid Memory Access (SEGFAULT) and undefined behaviour in c client
Date Wed, 15 Jun 2016 04:14:09 GMT

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Patrick Hunt reassigned ZOOKEEPER-2443:

    Assignee: Chris Nauroth

[~cnauroth], [~fpj], [~rgs] can you take a look at this? According to Paul the issue was introduced
in ZOOKEEPER-1029.

> Invalid Memory Access (SEGFAULT) and undefined behaviour in c client
> --------------------------------------------------------------------
>                 Key: ZOOKEEPER-2443
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2443
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: c client
>            Reporter: Paul Asmuth
>            Assignee: Chris Nauroth
> Hey,
> I encountered some issues with the zookeepeer c client.
> The problem starts in the zookeeper_init_internal method. A lot of initialization work
is performed here and if any of the initialization routines fails, the code jumps to the "abort"
label to perform various cleanup tasks [1]. The conceptual issue is that a bunch of the cleanup
code tries to take locks on the zk structure that are only intialized in adaptor_init in line
1181 (at the very end of the zookeeper_init_internal method) [2]. So if we fail before reaching
adaptor_init this causes trouble.
> One specific instance of an invalid memory access that this causes is in free_completions
[3]. Here, in line 1651 zoo_lock_auth will fail because it tries to grab an invalid mutex,
after which the a_list struct is uninitialized (the linked list next pointer points to random
memory) and subsequently the free routine segfaults.
> An easy way to trigger this bug-path is to pass an invalid hostname, or do anything else
that causes the zookeeper_init_internal method to fail before adaptor_init.
> In my local checkout/codebase, I have added correct initialization for the a_list struct
in the free_completions routine, which at least fixes the segfault for now. However this still
leaves the issue that the cleanup code tries to grab a lot of invalid locks, which all fail.
I think in order to fix this properly, one would need to do a larger refactoring of the code
(add another adaptor_preinit routine to the adaptor interface maybe?) and I wasn't sure if
that would be appreciated, so I didn't attach a patch for now. If someone wants me to try
and clean this up, I would be happy to give it a try.
> PS: I think this bug was introduced in SVN #1719528, which - as it seems - tried to work
around the uninitialized locks problem by adding an int return code to all the lock_xxx functions,
allowing them to indicate a failure. The change introduce the invalid memory access since
some (always required) init code is only run after the lock was obtained successfully.
> However, I think there is a much large issue with the change and I think it must be reverted.
Trying to lock an uninitialized mutex is undefined behaviour on POSIX and may lead to deadlocks,
> >> If mutex does not refer to an initialized mutex object, the behavior of pthread_mutex_lock(),
pthread_mutex_trylock(), and pthread_mutex_unlock() is undefined.
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_lock.html
> [1] https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1078
> [2] https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1181
> [3] https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1651
> ------------------
> Program received signal SIGSEGV, Segmentation fault.
> 0x000000010004f6d5 in free_auth_completion (a_list=0x7fff5fbff048) at /deps/3rdparty/zookeeper/source/src/zookeeper.c:260
> 260             tmp = tmp->next;
> #0  0x000000010004f6d5 in free_auth_completion (a_list=0x7fff5fbff048) at /deps/3rdparty/zookeeper/source/src/zookeeper.c:260
> #1  0x000000010004f500 in free_completions (zh=0x1003022f0, callCompletion=1, reason=-116)
at /deps/3rdparty/zookeeper/source/src/zookeeper.c:1219
> #2  0x0000000100057bfd in cleanup_bufs (zh=0x1003022f0, callCompletion=1, rc=-116) at
> #3  0x000000010004ee42 in destroy (zh=0x1003022f0) at /deps/3rdparty/zookeeper/source/src/zookeeper.c:393
> #4  0x000000010004eaf3 in zookeeper_init (host=0x1006005b0 "xxxinvalidhostname:2181",
watcher=0x100007670 <xxx::zk_watch_cb(_zhandle*, int, int, char const*, void*)>, 
>     recv_timeout=10000, clientid=0x0, context=0x100600350, flags=0) at /deps/3rdparty/zookeeper/source/src/zookeeper.c:877

This message was sent by Atlassian JIRA

View raw message