hadoop-zookeeper-dev mailing list archives

From "Chris Darroch (JIRA)" <j...@apache.org>
Subject [jira] Updated: (ZOOKEEPER-320) call auth completion in free_completions()
Date Tue, 17 Feb 2009 22:52:59 GMT

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Darroch updated ZOOKEEPER-320:
------------------------------------

    Attachment: ZOOKEEPER-320-319.patch

This patch includes auth data locking as per ZOOKEEPER-319.

> call auth completion in free_completions()
> ------------------------------------------
>
>                 Key: ZOOKEEPER-320
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-320
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>    Affects Versions: 3.0.0, 3.0.1, 3.1.0
>            Reporter: Chris Darroch
>             Fix For: 3.1.1, 3.2.0
>
>         Attachments: ZOOKEEPER-320-319.patch, ZOOKEEPER-320.patch
>
>
> If a client calls zoo_add_auth() with an invalid scheme (e.g., "foo"), the ZooKeeper server
> will mark their session expired and close the connection.  However, the C client will already
> have returned ZOK to the caller, immediately after queuing the new auth data to be sent.
> If the client then waits for their auth completion function to be called, they can wait
> forever, because no session event is ever delivered to that completion function.  All other
> completion functions are notified of session events by free_completions(), which is reached
> through the call chain handle_socket_error_msg() -> handle_error() -> cleanup_bufs()
> -> free_completions().
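A caller that blocks on its auth completion can hang exactly as described above. As a defensive sketch only (this is not part of the ZooKeeper C API), the waiter below records the completion's result under a mutex and waits with pthread_cond_timedwait(), so a completion that never runs shows up as a timeout instead of a permanent hang. The struct auth_wait type and the functions auth_wait_init(), auth_completion() and auth_wait_for() are hypothetical; auth_completion() simply has the same shape as the C client's void_completion_t, void (*)(int rc, const void *data).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical waiter state shared by the completion (run on the client's
 * completion thread) and the thread blocked waiting for the result. */
struct auth_wait {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             done;  /* set once the completion has run */
    int             rc;    /* rc the completion received */
};

void auth_wait_init(struct auth_wait *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->done = 0;
    w->rc = 0;
}

/* Same shape as void_completion_t; data points at a struct auth_wait. */
void auth_completion(int rc, const void *data)
{
    struct auth_wait *w = (struct auth_wait *)data;

    pthread_mutex_lock(&w->lock);
    w->rc = rc;
    w->done = 1;
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

/* Waits at most timeout_sec seconds. Returns 0 and stores the completion's
 * rc in *rc_out if it ran; otherwise returns the pthread_cond_timedwait()
 * error (normally ETIMEDOUT) and leaves *rc_out untouched. */
int auth_wait_for(struct auth_wait *w, int timeout_sec, int *rc_out)
{
    struct timespec deadline;
    int err = 0;

    assert(rc_out != NULL);
    /* Condition variables use CLOCK_REALTIME by default. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&w->lock);
    /* Loop so that spurious wakeups (err == 0, done still 0) wait again. */
    while (!w->done && err == 0)
        err = pthread_cond_timedwait(&w->cond, &w->lock, &deadline);
    if (w->done) {
        *rc_out = w->rc;
        err = 0;
    }
    pthread_mutex_unlock(&w->lock);
    return err;
}
```

In a client, auth_completion would be passed as the completion argument of zoo_add_auth(zh, scheme, cert, certLen, completion, data) with &w as data; without the fix discussed in this issue, auth_wait_for() returning ETIMEDOUT is the only signal the caller gets.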
> In practice, what can happen (about 50% of the time, in my testing) is that the IO thread's
> next call to flush_send_queue() calls send() from within send_buffer() and receives a SIGPIPE
> signal during that send() call.  Because the ZooKeeper C API is a library, it rightly does
> not catch that signal.  If the user's code does not catch it either, the process is
> terminated by the untrapped signal.  If the signal is being ignored (as is common in the
> context I'm working in, the Apache httpd server), then flush_send_queue() returns the error
> code EPIPE, which is logged by handle_socket_error_msg(), and all non-auth completion
> functions are notified of a session event.  However, a caller waiting for their auth
> completion function waits forever, while the IO thread repeatedly tries to reconnect and is
> rejected by the server because the session has expired.
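The SIGPIPE/EPIPE behaviour described above is ordinary POSIX socket behaviour and can be reproduced without ZooKeeper. This minimal sketch ignores SIGPIPE, then sends on one end of a UNIX-domain socketpair whose peer has already been closed; instead of the process being terminated, send() fails with errno set to EPIPE, the same error flush_send_queue() reports in this scenario. The function name send_to_closed_peer_errno() is illustrative only, and the demo changes the process-wide SIGPIPE disposition.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the errno produced by sending to a closed peer, or 0 if the send
 * unexpectedly succeeded. SIGPIPE is ignored first; without that, the
 * default action of SIGPIPE would terminate the process at the send(). */
int send_to_closed_peer_errno(void)
{
    int sv[2];
    int saved;
    ssize_t n;

    signal(SIGPIPE, SIG_IGN);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return errno;
    close(sv[1]);                 /* the remote side goes away */

    n = send(sv[0], "x", 1, 0);   /* analogous to send() in send_buffer() */
    saved = (n < 0) ? errno : 0;
    close(sv[0]);
    return saved;
}
```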
> So, first of all, it would be useful to document in the C API section of the programmer's
> guide that applications should trap or ignore SIGPIPE, as this signal may be raised by the
> socket writes the C API performs.
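One way an application might follow that advice is sketched below, assuming it wants to ignore SIGPIPE but must not override a handler it (or a host such as httpd) has already installed. The helper name ignore_sigpipe_if_default() is hypothetical. Signal dispositions are process-wide, so calling it early in start-up, before the first zookeeper_init(), covers the library's IO thread as well.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Ignores SIGPIPE only if it still has its default disposition.
 * Returns 0 on success (including "already handled"), -1 on failure. */
int ignore_sigpipe_if_default(void)
{
    struct sigaction old, act;

    if (sigaction(SIGPIPE, NULL, &old) != 0)
        return -1;
    /* An existing handler or SIG_IGN is left alone. */
    if ((old.sa_flags & SA_SIGINFO) || old.sa_handler != SIG_DFL)
        return 0;

    memset(&act, 0, sizeof act);
    act.sa_handler = SIG_IGN;
    sigemptyset(&act.sa_mask);
    return sigaction(SIGPIPE, &act, NULL);
}
```

Checking the current disposition first is a design choice for library-hosting processes; a standalone program could simply call signal(SIGPIPE, SIG_IGN).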
> Next, both attached patches call the auth completion function, if any, in free_completions(),
> which fixes this problem for me.  The second patch (ZOOKEEPER-320-319.patch) also includes
> auth lock/unlock functions, as per ZOOKEEPER-319.
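The patches themselves are not reproduced here. As a simplified sketch of the idea only, with hypothetical types (struct pending, struct client) standing in for the client's internal state, free_completions_sketch() drains the queued completions with the given reason code and then also fires the pending auth completion. In the spirit of the ZOOKEEPER-319 locking, the auth fields are read and cleared under a mutex; clearing them so the auth completion fires at most once, and invoking it only after the lock is released, are this sketch's own design choices. The typedef mirrors the C client's void_completion_t, and store_rc() is an example completion for demonstration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*void_completion_t)(int rc, const void *data);

/* Hypothetical, simplified stand-ins for the client's internal state. */
struct pending {
    void_completion_t cb;
    const void *data;
    struct pending *next;
};

struct client {
    struct pending *sent;       /* queued non-auth completions */
    pthread_mutex_t auth_lock;  /* guards auth_cb and auth_data */
    void_completion_t auth_cb;  /* completion given with the auth data, if any */
    const void *auth_data;
};

/* Notifies every outstanding completion of `reason`, including the auth
 * completion, so no caller is left waiting once the session is gone. */
void free_completions_sketch(struct client *c, int reason)
{
    void_completion_t auth_cb;
    const void *auth_data;

    while (c->sent != NULL) {
        struct pending *p = c->sent;
        c->sent = p->next;
        if (p->cb != NULL)
            p->cb(reason, p->data);
        free(p);
    }

    /* Detach the auth completion under the lock, then call it unlocked so
     * user code never runs while the auth lock is held. */
    pthread_mutex_lock(&c->auth_lock);
    auth_cb = c->auth_cb;
    auth_data = c->auth_data;
    c->auth_cb = NULL;
    c->auth_data = NULL;
    pthread_mutex_unlock(&c->auth_lock);

    if (auth_cb != NULL)
        auth_cb(reason, auth_data);
}

/* Example completion: records the rc it receives into an int. */
void store_rc(int rc, const void *data)
{
    *(int *)data = rc;
}
```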

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

