httpd-dev mailing list archives

From Paul Querna <c...@force-elite.com>
Subject Re: shutdown and linux poll()
Date Mon, 13 Feb 2006 22:15:45 GMT
To clarify, are you sure it's not using epoll() instead of poll()?


Chris Darroch wrote:
> Hi --
> 
>    This may be an old topic of conversation, in which case I apologize.
> I Googled and searched marc.theaimslist.com and Apache Bugzilla but
> didn't see anything, so here I am with a question.
> 
>    In brief, on Linux, when doing an ungraceful stop of httpd, any
> worker threads that are poll()ing on Keep-Alive connections don't get
> awoken by close_worker_sockets() and that can lead to the process
> getting the SIGKILL signal without ever getting the chance to run
> apr_pool_destroy(pchild) in clean_child_exit().  This seems to
> relate to this particular choice by the Linux and/or glibc folks:
> 
> http://bugme.osdl.org/show_bug.cgi?id=546
> 
> 
>    The backstory goes like this: I spent a chunk of last week trying
> to figure out why my module wasn't shutting down properly.  First I
> found some places in my code where I'd failed to anticipate the order
> in which memory pool cleanup functions would be called, especially
> those registered by apr_thread_cond_create().
> 
>    However, after fixing that, I found that when connections were still
> in the 15 second timeout for Keep-Alives, a child process could get the
> SIGKILL before it finished cleaning up.  (I'm using httpd 2.2.0 with the
> worker MPM on Linux 2.6.9 [RHEL 4] with APR 1.2.2.)  The worker threads
> are poll()ing and, if I'm reading my strace files correctly, they don't
> get an EBADF until after the timeout completes.  That means that
> join_workers() is waiting for those threads to exit, so child_main()
> can't finish up and call clean_child_exit() and thus apr_pool_destroy()
> on the pchild memory pool.
> 
>    This is a bit of a problem for me because I really need
> join_workers() to finish up and the cleanups I've registered
> against pchild in my module's child_init handler to be run if
> at all possible.
> 
>    It was while researching all this that I stumbled on the amazing
> new graceful-stop feature and submitted #38621, which I see has
> already been merged ... thank you!
> 
>    However, if I need to do an ungraceful stop of the server --
> either manually or because the GracefulShutdownTimeout has
> expired without a chance to gracefully stop -- I'd still like my
> cleanups to run.
> 
> 
>    My solution at the moment is a pure hack -- I threw in
> apr_sleep(apr_time_from_sec(15)) right before
> ap_reclaim_child_processes(1) in ap_mpm_run() in worker.c.
> That way it lets all the Keep-Alive timeouts expire before
> applying the SIGTERM/SIGKILL hammer.  But that doesn't seem
> ideal, and moreover, doesn't take into account the fact that
> KeepAliveTimeout values above 15 seconds may have been configured.  Even
> if I expand my hack to wait for the maximum possible Keep-Alive
> timeout, it's still clearly a hack.
> 
> 
>    Does anyone have any advice?  Does this seem like a problem
> to be addressed?  I tried to think through how one could signal
> the poll()ing worker threads with pthread_kill(), but it seems
> to me that not only would you have to have a signal handler
> in the worker threads (not hard), you'd somehow have to break
> out of whatever APR wrappers are abstracting the poll() once
> the handler set its flag or whatever and returned -- the APR
> functions can't just loop on EINTR anymore.  (Is it
> socket_bucket_read() in the socket bucket code and then
> apr_socket_recv()?  I can't quite tell yet.)  Anyway, it seemed
> complex and likely to break the abstraction across OSes.
> 
>    Still, I imagine I'm not the only one who would really like
> those worker threads to cleanly exit so everything else does ...
> after all, they're not doing anything critical, just waiting
> for the Keep-Alive timeout to expire, after which they notice
> their socket is borked and exit.
> 
>    FWIW, I tested httpd 2.2.0 with the worker MPM on a Solaris
> 2.9 box and it does indeed do what the Linux "bug" report says;
> poll() returns immediately if another thread closes the socket
> and thus the whole httpd server exits right away.
> 
>    Thoughts, advice?  Any comments appreciated.
> 
> Chris.
> 
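For anyone who wants to reproduce the poll()/close() behaviour described above outside of httpd, here is a minimal standalone sketch. It is not httpd or APR code: the struct and function names (close_probe, run_close_probe and so on) are made up for illustration, a socketpair() stands in for a Keep-Alive connection, and the timeout argument stands in for KeepAliveTimeout. One thread blocks in poll() while the calling thread closes the descriptor, and the probe records how long poll() actually blocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical probe state; none of these names come from httpd or APR. */
struct close_probe {
    int fd;          /* descriptor the worker thread polls on */
    int timeout_ms;  /* stand-in for the Keep-Alive timeout */
    int poll_rc;     /* value returned by poll() */
    short revents;   /* events poll() reported, if any */
    long elapsed_ms; /* how long poll() actually blocked */
};

static long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long)ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* Worker: block in poll() the way a Keep-Alive worker thread would. */
static void *close_probe_worker(void *arg)
{
    struct close_probe *p = arg;
    struct pollfd pfd = { .fd = p->fd, .events = POLLIN };
    long start = now_ms();

    p->poll_rc = poll(&pfd, 1, p->timeout_ms);
    p->elapsed_ms = now_ms() - start;
    p->revents = pfd.revents;
    return NULL;
}

/* Start the worker, wait close_after_ms, then close the polled descriptor
 * from this thread.  Returns 0 on success, -1 if setup failed. */
static int run_close_probe(struct close_probe *p, int timeout_ms,
                           int close_after_ms)
{
    int sv[2];
    pthread_t tid;
    struct timespec delay;

    assert(p != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    p->fd = sv[0];
    p->timeout_ms = timeout_ms;
    if (pthread_create(&tid, NULL, close_probe_worker, p) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    /* Give the worker time to enter poll(); this is a best-effort delay,
     * not a guarantee that it is already blocked. */
    delay.tv_sec = close_after_ms / 1000;
    delay.tv_nsec = (close_after_ms % 1000) * 1000000L;
    nanosleep(&delay, NULL);

    close(sv[0]);            /* roughly what closing the worker socket does */
    pthread_join(tid, NULL);
    close(sv[1]);
    return 0;
}
```

Calling run_close_probe(&p, 2000, 100) and printing p.elapsed_ms, p.poll_rc and p.revents should show which behaviour a given platform has: an elapsed time close to the full timeout suggests close() did not wake the poller (what is reported above for Linux), while a value close to the 100 ms delay suggests poll() returned as soon as the descriptor was closed (what is reported above for Solaris).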

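The pthread_kill() idea from the quoted message can be sketched the same way. Again this is only an illustration under assumptions: SIGUSR1 is an arbitrary choice of signal, the names (wake_probe, run_wake_probe, wake_requested) are hypothetical, and a pipe stands in for the idle connection. The handler only sets a flag, and the worker deliberately does not retry poll() on EINTR, which is the part the message points out the APR wrappers would have to allow. The sender repeats the signal until the worker reports that it is done, because a single signal could arrive before the worker has entered poll().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Flag set by the handler; a real worker would check it after poll(). */
static volatile sig_atomic_t wake_requested = 0;

static void wake_handler(int signo)
{
    (void)signo;
    wake_requested = 1;
}

/* Hypothetical probe state; none of these names come from httpd or APR. */
struct wake_probe {
    int fd;          /* descriptor the worker polls on */
    int poll_rc;     /* value returned by poll() */
    int poll_errno;  /* errno if poll() failed, else 0 */
    atomic_int done; /* set by the worker once poll() has returned */
};

static void *wake_probe_worker(void *arg)
{
    struct wake_probe *p = arg;
    struct pollfd pfd = { .fd = p->fd, .events = POLLIN };

    /* No retry loop on EINTR: an interrupted poll() ends the wait. */
    p->poll_rc = poll(&pfd, 1, 10000);
    p->poll_errno = (p->poll_rc < 0) ? errno : 0;
    atomic_store(&p->done, 1);
    return NULL;
}

/* Returns 0 on success, -1 if setup failed. */
static int run_wake_probe(struct wake_probe *p)
{
    int pipefd[2];
    pthread_t tid;
    struct sigaction sa;
    struct timespec delay = { 0, 20 * 1000 * 1000 }; /* 20 ms */

    assert(p != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = wake_handler;   /* SA_RESTART intentionally not set */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    if (pipe(pipefd) != 0)
        return -1;

    p->fd = pipefd[0];
    atomic_init(&p->done, 0);
    if (pthread_create(&tid, NULL, wake_probe_worker, p) != 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    /* Keep signalling the worker thread until it reports that poll()
     * has returned; this covers a signal landing before poll() starts. */
    while (!atomic_load(&p->done)) {
        nanosleep(&delay, NULL);
        pthread_kill(tid, SIGUSR1);
    }
    pthread_join(tid, NULL);
    close(pipefd[0]);
    close(pipefd[1]);
    return 0;
}
```

After run_wake_probe(&p) returns 0, p.poll_rc and p.poll_errno show how the wait ended; on Linux an interrupted poll() fails with EINTR rather than being restarted. The sketch sidesteps the hard part raised in the message, namely getting the same early exit through the APR socket and poll wrappers on every platform.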
