Mailing-List: contact dev-help@httpd.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@httpd.apache.org
Message-ID: <53D27CE0.3050004@apache.org>
Date: Fri, 25 Jul 2014 17:50:56 +0200
From: Ruediger Pluem <rpluem@apache.org>
User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64;
 rv:29.0) Gecko/20100101 Firefox/29.0 SeaMonkey/2.26.1
MIME-Version: 1.0
To: dev@httpd.apache.org
Subject: Re: Question about async mod_proxy_wstunnel and threads
References: <56C2E06F97EACF4FB2F1B31EEFC4493091FA8B62@mbx2.wgti.net>
	<CALK=YjPh75EuCBm6vMbAO0kzJWH0EEoteuCx3rJW+toQ_POP=w@mail.gmail.com>
	<56C2E06F97EACF4FB2F1B31EEFC4493091FA9160@mbx2.wgti.net>
	<56C2E06F97EACF4FB2F1B31EEFC4493091FA9A51@mbx2.wgti.net>
	<CAKQ1sVMe8G39C11zfZv1FFcVXUFBrO13yE5JXj_wrfcAnG6cRQ@mail.gmail.com>
	<56C2E06F97EACF4FB2F1B31EEFC4493091FA9C6B@mbx2.wgti.net>
	<CAKQ1sVPqsoFsRHd+OEJV5b2untghWG_8TVjR9pUFo62k+U6K7A@mail.gmail.com>
	<CALK=YjM6iF8U1e10y3whCfzfb7sXRoa8EC6Bg6siwY=8rWds_w@mail.gmail.com>
	<56C2E06F97EACF4FB2F1B31EEFC4493091FAC70B@mbx2.wgti.net>
	<CAL7zx_fvExq-mWftMO_Ft7mxNe1OEyLLyWfPs1P8GsUKSHxghw@mail.gmail.com>
 <CAL7zx_eBXAAyhDjg5ej6qiKLR1FH+DUvizNsvm8jUcmAb9D_0g@mail.gmail.com>
In-Reply-To: 
 <CAL7zx_eBXAAyhDjg5ej6qiKLR1FH+DUvizNsvm8jUcmAb9D_0g@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Does the following patch work as well?

Index: event.c
===================================================================
--- event.c     (revision 1613354)
+++ event.c     (working copy)
@@ -1973,9 +1973,12 @@
                         apr_pollset_remove(event_pollset, baton->pfds[i]);
                     }

+                    apr_pollset_remove(event_pollset, out_pfd);
                     push_timer2worker(te);
                 }
-                apr_pollset_remove(event_pollset, out_pfd);
+                else {
+                    apr_pollset_remove(event_pollset, out_pfd);
+                }
             }
             out_pfd++;
             num--;

Regards

Rüdiger

Edward Lu wrote:
> I think I've found the issue; there's an extraneous apr_pollset_remove() call in event.c:1978. Relevant code
> (double-slash comments are mine):
> 
>     /* We only signal once per N sockets with this baton */
>     if (!(baton->signaled)) {
>         baton->signaled = 1;
>         te = event_get_timer_event(-1 /* fake timer */,
>                                    baton->cbfunc,
>                                    baton->user_baton,
>                                    0, /* don't insert it */
>                                    NULL /* no associated socket callback */);
>         /* remove other sockets in my set */
>         // says remove "other" sockets, but it really removes all
>         for (i = 0; i < baton->nsock; i++) {
>             rc = apr_pollset_remove(event_pollset, baton->pfds[i]);
>         }
> 
>         push_timer2worker(te);
>     }
>     // This causes the problem; if the sockets are processed and re-added
>     //  before this is called, the backend socket gets re-removed, never to
>     //  be seen again (until the client sends something)
>     apr_pollset_remove(event_pollset, out_pfd);
> 
> Seems like the solution is just to remove the extra apr_pollset_remove(); all sockets are removed from the pollset when
> one of them is signalled. If another one was signalled in the same poll, it will enter this block and simply do nothing
> (because of the baton->signaled check). I gave it a quick test and it looked like the problem was solved.
> 
> 
> On Tue, Jul 22, 2014 at 10:46 AM, Edward Lu <chaosed0@gmail.com <mailto:chaosed0@gmail.com>> wrote:
> 
>     I got it to break with these python scripts, as well as with my own test client and server (browser javascript and
>     node.js). Strange though, I can't reproduce it with only 2 clients on either of those programs; I have to get 3
>     clients connected at once. Past that, I see exactly the behavior you describe.
> 
>     I agree that it seems to be some sort of timing issue - I've been trying to put trace statements into the HTTPD to
>     see what information I can get, but once I do that it starts behaving as expected. Still investigating...
> 
> 
>     On Tue, Jul 22, 2014 at 10:40 AM, Steve Zweep <Steve.Zweep@watchguard.com <mailto:Steve.Zweep@watchguard.com>> wrote:
> 
>         Another tidbit of information ...
> 
>         I attempted to see what was going on by running httpd in the foreground with strace. When I did that, the tests
>         worked perfectly. I guess this suggests it is some sort of timing issue. Running httpd in the foreground without
>         trace exhibited the problem.
> 
>         Here's the output of strace during the test. I sent the string "hhhhhhh" from client 1 and it was echoed back by
>         the server to both clients, as expected.
> 
>         [pid 15451] <... futex resumed> )       = 0
>         [pid 15451] futex(0x891ec54, FUTEX_WAKE_PRIVATE, 1) = 0
>         [pid 15451] epoll_wait(13, {{EPOLLIN, {u32=144017752, u64=144017752}}}, 2, 0) = 1
>         [pid 15451] read(12, "\201\10hhhhhhhh", 8000) = 10
>         [pid 15451] mprotect(0xb6a3a000, 8192, PROT_READ|PROT_WRITE) = 0
>         [pid 15451] read(12, 0xb6a39aa8, 8000)  = -1 EAGAIN (Resource temporarily unavailable)
>         [pid 15451] writev(11, [{"\201\10hhhhhhhh", 10}], 1) = 10
>         [pid 15451] read(12, 0xa96084b0, 8000)  = -1 EAGAIN (Resource temporarily unavailable)
>         [pid 15451] epoll_wait(13, {}, 2, 0)    = 0
>         [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 11, {EPOLLIN, {u32=144044472, u64=144044472}}) = 0
>         [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 12, {EPOLLIN, {u32=144044496, u64=144044496}}) = 0
>         [pid 15451] gettimeofday({1406038173, 745644}, NULL) = 0
>         [pid 15451] epoll_wait(10, {{EPOLLIN, {u32=2843761856, u64=2843761856}}}, 2, 0) = 1
>         [pid 15451] read(9, "\201\10hhhhhhhh", 8000) = 10
>         [pid 15451] read(9, 0xb6a39aa8, 8000)   = -1 EAGAIN (Resource temporarily unavailable)
>         [pid 15451] writev(8, [{"\201\10hhhhhhhh", 10}], 1) = 10
>         [pid 15451] read(9, 0xa96084b0, 8000)   = -1 EAGAIN (Resource temporarily unavailable)
>         [pid 15451] epoll_wait(10, {}, 2, 0)    = 0
>         [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 8, {EPOLLIN, {u32=2843796920, u64=2843796920}}) = 0
>         [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 9, {EPOLLIN, {u32=2843796944, u64=2843796944}}) = 0
>         [pid 15451] gettimeofday({1406038173, 745832}, NULL) = 0
>         [pid 15451] futex(0x891ec88, FUTEX_WAIT_PRIVATE, 38, NULL <unfinished ...>
> 
> 
>         - Steve
> 
>         -----Original Message-----
>         From: Steve Zweep
>         Sent: Monday, July 21, 2014 11:56 AM
>         To: 'dev@httpd.apache.org <mailto:dev@httpd.apache.org>'
>         Subject: RE: Question about async mod_proxy_wstunnel and threads
> 
>         My client and server are based on sample code included with the ws4py package
>         (http://ws4py.readthedocs.org/en/latest/). We use CherryPy and the ws4py package contains a CherryPy-based
>         server (attached).
> 
>         The client is a variation of the echo_client.py sample code also included in the ws4py package.
> 
>         In terms of failure rate, I'd say it is about 90%. Often I start 2 clients and initially it works OK, but then
>         stops after a message or two. Once it has failed, I can send any number of messages from the one client and
>         continue to see the server echo them back, but no message is received by the second. When I then send a message
>         from the second client I see a flood of messages from the server that had been queued.
> 
>         - Steve
> 
> 
>         -----Original Message-----
>         From: Eric Covener [mailto:covener@gmail.com <mailto:covener@gmail.com>]
>         Sent: Saturday, July 19, 2014 8:16 AM
>         To: Apache HTTP Server Development List
>         Subject: Re: Question about async mod_proxy_wstunnel and threads
> 
>         I guess it is kind of telling that we could still respond to the write from client 2, there are probably more
>         potential errors that make us lose track of both sides simultaneously.
> 
>         Can you share your ws client and server if they're easy to use? My little sample doesn't spray that way.
> 
>         I'm also curious approximately how often it fails this way.
> 
>         On Fri, Jul 18, 2014 at 5:50 PM, Yann Ylavic <ylavic.dev@gmail.com <mailto:ylavic.dev@gmail.com>> wrote:
>         > Probably a false trail, trunk at r1611741 is up to date, I think 1.5.x
>         > won't help.
>         >
>         > On Fri, Jul 18, 2014 at 11:35 PM, Steve Zweep
>         > <Steve.Zweep@watchguard.com <mailto:Steve.Zweep@watchguard.com>> wrote:
>         >> Hi Yann,
>         >>
>         >> The test I ran today was built with APR from trunk (r1611741). I see that r1605769 modifies apr_skiplist.c
>         and there has been a lot of activity in the  trunk version of that code lately. I can try with the 1.5.x branch
>         code to see if it makes a difference. Probably won't get to this before Monday though.
>         >>
>         >> - Steve
>         >>
>         >> -----Original Message-----
>         >> From: Yann Ylavic [mailto:ylavic.dev@gmail.com <mailto:ylavic.dev@gmail.com>]
>         >> Sent: Friday, July 18, 2014 4:51 PM
>         >> To: httpd
>         >> Subject: Re: Question about async mod_proxy_wstunnel and threads
>         >>
>         >> Hi Steve,
>         >>
>         >> can you still reproduce with the latest APR 1.5.x, notably containing this fix: http://svn.apache.org/r1605769.
>         >> I don't think there is a released version with this patch...
>         >>
>         >> Regards,
>         >> Yann.
>         >>
>         >> On Fri, Jul 18, 2014 at 9:38 PM, Steve Zweep <Steve.Zweep@watchguard.com <mailto:Steve.Zweep@watchguard.com>>
>         wrote:
>         >>> I've attached annotated logs that show the issues I described. Both scenarios have ProxyWebsocketAsync
>         turned on. The first does not use the AsyncDelay and shows how server messages stall and are not delivered until
>         the client polls. The second has ProxyWebsocketAsyncDelay set to 100. In that case, message processing works
>         properly, but threads are held open and there is no sign of async processing.
>         >>>
>         >>> Since my build and execution environment were somewhat non-standard, I repeated all the tests today on stock
>         Ubuntu 14.04, with a fresh checkout and build of httpd and apr trunk code. The same results were observed.
>         >>>
>         >>> - Steve
>         >>>
>         >>>
>         >>>
>         >>> -----Original Message-----
>         >>> ________________________________________
>         >>> From: Eric Covener [covener@gmail.com <mailto:covener@gmail.com>]
>         >>> Sent: July 17, 2014 9:15 PM
>         >>> To: Apache HTTP Server Development List
>         >>> Subject: Re: Question about async mod_proxy_wstunnel and threads
>         >>>
>         >>> I am having trouble seeing it mis-behave. w/ Async and AsyncDelay, I
>         >>> am seeing the expected trace messages and when I look at backtraces
>         >>> of httpd I can see zero threads in wstunnel . If I send a server
>         >>> msg, I get it ASAP in the client -- and then I see 1 thread in poll
>         >>> for the right couple of seconds
>         >>>
>         >>> Can you grab trace at e.g.
>         >>>
>         >>> LogLevel INFO proxy_wstunnel_module:trace8
>         >>>
>         >>> And annotate the timing a bit for what you do in the client?  Is it possible you have an un-updated trunk
>         from several weeks ago?  There was an optimization put in and backed out that might have broke some of these
>         same things for a very short window.
>         >>>
> 
> 
> 
>         --
>         Eric Covener
>         covener@gmail.com <mailto:covener@gmail.com>
> 
> 
>