Return-Path: X-Original-To: apmail-httpd-dev-archive@www.apache.org Delivered-To: apmail-httpd-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id CE73111F44 for ; Fri, 25 Jul 2014 15:51:03 +0000 (UTC) Received: (qmail 76323 invoked by uid 500); 25 Jul 2014 15:51:03 -0000 Delivered-To: apmail-httpd-dev-archive@httpd.apache.org Received: (qmail 76255 invoked by uid 500); 25 Jul 2014 15:51:03 -0000 Mailing-List: contact dev-help@httpd.apache.org; run by ezmlm Precedence: bulk Reply-To: dev@httpd.apache.org list-help: list-unsubscribe: List-Post: List-Id: Delivered-To: mailing list dev@httpd.apache.org Received: (qmail 76242 invoked by uid 99); 25 Jul 2014 15:51:03 -0000 Received: from minotaur.apache.org (HELO minotaur.apache.org) (140.211.11.9) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 25 Jul 2014 15:51:03 +0000 Received: by minotaur.apache.org (Postfix, from userid 2161) id E661211F43; Fri, 25 Jul 2014 15:51:02 +0000 (UTC) Received: from [IPv6:::1] (localhost [IPv6:::1]) by gauss.localdomain (Postfix) with ESMTP id 234134D6 for ; Fri, 25 Jul 2014 17:50:57 +0200 (CEST) Message-ID: <53D27CE0.3050004@apache.org> Date: Fri, 25 Jul 2014 17:50:56 +0200 From: Ruediger Pluem User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:29.0) Gecko/20100101 Firefox/29.0 SeaMonkey/2.26.1 MIME-Version: 1.0 To: dev@httpd.apache.org Subject: Re: Question about async mod_proxy_wstunnel and threads References: <56C2E06F97EACF4FB2F1B31EEFC4493091FA8B62@mbx2.wgti.net> <56C2E06F97EACF4FB2F1B31EEFC4493091FA9160@mbx2.wgti.net> <56C2E06F97EACF4FB2F1B31EEFC4493091FA9A51@mbx2.wgti.net> <56C2E06F97EACF4FB2F1B31EEFC4493091FA9C6B@mbx2.wgti.net> <56C2E06F97EACF4FB2F1B31EEFC4493091FAC70B@mbx2.wgti.net> In-Reply-To: X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Does the following patch work as well? Index: event.c =================================================================== --- event.c (revision 1613354) +++ event.c (working copy) @@ -1973,9 +1973,12 @@ apr_pollset_remove(event_pollset, baton->pfds[i]); } + apr_pollset_remove(event_pollset, out_pfd); push_timer2worker(te); } - apr_pollset_remove(event_pollset, out_pfd); + else { + apr_pollset_remove(event_pollset, out_pfd); + } } out_pfd++; num--; Regards RĂ¼diger Edward Lu wrote: > I think I've found the issue; there's an extraneous apr_pollset_remove() call in event.c:1978. Relevant code > (double-slash comments are mine): > > /* We only signal once per N sockets with this baton */ > if (!(baton->signaled)) { > baton->signaled = 1; > te = event_get_timer_event(-1 /* fake timer */, > baton->cbfunc, > baton->user_baton, > 0, /* don't insert it */ > NULL /* no associated socket callback */); > /* remove other sockets in my set */ > // says remove "other" sockets, but it really removes all > for (i = 0; i < baton->nsock; i++) { > rc = apr_pollset_remove(event_pollset, baton->pfds[i]); > } > > push_timer2worker(te); > } > // This causes the problem; if the sockets are processed and re-added > // before this is called, the backend socket gets re-removed, never to > // be seen again (until the client sends something) > apr_pollset_remove(event_pollset, out_pfd); > > Seems like the solution is just to remove the extra apr_pollset_remove(); all sockets are removed from the pollset when > one of them is signalled. If another one was signalled in the same poll, it will enter this block and simply do nothing > (because of the baton->signaled check). I gave it a quick test and it looked like the problem was solved. > > > On Tue, Jul 22, 2014 at 10:46 AM, Edward Lu > wrote: > > I got it to break with these python scripts, as well as with my own test client and server (browser javascript and > node.js). Strange though, I can't reproduce it with only 2 clients on either of those programs; I have to get 3 > clients connected at once. Past that, I see exactly the behavior you describe. > > I agree that it seems to be some sort of timing issue - I've been trying to put trace statements into the HTTPD to > see what information I can get, but once I do that it starts behaving as expected. Still investigating... > > > On Tue, Jul 22, 2014 at 10:40 AM, Steve Zweep > wrote: > > Another tidbit of information ... > > I attempted to see what was going on by running httpd in the foreground with strace. When I did that, the tests > worked perfectly. I guess this suggests it is some sort of timing issue. Running httpd in the foreground without > trace exhibited the problem. > > Here's the output of strace during the test. I sent the string "hhhhhhh" from client 1 and it was echoed back by > the server to both clients, as expected. > > [pid 15451] <... futex resumed> ) = 0 > [pid 15451] futex(0x891ec54, FUTEX_WAKE_PRIVATE, 1) = 0 > [pid 15451] epoll_wait(13, {{EPOLLIN, {u32=144017752, u64=144017752}}}, 2, 0) = 1 > [pid 15451] read(12, "\201\10hhhhhhhh", 8000) = 10 > [pid 15451] mprotect(0xb6a3a000, 8192, PROT_READ|PROT_WRITE) = 0 > [pid 15451] read(12, 0xb6a39aa8, 8000) = -1 EAGAIN (Resource temporarily unavailable) > [pid 15451] writev(11, [{"\201\10hhhhhhhh", 10}], 1) = 10 > [pid 15451] read(12, 0xa96084b0, 8000) = -1 EAGAIN (Resource temporarily unavailable) > [pid 15451] epoll_wait(13, {}, 2, 0) = 0 > [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 11, {EPOLLIN, {u32=144044472, u64=144044472}}) = 0 > [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 12, {EPOLLIN, {u32=144044496, u64=144044496}}) = 0 > [pid 15451] gettimeofday({1406038173, 745644}, NULL) = 0 > [pid 15451] epoll_wait(10, {{EPOLLIN, {u32=2843761856, u64=2843761856}}}, 2, 0) = 1 > [pid 15451] read(9, "\201\10hhhhhhhh", 8000) = 10 > [pid 15451] read(9, 0xb6a39aa8, 8000) = -1 EAGAIN (Resource temporarily unavailable) > [pid 15451] writev(8, [{"\201\10hhhhhhhh", 10}], 1) = 10 > [pid 15451] read(9, 0xa96084b0, 8000) = -1 EAGAIN (Resource temporarily unavailable) > [pid 15451] epoll_wait(10, {}, 2, 0) = 0 > [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 8, {EPOLLIN, {u32=2843796920, u64=2843796920}}) = 0 > [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 9, {EPOLLIN, {u32=2843796944, u64=2843796944}}) = 0 > [pid 15451] gettimeofday({1406038173, 745832}, NULL) = 0 > [pid 15451] futex(0x891ec88, FUTEX_WAIT_PRIVATE, 38, NULL > > > - Steve > > -----Original Message----- > From: Steve Zweep > Sent: Monday, July 21, 2014 11:56 AM > To: 'dev@httpd.apache.org ' > Subject: RE: Question about async mod_proxy_wstunnel and threads > > My client and server are based on sample code included with the ws4py package > (http://ws4py.readthedocs.org/en/latest/). We use CherryPy and the ws4py package contains a CherryPy-based > server (attached). > > The client is a variation of the echo_client.py sample code also included in the ws4py package. > > In terms of failure rate, I'd say it is about 90%. Often I start 2 clients and initially it works OK, but then > stops after a message or two. Once it has failed, I can send any number of messages from the one client and > continue to see the server echo them back, but no message is received by the second. When I then send a message > from the second client I see a flood of messages from the server that had been queued. > > - Steve > > > -----Original Message----- > From: Eric Covener [mailto:covener@gmail.com ] > Sent: Saturday, July 19, 2014 8:16 AM > To: Apache HTTP Server Development List > Subject: Re: Question about async mod_proxy_wstunnel and threads > > I guess it is kind of telling that we could still respond to the write from client 2, there are probably more > potential errors that make us lose track of both sides simultaneously. > > Can you share your ws client and server if they're easy to use? My little sample doesn't spray that way. > > I'm also curious approximately how often it fails this way. > > On Fri, Jul 18, 2014 at 5:50 PM, Yann Ylavic > wrote: > > Probably a false trail, trunk at r1611741 is up to date, I think 1.5.x > > won't help. > > > > On Fri, Jul 18, 2014 at 11:35 PM, Steve Zweep > > > wrote: > >> Hi Yann, > >> > >> The test I ran today was built with APR from trunk (r1611741). I see that r1605769 modifies apr_skiplist.c > and there has been a lot of activity in the trunk version of that code lately. I can try with the 1.5.x branch > code to see if it makes a difference. Probably won't get to this before Monday though. > >> > >> - Steve > >> > >> -----Original Message----- > >> From: Yann Ylavic [mailto:ylavic.dev@gmail.com ] > >> Sent: Friday, July 18, 2014 4:51 PM > >> To: httpd > >> Subject: Re: Question about async mod_proxy_wstunnel and threads > >> > >> Hi Steve, > >> > >> can you still reproduce with the latest APR 1.5.x, notably containing this fix: http://svn.apache.org/r1605769. > >> I don't think there is a released version with this patch... > >> > >> Regards, > >> Yann. > >> > >> On Fri, Jul 18, 2014 at 9:38 PM, Steve Zweep > > wrote: > >>> I've attached annotated logs that show the issues I described. Both scenarios have ProxyWebsocketAsync > turned on. The first does not use the AsyncDelay and shows how server messages stall and are not delivered until > the client polls. The second has ProxyWebsocketAsyncDelay set to 100. In that case, message processing works > properly, but threads are held open and there is no sign of async processing. > >>> > >>> Since my build and execution environment were somewhat non-standard, I repeated all the tests today on stock > Ubuntu 14.04, with a fresh checkout and build of httpd and apr trunk code. The same results were observed. > >>> > >>> - Steve > >>> > >>> > >>> > >>> -----Original Message----- > >>> ________________________________________ > >>> From: Eric Covener [covener@gmail.com ] > >>> Sent: July 17, 2014 9:15 PM > >>> To: Apache HTTP Server Development List > >>> Subject: Re: Question about async mod_proxy_wstunnel and threads > >>> > >>> I am having trouble seeing it mis-behave. w/ Async and AsyncDelay, I > >>> am seeing the expected trace messages and when I look at backtraces > >>> of httpd I can see zero threads in wstunnel . If I send a server > >>> msg, I get it ASAP in the client -- and then I see 1 thread in poll > >>> for the right couple of seconds > >>> > >>> Can you grab trace at e.g. > >>> > >>> LogLevel INFO proxy_wstunnel_module:trace8 > >>> > >>> And annotate the timing a bit for what you do in the client? Is it possible you have an un-updated trunk > from several weeks ago? There was an optimization put in and backed out that might have broke some of these > same things for a very short window. > >>> > > > > -- > Eric Covener > covener@gmail.com > > >