httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Edward Lu <chaos...@gmail.com>
Subject Re: Question about async mod_proxy_wstunnel and threads
Date Fri, 25 Jul 2014 17:31:22 GMT
Re: Patch; yes, I think that works as well, but I'm still not sure it makes
sense. If two or more sockets are readable, the first one that's processed
will remove itself and the other ones from the pollset; the rest will try
to remove themselves from the pollset, but nothing will happen (since
they've already been removed by the processing of the first one).


On Fri, Jul 25, 2014 at 11:50 AM, Ruediger Pluem <rpluem@apache.org> wrote:

> Does the following patch work as well?
>
> Index: event.c
> ===================================================================
> --- event.c     (revision 1613354)
> +++ event.c     (working copy)
> @@ -1973,9 +1973,12 @@
>                          apr_pollset_remove(event_pollset, baton->pfds[i]);
>                      }
>
> +                    apr_pollset_remove(event_pollset, out_pfd);
>                      push_timer2worker(te);
>                  }
> -                apr_pollset_remove(event_pollset, out_pfd);
> +                else {
> +                    apr_pollset_remove(event_pollset, out_pfd);
> +                }
>              }
>              out_pfd++;
>              num--;
>
> Regards
>
> RĂ¼diger
>
> Edward Lu wrote:
> > I think I've found the issue; there's an extraneous apr_pollset_remove()
> call in event.c:1978. Relevant code
> > (double-slash comments are mine):
> >
> >     /* We only signal once per N sockets with this baton */
> >     if (!(baton->signaled)) {
> >         baton->signaled = 1;
> >         te = event_get_timer_event(-1 /* fake timer */,
> >                                    baton->cbfunc,
> >                                    baton->user_baton,
> >                                    0, /* don't insert it */
> >                                    NULL /* no associated socket callback
> */);
> >         /* remove other sockets in my set */
> >         // says remove "other" sockets, but it really removes all
> >         for (i = 0; i < baton->nsock; i++) {
> >             rc = apr_pollset_remove(event_pollset, baton->pfds[i]);
> >         }
> >
> >         push_timer2worker(te);
> >     }
> >     // This causes the problem; if the sockets are processed and re-added
> >     //  before this is called, the backend socket gets re-removed, never
> to
> >     //  be seen again (until the client sends something)
> >     apr_pollset_remove(event_pollset, out_pfd);
> >
> > Seems like the solution is just to remove the extra
> apr_pollset_remove(); all sockets are removed from the pollset when
> > one of them is signalled. If another one was signalled in the same poll,
> it will enter this block and simply do nothing
> > (because of the baton->signaled check). I gave it a quick test and it
> looked like the problem was solved.
> >
> >
> > On Tue, Jul 22, 2014 at 10:46 AM, Edward Lu <chaosed0@gmail.com <mailto:
> chaosed0@gmail.com>> wrote:
> >
> >     I got it to break with these python scripts, as well as with my own
> test client and server (browser javascript and
> >     node.js). Strange though, I can't reproduce it with only 2 clients
> on either of those programs; I have to get 3
> >     clients connected at once. Past that, I see exactly the behavior you
> describe.
> >
> >     I agree that it seems to be some sort of timing issue - I've been
> trying to put trace statements into the HTTPD to
> >     see what information I can get, but once I do that it starts
> behaving as expected. Still investigating...
> >
> >
> >     On Tue, Jul 22, 2014 at 10:40 AM, Steve Zweep <
> Steve.Zweep@watchguard.com <mailto:Steve.Zweep@watchguard.com>> wrote:
> >
> >         Another tidbit of information ...
> >
> >         I attempted to see what was going on by running httpd in the
> foreground with strace. When I did that, the tests
> >         worked perfectly. I guess this suggests it is some sort of
> timing issue. Running httpd in the foreground without
> >         trace exhibited the problem.
> >
> >         Here's the output of strace during the test. I sent the string
> "hhhhhhh" from client 1 and it was echoed back by
> >         the server to both clients, as expected.
> >
> >         [pid 15451] <... futex resumed> )       = 0
> >         [pid 15451] futex(0x891ec54, FUTEX_WAKE_PRIVATE, 1) = 0
> >         [pid 15451] epoll_wait(13, {{EPOLLIN, {u32=144017752,
> u64=144017752}}}, 2, 0) = 1
> >         [pid 15451] read(12, "\201\10hhhhhhhh", 8000) = 10
> >         [pid 15451] mprotect(0xb6a3a000, 8192, PROT_READ|PROT_WRITE) = 0
> >         [pid 15451] read(12, 0xb6a39aa8, 8000)  = -1 EAGAIN (Resource
> temporarily unavailable)
> >         [pid 15451] writev(11, [{"\201\10hhhhhhhh", 10}], 1) = 10
> >         [pid 15451] read(12, 0xa96084b0, 8000)  = -1 EAGAIN (Resource
> temporarily unavailable)
> >         [pid 15451] epoll_wait(13, {}, 2, 0)    = 0
> >         [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 11, {EPOLLIN,
> {u32=144044472, u64=144044472}}) = 0
> >         [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 12, {EPOLLIN,
> {u32=144044496, u64=144044496}}) = 0
> >         [pid 15451] gettimeofday({1406038173, 745644}, NULL) = 0
> >         [pid 15451] epoll_wait(10, {{EPOLLIN, {u32=2843761856,
> u64=2843761856}}}, 2, 0) = 1
> >         [pid 15451] read(9, "\201\10hhhhhhhh", 8000) = 10
> >         [pid 15451] read(9, 0xb6a39aa8, 8000)   = -1 EAGAIN (Resource
> temporarily unavailable)
> >         [pid 15451] writev(8, [{"\201\10hhhhhhhh", 10}], 1) = 10
> >         [pid 15451] read(9, 0xa96084b0, 8000)   = -1 EAGAIN (Resource
> temporarily unavailable)
> >         [pid 15451] epoll_wait(10, {}, 2, 0)    = 0
> >         [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 8, {EPOLLIN,
> {u32=2843796920, u64=2843796920}}) = 0
> >         [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 9, {EPOLLIN,
> {u32=2843796944, u64=2843796944}}) = 0
> >         [pid 15451] gettimeofday({1406038173, 745832}, NULL) = 0
> >         [pid 15451] futex(0x891ec88, FUTEX_WAIT_PRIVATE, 38, NULL
> <unfinished ...>
> >
> >
> >         - Steve
> >
> >         -----Original Message-----
> >         From: Steve Zweep
> >         Sent: Monday, July 21, 2014 11:56 AM
> >         To: 'dev@httpd.apache.org <mailto:dev@httpd.apache.org>'
> >         Subject: RE: Question about async mod_proxy_wstunnel and threads
> >
> >         My client and server are based on sample code included with the
> ws4py package
> >         (http://ws4py.readthedocs.org/en/latest/). We use CherryPy and
> the ws4py package contains a CherryPy-based
> >         server (attached).
> >
> >         The client is a variation of the echo_client.py sample code also
> included in the ws4py package.
> >
> >         In terms of failure rate, I'd say it is about 90%. Often I start
> 2 clients and initially it works OK, but then
> >         stops after a message or two. Once it has failed, I can send any
> number of messages from the one client and
> >         continue to see the server echo them back, but no message is
> received by the second. When I then send a message
> >         from the second client I see a flood of messages from the server
> that had been queued.
> >
> >         - Steve
> >
> >
> >         -----Original Message-----
> >         From: Eric Covener [mailto:covener@gmail.com <mailto:
> covener@gmail.com>]
> >         Sent: Saturday, July 19, 2014 8:16 AM
> >         To: Apache HTTP Server Development List
> >         Subject: Re: Question about async mod_proxy_wstunnel and threads
> >
> >         I guess it is kind of telling that we could still respond to the
> write from client 2, there are probably more
> >         potential errors that make us lose track of both sides
> simultaneously.
> >
> >         Can you share your ws client and server if they're easy to use?
> My little sample doesn't spray that way.
> >
> >         I'm also curious approximately how often it fails this way.
> >
> >         On Fri, Jul 18, 2014 at 5:50 PM, Yann Ylavic <
> ylavic.dev@gmail.com <mailto:ylavic.dev@gmail.com>> wrote:
> >         > Probably a false trail, trunk at r1611741 is up to date, I
> think 1.5.x
> >         > won't help.
> >         >
> >         > On Fri, Jul 18, 2014 at 11:35 PM, Steve Zweep
> >         > <Steve.Zweep@watchguard.com <mailto:Steve.Zweep@watchguard.com>>
> wrote:
> >         >> Hi Yann,
> >         >>
> >         >> The test I ran today was built with APR from trunk
> (r1611741). I see that r1605769 modifies apr_skiplist.c
> >         and there has been a lot of activity in the  trunk version of
> that code lately. I can try with the 1.5.x branch
> >         code to see if it makes a difference. Probably won't get to this
> before Monday though.
> >         >>
> >         >> - Steve
> >         >>
> >         >> -----Original Message-----
> >         >> From: Yann Ylavic [mailto:ylavic.dev@gmail.com <mailto:
> ylavic.dev@gmail.com>]
> >         >> Sent: Friday, July 18, 2014 4:51 PM
> >         >> To: httpd
> >         >> Subject: Re: Question about async mod_proxy_wstunnel and
> threads
> >         >>
> >         >> Hi Steve,
> >         >>
> >         >> can you still reproduce with the latest APR 1.5.x, notably
> containing this fix: http://svn.apache.org/r1605769.
> >         >> I don't think there is a released version with this patch...
> >         >>
> >         >> Regards,
> >         >> Yann.
> >         >>
> >         >> On Fri, Jul 18, 2014 at 9:38 PM, Steve Zweep <
> Steve.Zweep@watchguard.com <mailto:Steve.Zweep@watchguard.com>>
> >         wrote:
> >         >>> I've attached annotated logs that show the issues I
> described. Both scenarios have ProxyWebsocketAsync
> >         turned on. The first does not use the AsyncDelay and shows how
> server messages stall and are not delivered until
> >         the client polls. The second has ProxyWebsocketAsyncDelay set to
> 100. In that case, message processing works
> >         properly, but threads are held open and there is no sign of
> async processing.
> >         >>>
> >         >>> Since my build and execution environment were somewhat
> non-standard, I repeated all the tests today on stock
> >         Ubuntu 14.04, with a fresh checkout and build of httpd and apr
> trunk code. The same results were observed.
> >         >>>
> >         >>> - Steve
> >         >>>
> >         >>>
> >         >>>
> >         >>> -----Original Message-----
> >         >>> ________________________________________
> >         >>> From: Eric Covener [covener@gmail.com <mailto:
> covener@gmail.com>]
> >         >>> Sent: July 17, 2014 9:15 PM
> >         >>> To: Apache HTTP Server Development List
> >         >>> Subject: Re: Question about async mod_proxy_wstunnel and
> threads
> >         >>>
> >         >>> I am having trouble seeing it mis-behave. w/ Async and
> AsyncDelay, I
> >         >>> am seeing the expected trace messages and when I look at
> backtraces
> >         >>> of httpd I can see zero threads in wstunnel . If I send a
> server
> >         >>> msg, I get it ASAP in the client -- and then I see 1 thread
> in poll
> >         >>> for the right couple of seconds
> >         >>>
> >         >>> Can you grab trace at e.g.
> >         >>>
> >         >>> LogLevel INFO proxy_wstunnel_module:trace8
> >         >>>
> >         >>> And annotate the timing a bit for what you do in the client?
>  Is it possible you have an un-updated trunk
> >         from several weeks ago?  There was an optimization put in and
> backed out that might have broke some of these
> >         same things for a very short window.
> >         >>>
> >
> >
> >
> >         --
> >         Eric Covener
> >         covener@gmail.com <mailto:covener@gmail.com>
> >
> >
> >
>

Mime
View raw message