httpd-dev mailing list archives

From Edward Lu <chaos...@gmail.com>
Subject Re: Question about async mod_proxy_wstunnel and threads
Date Fri, 25 Jul 2014 17:36:34 GMT
My previous comment was only about the else block; I just noticed the extra
apr_pollset_remove() here:

                         apr_pollset_remove(event_pollset, baton->pfds[i]);
                     }

+                    apr_pollset_remove(event_pollset, out_pfd);
                     push_timer2worker(te);
                 }

This one shouldn't be necessary either: out_pfd is included in baton->pfds,
so it is already removed by the for loop just above.
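
For anyone following along, here is a quick standalone sketch of the pollset
behavior behind this. It is my own illustration, not the event.c code (the
socket setup and the printed statuses are just for demonstration): a duplicate
apr_pollset_remove() is a harmless no-op while the descriptor is already out
of the pollset, but if a worker re-adds the descriptor first -- the race
described in the quoted messages below -- the same late remove silently drops
it again.

    /* Standalone illustration (not httpd code) of duplicate pollset removal. */
    #include <stdio.h>
    #include <apr_general.h>
    #include <apr_pools.h>
    #include <apr_poll.h>
    #include <apr_network_io.h>

    int main(void)
    {
        apr_pool_t *pool;
        apr_pollset_t *pollset;
        apr_socket_t *sock;
        apr_pollfd_t pfd = { 0 };

        apr_initialize();
        apr_pool_create(&pool, NULL);
        apr_pollset_create(&pollset, 4, pool, 0);
        apr_socket_create(&sock, APR_INET, SOCK_STREAM, APR_PROTO_TCP, pool);

        pfd.p = pool;
        pfd.desc_type = APR_POLL_SOCKET;
        pfd.reqevents = APR_POLLIN;
        pfd.desc.s = sock;

        apr_pollset_add(pollset, &pfd);

        /* What the for loop over baton->pfds does: the descriptor is gone. */
        printf("remove #1: %d\n", apr_pollset_remove(pollset, &pfd));

        /* The extra remove is a no-op only while the descriptor stays out. */
        printf("remove #2: %d (nothing left to remove)\n",
               apr_pollset_remove(pollset, &pfd));

        /* But if a worker re-adds the socket before the listener reaches the
         * late remove, that remove silently takes the freshly re-added
         * descriptor out again, and it is never polled until the peer sends
         * more data. */
        apr_pollset_add(pollset, &pfd);
        printf("remove #3: %d (re-added descriptor lost again)\n",
               apr_pollset_remove(pollset, &pfd));

        apr_terminate();
        return 0;
    }

It should build with something like
gcc demo.c $(apr-1-config --cflags --includes --link-ld), assuming
apr-1-config is on your PATH.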


On Fri, Jul 25, 2014 at 1:31 PM, Edward Lu <chaosed0@gmail.com> wrote:

> Re: Patch; yes, I think that works as well, but I'm still not sure it
> makes sense. If two or more sockets are readable, the first one that's
> processed will remove itself and the other ones from the pollset; the rest
> will try to remove themselves from the pollset, but nothing will happen
> (since they've already been removed by the processing of the first one).
>
>
> On Fri, Jul 25, 2014 at 11:50 AM, Ruediger Pluem <rpluem@apache.org>
> wrote:
>
>> Does the following patch work as well?
>>
>> Index: event.c
>> ===================================================================
>> --- event.c     (revision 1613354)
>> +++ event.c     (working copy)
>> @@ -1973,9 +1973,12 @@
>>                          apr_pollset_remove(event_pollset, baton->pfds[i]);
>>                      }
>>
>> +                    apr_pollset_remove(event_pollset, out_pfd);
>>                      push_timer2worker(te);
>>                  }
>> -                apr_pollset_remove(event_pollset, out_pfd);
>> +                else {
>> +                    apr_pollset_remove(event_pollset, out_pfd);
>> +                }
>>              }
>>              out_pfd++;
>>              num--;
>>
>> Regards
>>
Rüdiger
>>
>> Edward Lu wrote:
>> > I think I've found the issue; there's an extraneous
>> > apr_pollset_remove() call in event.c:1978. Relevant code
>> > (double-slash comments are mine):
>> >
>> >     /* We only signal once per N sockets with this baton */
>> >     if (!(baton->signaled)) {
>> >         baton->signaled = 1;
>> >         te = event_get_timer_event(-1 /* fake timer */,
>> >                                    baton->cbfunc,
>> >                                    baton->user_baton,
>> >                                    0, /* don't insert it */
>> >                                    NULL /* no associated socket callback */);
>> >         /* remove other sockets in my set */
>> >         // says remove "other" sockets, but it really removes all
>> >         for (i = 0; i < baton->nsock; i++) {
>> >             rc = apr_pollset_remove(event_pollset, baton->pfds[i]);
>> >         }
>> >
>> >         push_timer2worker(te);
>> >     }
>> >     // This causes the problem; if the sockets are processed and re-added
>> >     //  before this is called, the backend socket gets re-removed, never
>> >     //  to be seen again (until the client sends something)
>> >     apr_pollset_remove(event_pollset, out_pfd);
>> >
>> > Seems like the solution is just to remove the extra apr_pollset_remove();
>> > all sockets are removed from the pollset when one of them is signalled.
>> > If another one was signalled in the same poll, it will enter this block
>> > and simply do nothing (because of the baton->signaled check). I gave it a
>> > quick test and it looked like the problem was solved.
>> >
>> >
>> > On Tue, Jul 22, 2014 at 10:46 AM, Edward Lu <chaosed0@gmail.com> wrote:
>> >
>> >     I got it to break with these python scripts, as well as with my own
>> >     test client and server (browser javascript and node.js). Strange
>> >     though, I can't reproduce it with only 2 clients on either of those
>> >     programs; I have to get 3 clients connected at once. Past that, I see
>> >     exactly the behavior you describe.
>> >
>> >     I agree that it seems to be some sort of timing issue - I've been
>> >     trying to put trace statements into the HTTPD to see what information
>> >     I can get, but once I do that it starts behaving as expected. Still
>> >     investigating...
>> >
>> >
>> >     On Tue, Jul 22, 2014 at 10:40 AM, Steve Zweep <Steve.Zweep@watchguard.com> wrote:
>> >
>> >         Another tidbit of information ...
>> >
>> >         I attempted to see what was going on by running httpd in the
>> >         foreground with strace. When I did that, the tests worked
>> >         perfectly. I guess this suggests it is some sort of timing issue.
>> >         Running httpd in the foreground without strace exhibited the
>> >         problem.
>> >
>> >         Here's the output of strace during the test. I sent the string
>> >         "hhhhhhh" from client 1 and it was echoed back by the server to
>> >         both clients, as expected.
>> >
>> >         [pid 15451] <... futex resumed> )       = 0
>> >         [pid 15451] futex(0x891ec54, FUTEX_WAKE_PRIVATE, 1) = 0
>> >         [pid 15451] epoll_wait(13, {{EPOLLIN, {u32=144017752, u64=144017752}}}, 2, 0) = 1
>> >         [pid 15451] read(12, "\201\10hhhhhhhh", 8000) = 10
>> >         [pid 15451] mprotect(0xb6a3a000, 8192, PROT_READ|PROT_WRITE) = 0
>> >         [pid 15451] read(12, 0xb6a39aa8, 8000)  = -1 EAGAIN (Resource temporarily unavailable)
>> >         [pid 15451] writev(11, [{"\201\10hhhhhhhh", 10}], 1) = 10
>> >         [pid 15451] read(12, 0xa96084b0, 8000)  = -1 EAGAIN (Resource temporarily unavailable)
>> >         [pid 15451] epoll_wait(13, {}, 2, 0)    = 0
>> >         [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 11, {EPOLLIN, {u32=144044472, u64=144044472}}) = 0
>> >         [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 12, {EPOLLIN, {u32=144044496, u64=144044496}}) = 0
>> >         [pid 15451] gettimeofday({1406038173, 745644}, NULL) = 0
>> >         [pid 15451] epoll_wait(10, {{EPOLLIN, {u32=2843761856, u64=2843761856}}}, 2, 0) = 1
>> >         [pid 15451] read(9, "\201\10hhhhhhhh", 8000) = 10
>> >         [pid 15451] read(9, 0xb6a39aa8, 8000)   = -1 EAGAIN (Resource temporarily unavailable)
>> >         [pid 15451] writev(8, [{"\201\10hhhhhhhh", 10}], 1) = 10
>> >         [pid 15451] read(9, 0xa96084b0, 8000)   = -1 EAGAIN (Resource temporarily unavailable)
>> >         [pid 15451] epoll_wait(10, {}, 2, 0)    = 0
>> >         [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 8, {EPOLLIN, {u32=2843796920, u64=2843796920}}) = 0
>> >         [pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 9, {EPOLLIN, {u32=2843796944, u64=2843796944}}) = 0
>> >         [pid 15451] gettimeofday({1406038173, 745832}, NULL) = 0
>> >         [pid 15451] futex(0x891ec88, FUTEX_WAIT_PRIVATE, 38, NULL <unfinished ...>
>> >
>> >
>> >         - Steve
>> >
>> >         -----Original Message-----
>> >         From: Steve Zweep
>> >         Sent: Monday, July 21, 2014 11:56 AM
>> >         To: 'dev@httpd.apache.org'
>> >         Subject: RE: Question about async mod_proxy_wstunnel and threads
>> >
>> >         My client and server are based on sample code included with the
>> >         ws4py package (http://ws4py.readthedocs.org/en/latest/). We use
>> >         CherryPy, and the ws4py package contains a CherryPy-based server
>> >         (attached).
>> >
>> >         The client is a variation of the echo_client.py sample code also
>> >         included in the ws4py package.
>> >
>> >         In terms of failure rate, I'd say it is about 90%. Often I start
>> >         2 clients and initially it works OK, but then stops after a
>> >         message or two. Once it has failed, I can send any number of
>> >         messages from the one client and continue to see the server echo
>> >         them back, but no message is received by the second. When I then
>> >         send a message from the second client I see a flood of messages
>> >         from the server that had been queued.
>> >
>> >         - Steve
>> >
>> >
>> >         -----Original Message-----
>> >         From: Eric Covener [mailto:covener@gmail.com]
>> >         Sent: Saturday, July 19, 2014 8:16 AM
>> >         To: Apache HTTP Server Development List
>> >         Subject: Re: Question about async mod_proxy_wstunnel and threads
>> >
>> >         I guess it is kind of telling that we could still respond to the
>> >         write from client 2; there are probably more potential errors
>> >         that make us lose track of both sides simultaneously.
>> >
>> >         Can you share your ws client and server if they're easy to use?
>> >         My little sample doesn't spray that way.
>> >
>> >         I'm also curious approximately how often it fails this way.
>> >
>> >         On Fri, Jul 18, 2014 at 5:50 PM, Yann Ylavic <ylavic.dev@gmail.com> wrote:
>> >         > Probably a false trail; trunk at r1611741 is up to date, so I
>> >         > think 1.5.x won't help.
>> >         >
>> >         > On Fri, Jul 18, 2014 at 11:35 PM, Steve Zweep
>> >         > <Steve.Zweep@watchguard.com> wrote:
>> >         >> Hi Yann,
>> >         >>
>> >         >> The test I ran today was built with APR from trunk (r1611741).
>> >         >> I see that r1605769 modifies apr_skiplist.c and there has been
>> >         >> a lot of activity in the trunk version of that code lately. I
>> >         >> can try with the 1.5.x branch code to see if it makes a
>> >         >> difference. Probably won't get to this before Monday though.
>> >         >>
>> >         >> - Steve
>> >         >>
>> >         >> -----Original Message-----
>> >         >> From: Yann Ylavic [mailto:ylavic.dev@gmail.com]
>> >         >> Sent: Friday, July 18, 2014 4:51 PM
>> >         >> To: httpd
>> >         >> Subject: Re: Question about async mod_proxy_wstunnel and threads
>> >         >>
>> >         >> Hi Steve,
>> >         >>
>> >         >> can you still reproduce with the latest APR 1.5.x, notably
>> >         >> containing this fix: http://svn.apache.org/r1605769.
>> >         >> I don't think there is a released version with this patch...
>> >         >>
>> >         >> Regards,
>> >         >> Yann.
>> >         >>
>> >         >> On Fri, Jul 18, 2014 at 9:38 PM, Steve Zweep <Steve.Zweep@watchguard.com> wrote:
>> >         >>> I've attached annotated logs that show the issues I described.
>> >         >>> Both scenarios have ProxyWebsocketAsync turned on. The first
>> >         >>> does not use the AsyncDelay and shows how server messages
>> >         >>> stall and are not delivered until the client polls. The second
>> >         >>> has ProxyWebsocketAsyncDelay set to 100. In that case, message
>> >         >>> processing works properly, but threads are held open and there
>> >         >>> is no sign of async processing.
>> >         >>>
>> >         >>> Since my build and execution environment were somewhat
>> >         >>> non-standard, I repeated all the tests today on stock Ubuntu
>> >         >>> 14.04, with a fresh checkout and build of httpd and apr trunk
>> >         >>> code. The same results were observed.
>> >         >>>
>> >         >>> - Steve
>> >         >>>
>> >         >>>
>> >         >>>
>> >         >>> -----Original Message-----
>> >         >>> ________________________________________
>> >         >>> From: Eric Covener [covener@gmail.com]
>> >         >>> Sent: July 17, 2014 9:15 PM
>> >         >>> To: Apache HTTP Server Development List
>> >         >>> Subject: Re: Question about async mod_proxy_wstunnel and threads
>> >         >>>
>> >         >>> I am having trouble seeing it misbehave. With Async and
>> >         >>> AsyncDelay, I am seeing the expected trace messages, and when
>> >         >>> I look at backtraces of httpd I can see zero threads in
>> >         >>> wstunnel. If I send a server msg, I get it ASAP in the client
>> >         >>> -- and then I see 1 thread in poll for the right couple of
>> >         >>> seconds.
>> >         >>>
>> >         >>> Can you grab trace at e.g.
>> >         >>>
>> >         >>> LogLevel INFO proxy_wstunnel_module:trace8
>> >         >>>
>> >         >>> And annotate the timing a bit for what you do in the client?
>> >         >>> Is it possible you have an un-updated trunk from several weeks
>> >         >>> ago? There was an optimization put in and backed out that
>> >         >>> might have broken some of these same things for a very short
>> >         >>> window.
>> >         >>>
>> >
>> >
>> >
>> >         --
>> >         Eric Covener
>> >         covener@gmail.com
>> >
>> >
>> >
>>
>
>
