httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steve Zweep <Steve.Zw...@watchguard.com>
Subject RE: Question about async mod_proxy_wstunnel and threads
Date Fri, 25 Jul 2014 16:01:00 GMT
Cool. I will test this out.

One other issue I found was with the setting of ProxyWebsocketAsyncDelay. The docs say that
this value is in milliseconds, but the code actually parses this as seconds. While testing,
what I thought was a 100 ms delay was actually 100 seconds and I wasn’t waiting long enough
to see the connections suspend. The code that parses this is here:

Index: mod_proxy_wstunnel.c
===================================================================
--- mod_proxy_wstunnel.c        (revision 1611738)
+++ mod_proxy_wstunnel.c        (working copy)
@@ -548,7 +548,7 @@
static const char * proxyws_set_aysnch_delay(cmd_parms *cmd, void *conf, const char *val)
{
     proxyws_dir_conf *dconf = conf;
-    if (ap_timeout_parameter_parse(val, &(dconf->async_delay), "s") != APR_SUCCESS)
+    if (ap_timeout_parameter_parse(val, &(dconf->async_delay), "ms") != APR_SUCCESS)
         return "ProxyWebsocketAsyncDelay timeout has wrong format";
     return NULL;
}

The same issue looks like it applies to the ProxyWebsocketIdleTimeout setting as well.

- Steve

From: Edward Lu [mailto:chaosed0@gmail.com]
Sent: Friday, July 25, 2014 11:38 AM
To: dev@httpd.apache.org
Subject: Re: Question about async mod_proxy_wstunnel and threads

I think I've found the issue; there's an extraneous apr_pollset_remove() call in event.c:1978.
Relevant code (double-slash comments are mine):

    /* We only signal once per N sockets with this baton */
    if (!(baton->signaled)) {
        baton->signaled = 1;
        te = event_get_timer_event(-1 /* fake timer */,
                                   baton->cbfunc,
                                   baton->user_baton,
                                   0, /* don't insert it */
                                   NULL /* no associated socket callback */);
        /* remove other sockets in my set */
        // says remove "other" sockets, but it really removes all
        for (i = 0; i < baton->nsock; i++) {
            rc = apr_pollset_remove(event_pollset, baton->pfds[i]);
        }

        push_timer2worker(te);
    }
    // This causes the problem; if the sockets are processed and re-added
    //  before this is called, the backend socket gets re-removed, never to
    //  be seen again (until the client sends something)
    apr_pollset_remove(event_pollset, out_pfd);

Seems like the solution is just to remove the extra apr_pollset_remove(); all sockets are
removed from the pollset when one of them is signalled. If another one was signalled in the
same poll, it will enter this block and simply do nothing (because of the baton->signaled
check). I gave it a quick test and it looked like the problem was solved.

On Tue, Jul 22, 2014 at 10:46 AM, Edward Lu <chaosed0@gmail.com<mailto:chaosed0@gmail.com>>
wrote:
I got it to break with these python scripts, as well as with my own test client and server
(browser javascript and node.js). Strange though, I can't reproduce it with only 2 clients
on either of those programs; I have to get 3 clients connected at once. Past that, I see exactly
the behavior you describe.

I agree that it seems to be some sort of timing issue - I've been trying to put trace statements
into the HTTPD to see what information I can get, but once I do that it starts behaving as
expected. Still investigating...

On Tue, Jul 22, 2014 at 10:40 AM, Steve Zweep <Steve.Zweep@watchguard.com<mailto:Steve.Zweep@watchguard.com>>
wrote:
Another tidbit of information ...

I attempted to see what was going on by running httpd in the foreground with strace. When
I did that, the tests worked perfectly. I guess this suggests it is some sort of timing issue.
Running httpd in the foreground without trace exhibited the problem.

Here's the output of strace during the test. I sent the string "hhhhhhh" from client 1 and
it was echoed back by the server to both clients, as expected.

[pid 15451] <... futex resumed> )       = 0
[pid 15451] futex(0x891ec54, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 15451] epoll_wait(13, {{EPOLLIN, {u32=144017752, u64=144017752}}}, 2, 0) = 1
[pid 15451] read(12, "\201\10hhhhhhhh", 8000) = 10
[pid 15451] mprotect(0xb6a3a000, 8192, PROT_READ|PROT_WRITE) = 0
[pid 15451] read(12, 0xb6a39aa8, 8000)  = -1 EAGAIN (Resource temporarily unavailable)
[pid 15451] writev(11, [{"\201\10hhhhhhhh", 10}], 1) = 10
[pid 15451] read(12, 0xa96084b0, 8000)  = -1 EAGAIN (Resource temporarily unavailable)
[pid 15451] epoll_wait(13, {}, 2, 0)    = 0
[pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 11, {EPOLLIN, {u32=144044472, u64=144044472}}) = 0
[pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 12, {EPOLLIN, {u32=144044496, u64=144044496}}) = 0
[pid 15451] gettimeofday({1406038173, 745644}, NULL) = 0
[pid 15451] epoll_wait(10, {{EPOLLIN, {u32=2843761856, u64=2843761856}}}, 2, 0) = 1
[pid 15451] read(9, "\201\10hhhhhhhh", 8000) = 10
[pid 15451] read(9, 0xb6a39aa8, 8000)   = -1 EAGAIN (Resource temporarily unavailable)
[pid 15451] writev(8, [{"\201\10hhhhhhhh", 10}], 1) = 10
[pid 15451] read(9, 0xa96084b0, 8000)   = -1 EAGAIN (Resource temporarily unavailable)
[pid 15451] epoll_wait(10, {}, 2, 0)    = 0
[pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 8, {EPOLLIN, {u32=2843796920, u64=2843796920}}) =
0
[pid 15451] epoll_ctl(7, EPOLL_CTL_ADD, 9, {EPOLLIN, {u32=2843796944, u64=2843796944}}) =
0
[pid 15451] gettimeofday({1406038173, 745832}, NULL) = 0
[pid 15451] futex(0x891ec88, FUTEX_WAIT_PRIVATE, 38, NULL <unfinished ...>


- Steve

-----Original Message-----
From: Steve Zweep
Sent: Monday, July 21, 2014 11:56 AM
To: 'dev@httpd.apache.org<mailto:dev@httpd.apache.org>'
Subject: RE: Question about async mod_proxy_wstunnel and threads

My client and server are based on sample code included with the ws4py package (http://ws4py.readthedocs.org/en/latest/).
We use CherryPy and the ws4py package contains a CherryPy-based server (attached).

The client is a variation of the echo_client.py sample code also included in the ws4py package.

In terms of failure rate, I'd say it is about 90%. Often I start 2 clients and initially it
works OK, but then stops after a message or two. Once it has failed, I can send any number
of messages from the one client and continue to see the server echo them back, but no message
is received by the second. When I then send a message from the second client I see a flood
of messages from the server that had been queued.
- Steve


-----Original Message-----
From: Eric Covener [mailto:covener@gmail.com<mailto:covener@gmail.com>]
Sent: Saturday, July 19, 2014 8:16 AM
To: Apache HTTP Server Development List
Subject: Re: Question about async mod_proxy_wstunnel and threads

I guess it is kind of telling that we could still respond to the write from client 2, there
are probably more potential errors that make us lose track of both sides simultaneously.

Can you share your ws client and server if they're easy to use? My little sample doesn't spray
that way.

I'm also curious approximately how often it fails this way.

On Fri, Jul 18, 2014 at 5:50 PM, Yann Ylavic <ylavic.dev@gmail.com<mailto:ylavic.dev@gmail.com>>
wrote:
> Probably a false trail, trunk at r1611741 is up to date, I think 1.5.x
> won't help.
>
> On Fri, Jul 18, 2014 at 11:35 PM, Steve Zweep
> <Steve.Zweep@watchguard.com<mailto:Steve.Zweep@watchguard.com>> wrote:
>> Hi Yann,
>>
>> The test I ran today was built with APR from trunk (r1611741). I see that r1605769
modifies apr_skiplist.c and there has been a lot of activity in the  trunk version of that
code lately. I can try with the 1.5.x branch code to see if it makes a difference. Probably
won't get to this before Monday though.
>>
>> - Steve
>>
>> -----Original Message-----
>> From: Yann Ylavic [mailto:ylavic.dev@gmail.com<mailto:ylavic.dev@gmail.com>]
>> Sent: Friday, July 18, 2014 4:51 PM
>> To: httpd
>> Subject: Re: Question about async mod_proxy_wstunnel and threads
>>
>> Hi Steve,
>>
>> can you still reproduce with the latest APR 1.5.x, notably containing this fix: http://svn.apache.org/r1605769.
>> I don't think there is a released version with this patch...
>>
>> Regards,
>> Yann.
>>
>> On Fri, Jul 18, 2014 at 9:38 PM, Steve Zweep <Steve.Zweep@watchguard.com<mailto:Steve.Zweep@watchguard.com>>
wrote:
>>> I've attached annotated logs that show the issues I described. Both scenarios
have ProxyWebsocketAsync turned on. The first does not use the AsyncDelay and shows how server
messages stall and are not delivered until the client polls. The second has ProxyWebsocketAsyncDelay
set to 100. In that case, message processing works properly, but threads are held open and
there is no sign of async processing.
>>>
>>> Since my build and execution environment were somewhat non-standard, I repeated
all the tests today on stock Ubuntu 14.04, with a fresh checkout and build of httpd and apr
trunk code. The same results were observed.
>>>
>>> - Steve
>>>
>>>
>>>
>>> -----Original Message-----
>>> ________________________________________
>>> From: Eric Covener [covener@gmail.com<mailto:covener@gmail.com>]
>>> Sent: July 17, 2014 9:15 PM
>>> To: Apache HTTP Server Development List
>>> Subject: Re: Question about async mod_proxy_wstunnel and threads
>>>
>>> I am having trouble seeing it mis-behave. w/ Async and AsyncDelay, I
>>> am seeing the expected trace messages and when I look at backtraces
>>> of httpd I can see zero threads in wstunnel . If I send a server
>>> msg, I get it ASAP in the client -- and then I see 1 thread in poll
>>> for the right couple of seconds
>>>
>>> Can you grab trace at e.g.
>>>
>>> LogLevel INFO proxy_wstunnel_module:trace8
>>>
>>> And annotate the timing a bit for what you do in the client?  Is it possible
you have an un-updated trunk from several weeks ago?  There was an optimization put in and
backed out that might have broke some of these same things for a very short window.
>>>



--
Eric Covener
covener@gmail.com<mailto:covener@gmail.com>


Mime
View raw message