On 07.09.2010 12:40, Rainer Jung wrote:
> On 07.09.2010 12:32, John Baker wrote:
>> Rainer,
>>
>> On Tuesday 07 September 2010 11:09:46 you wrote:
>>> I don't like "socket_timeout" ...
>>>
>>>> worker.basic.socket_timeout=90
>>>
>>> but I would like socket_connect_timeout.
>>>
>>> The next two are possibly a bit short, because if the backend e.g. does
>>> a Java Garbage Collection which miht take longer than 1 second, tose
>>> timeouts will fire and take the node out of load balancing.
>>>
>>>> worker.basic.connect_timeout=1000
>>>> worker.basic.prepost_timeout=1000
>>
>> I've change socket_timeout=90 to socket_connect_timeout=9000, and
>> connect_timeout/prepost_timeout to 5000, and the problem persists:
>>
>> [Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug]
>> jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 152
>> [Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug]
>> jk_shutdown_socket::jk_connect.c (813): Shutdown socket 152 and read 0
>> lingering bytes in 2 sec.
>> [Tue Sep 07 11:29:24.868 2010] [364:47921619424608] [debug]
>> wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration
>
> OK, so it would be helpful to see a more complete log, e.g.
>
> grep 364:47921619424608 LOGFILE | egrep -e '11:29:2[012345]'
>
> If there's data in it which is not meant for public posting, you can
> also send it to me va direct mail.
Thanks for the log snippets. I can now see:
- the socket close *does* happen during maintain
- this wasn't obvious from the previous log snippert, because more
"Maintain" lines for other workers were left out
Some relevant lines are:
[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug]
wc_maintain::jk_worker.c (339): Maintaining worker lb-...
...
[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug]
ajp_reset_endpoint::jk_ajp_common.c (757): (worker1) resetting endpoint
with sd = 152 (socket shutdown)
...
[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug]
jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 152
...
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug]
jk_shutdown_socket::jk_connect.c (813): Shutdown socket 152 and read 0
lingering bytes in 2 sec.
...
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug]
ajp_maintain::jk_ajp_common.c (3190): cleaning pool slot=0 elapsed 185 in 2
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug]
ajp_maintain::jk_ajp_common.c (3197): reached pool min size 1 from 1
cache slots
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug]
ajp_maintain::jk_ajp_common.c (3237): recycled 1 sockets in 2 seconds
from 1 pool slots
...
So now we can discuss, how to fix this.
- Can we detect more reliably, whether we want to drain before closing
or not
- As a workaround: using a JkWatchdog moves the maintain into a separate
thread. But during the socket closing a lock is held, which blocks other
threads from accessing the same worker.
We would need to rewrite the logic in ajp_maintain() to only hold the
lock (critical section) while pulling sockets out of the ep_cache or
entering them back, not while actually using the sockets.
I still don't know, *why* the socket close blocks, i.e. why the backend
doesn't respond to us wanting to close the socket. Here a network sniff
like indicated by Mladen would help.
Regards,
Rainer
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org
|