tomcat-users mailing list archives

From Andy Wang <aw...@ptc.com>
Subject Re: Possible race condition with mod_jk + multiple workers in recovery mode
Date Tue, 11 Jan 2011 23:59:50 GMT
I'm not sure, but it looks like the service() function in jk_lb_worker.c
puts a recovering worker into the JK_LB_STATE_PROBE state and then
doesn't set it to JK_LB_STATE_OK until after the end->service() call
returns.

I think this allows a second thread to come in while the first request
is still in flight. Because the worker is in the JK_LB_STATE_PROBE
state, JK_WORKER_USABLE() returns false for it, so the second thread
never tries that worker and its request completes without it. Only then
does the first request complete and finally mark the worker as
JK_LB_STATE_OK.
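
To make the suspected interleaving concrete, here is a minimal sketch
that replays it in a single thread. It is not the mod_jk code: the
state names, worker_usable(), pick_worker(), finish_request() and
replay_race() are hypothetical stand-ins, and the usability rule is
only my assumption about how JK_WORKER_USABLE() treats a probing worker.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the load balancer states discussed above;
 * the real definitions are in the tomcat-connectors source. */
typedef enum { ST_OK, ST_RECOVER, ST_PROBE, ST_ERROR } worker_state;

typedef struct {
    const char  *name;
    worker_state state;
} worker;

/* Assumed rule: a worker that is currently being probed is not offered
 * to any other request. */
static int worker_usable(const worker *w)
{
    return w->state == ST_OK || w->state == ST_RECOVER;
}

/* Return the first usable worker, or NULL if there is none. A recovering
 * worker is moved to PROBE before the request is forwarded to it. */
static worker *pick_worker(worker *ws, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (worker_usable(&ws[i])) {
            if (ws[i].state == ST_RECOVER)
                ws[i].state = ST_PROBE;
            return &ws[i];
        }
    }
    return NULL;
}

/* Runs only after the forwarded request has finished. */
static void finish_request(worker *w, int ok)
{
    assert(w != NULL);
    w->state = ok ? ST_OK : ST_ERROR;
}

/* Replays the interleaving: request A picks a worker, request B picks
 * while A is still in flight, then A finishes. Returns 1 when B was left
 * without a worker even though tomcat1 ends up in the OK state. */
static int replay_race(void)
{
    worker ws[] = { { "tomcat1", ST_RECOVER }, { "tomcat2", ST_ERROR } };
    worker *a = pick_worker(ws, 2);   /* tomcat1: RECOVER -> PROBE */
    worker *b = pick_worker(ws, 2);   /* A has not finished yet */
    finish_request(a, 1);             /* tomcat1: PROBE -> OK */
    return a == &ws[0] && b == NULL && ws[0].state == ST_OK;
}
```

In this model replay_race() returns 1: request B finds no usable worker
during the window in which tomcat1 sits in the probe state, which is the
behaviour I suspect in the log.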

Still working on a reproducible setup to debug this in, but does this
sound like a possible problem, or am I misreading what the
end->service() call does here:
                service_stat = end->service(end, s, l, &is_service_error);

Thanks,
Andy

On 01/11/2011 12:08 PM, Andy Wang wrote:
> I'm still digging, but I thought I'd send this along to the mailing list
> while I try to decipher the mod_jk code.
>
> We're using tomcat-connector 1.2.31 on apache 2.2.17 on a Linux system.
>
> Log file is at this URL:
> http://www.moonteeth.com/~dopey/tomcat/mod_jk.log
>
> The worker configuration consists of a load balanced worker with 9
> workers (tomcat1-9).  At any given time, usually only one tomcat is in
> use.  In the case of the log, all the workers are in recovery state
> (apache started before tomcat, and someone hit the page so the tomcats
> are all down).
>
> In the log file the requests
> 5356:1146444096
> /Windchill/servlet/WindchillAuthGW/wt.httpgw.HTTPAuthentication/login
> and
> 5357:1138932032 /Windchill/servlet/WindchillGW/wt.httpgw.HTTPServer/ping
> come in almost simultaneously.   5357:1138932032 completes first and
> picks up tomcat1, recovers it and uses it.
>
> However 5356:1146444096 never tries worker 1.  5357:1138932032 grabbed
> worker 1 before 5356:1146444096 did, and by the time 5356:1146444096
> gets the worker list, it never bothers to try tomcat1, just tries all
> of the other workers.
>
> As I said, I'm still digging at the mod_jk code to try to find out
> what's going on, but hoping that maybe someone else has also seen this
> problem.
>
> I can't reproduce this on my system unfortunately, and it's quite
> intermittent.
>
> Andy
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
>
>

