tomcat-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hans Schmid" <Hans.Sch...@einsurance.de>
Subject AW: jk 1.2.4 LB bug?
Date Wed, 16 Jul 2003 13:15:04 GMT
Thanks for your reply,
comments see inline

> -----Ursprungliche Nachricht-----
> Von: Glenn Nielsen [mailto:glenn@mail.more.net]
> Gesendet: Mittwoch, 16. Juli 2003 12:26
> An: Tomcat Developers List
> Betreff: Re: jk 1.2.4 LB bug?
>
>
> mod_jk will print out information about the lb config if you set
> the JkLogLevel to debug.
>

done

> I would suggest setting up a test system where you can test
> the below with JkLogLevel debug configured.  Then grep the
> log for lines which have jk_lb_worker.c in them.
>

OK

> This will tell you several things.
>
> 1. Whether the worker.properties are getting reread when you
>    do an apache restart.  (They should be)
>

Yes they were reread:
Initial:
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (118)]: Into wc_close
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (199)]: close_workers got 6 workers
to destroy
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (206)]: close_workers will destroy
worker lb-einsurance
[Wed Jul 16 14:11:14 2003]  [jk_lb_worker.c (561)]: Into
jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1461)]: Into
jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1468)]: Into
jk_worker_t::destroy up to 1 endpoint to close
[Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (605)]: In
jk_endpoint_t::ajp_close_endpoint
[Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (612)]: In
jk_endpoint_t::ajp_close_endpoint, closed sd = 12
[Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1461)]: Into
jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (118)]: Into wc_close
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (118)]: Into wc_close
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (118)]: Into wc_close
[Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1468)]: Into
jk_worker_t::destroy up to 1 endpoint to close
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (199)]: close_workers got 6 workers
to destroy
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (199)]: close_workers got 6 workers
to destroy
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (199)]: close_workers got 6 workers
to destroy
[Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1461)]: Into
jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (206)]: close_workers will destroy
worker lb-einsurance
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (206)]: close_workers will destroy
worker lb-einsurance
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (206)]: close_workers will destroy
worker lb-einsurance
[Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1468)]: Into
jk_worker_t::destroy up to 1 endpoint to close
[Wed Jul 16 14:11:14 2003]  [jk_lb_worker.c (561)]: Into
jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003]  [jk_lb_worker.c (561)]: Into
jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003]  [jk_lb_worker.c (561)]: Into
jk_worker_t::destroy

... closing other not related worker

[Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (172)]: Into
jk_uri_worker_map_t::uri_worker_map_alloc
[Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (375)]: Into
jk_uri_worker_map_t::uri_worker_map_open
[Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (396)]:
jk_uri_worker_map_t::uri_worker_map_open, rule map size is 12
[Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (321)]: Into
jk_uri_worker_map_t::uri_worker_map_open, match rule
/einsurance/=lb-einsurance was added
[Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (345)]: Into
jk_uri_worker_map_t::uri_worker_map_open, exact rule
/einsurance=lb-einsurance was added

... 5 other workers (including other lb-workers and normal workers)

added
[Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (408)]: Into
jk_uri_worker_map_t::uri_worker_map_open, there are 12 rules
[Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (422)]:
jk_uri_worker_map_t::uri_worker_map_open, done
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (88)]: Into wc_open
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (222)]: Into build_worker_map,
creating 6 workers
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (228)]: build_worker_map, creating
worker lb-einsurance
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (148)]: Into wc_create_worker
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (162)]: wc_create_worker, about to
create instance lb-einsurance of lb
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (586)]: Into lb_worker_factory
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (171)]: wc_create_worker, about to
validate and init lb-einsurance
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (420)]: Into
jk_worker_t::validate
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (148)]: Into wc_create_worker
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (162)]: wc_create_worker, about to
create instance ajp13-01 of ajp13
[Wed Jul 16 14:11:16 2003]  [jk_ajp13_worker.c (108)]: Into
ajp13_worker_factory
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (171)]: wc_create_worker, about to
validate and init ajp13-01
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1343)]: Into
jk_worker_t::validate
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1364)]: In
jk_worker_t::validate for worker ajp13-01 contact is tomcathost-ei:11009
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1397)]: Into jk_worker_t::init
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1421)]: In jk_worker_t::init,
setting socket timeout to 0
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (187)]: wc_create_worker, done
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (148)]: Into wc_create_worker
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (162)]: wc_create_worker, about to
create instance ajp13-02 of ajp13
[Wed Jul 16 14:11:16 2003]  [jk_ajp13_worker.c (108)]: Into
ajp13_worker_factory
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (171)]: wc_create_worker, about to
validate and init ajp13-02
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1343)]: Into
jk_worker_t::validate
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1364)]: In
jk_worker_t::validate for worker ajp13-02 contact is tomcathost-ei:11019
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1397)]: Into jk_worker_t::init
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1421)]: In jk_worker_t::init,
setting socket timeout to 0
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (187)]: wc_create_worker, done
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (148)]: Into wc_create_worker
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (162)]: wc_create_worker, about to
create instance ajp13-sb of ajp13
[Wed Jul 16 14:11:16 2003]  [jk_ajp13_worker.c (108)]: Into
ajp13_worker_factory
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (171)]: wc_create_worker, about to
validate and init ajp13-sb
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1343)]: Into
jk_worker_t::validate
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1364)]: In
jk_worker_t::validate for worker ajp13-sb contact is tomcathost-ei-sb:11015
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1397)]: Into jk_worker_t::init
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1421)]: In jk_worker_t::init,
setting socket timeout to 0
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (187)]: wc_create_worker, done
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 0 has
name ajp13-01
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 1 has
name ajp13-sb
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 2 has
name ajp13-02
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (502)]: in_local_worker_mode:
true
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (505)]: local_worker_only: false
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (187)]: wc_create_worker, done
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (238)]: build_worker_map, removing
old lb-einsurance worker

this last line looks suspicous to me

> 2. What the lb worker thinks the config is.
>

initial:
[Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (586)]: Into lb_worker_factory
[Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (420)]: Into
jk_worker_t::validate
[Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (498)]: Balanced worker 0 has
name ajp13-01
[Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (498)]: Balanced worker 1 has
name ajp13-sb
[Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (498)]: Balanced worker 2 has
name ajp13-02
[Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (502)]: in_local_worker_mode:
true
[Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (505)]: local_worker_only: false

but after the switching and graceful restart exactly the same (which is the
error) !!!!!!!!

[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (420)]: Into
jk_worker_t::validate
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 0 has
name ajp13-01
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 1 has
name ajp13-sb
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 2 has
name ajp13-02
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (502)]: in_local_worker_mode:
true
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (505)]: local_worker_only: false

This explains the observed (wrong) fall-over behavior, (should be ajp13-02,
ajp13-sb, ajp13-01)


original workers.properties:
worker.ajp13-01.lbfactor=1
worker.ajp13-01.local_worker=1

worker.ajp13-02.lbfactor=1
worker.ajp13-02.local_worker=0

worker.ajp13-sb.lbfactor=0
worker.ajp13-sb.local_worker=1

local_worker_only=0 for the lb-worker

changed to before graceful restart: (linking a different worker.properties)

worker.ajp13-01.lbfactor=1
worker.ajp13-01.local_worker=0

worker.ajp13-02.lbfactor=1
worker.ajp13-02.local_worker=1

worker.ajp13-sb.lbfactor=0
worker.ajp13-sb.local_worker=1

local_worker_only=0 the lb-worker



So it *seems* there might be something wrong with the reinitialisation of
the worker order ?


If you need further information, I can mail you the complete logs offline.

Thanks for looking into this,
Hans



> Then post those log lines here.
>
> Thanks,
>
> Glenn
>
> Hans Schmid wrote:
> > Hi,
> >
> > I noticed the following with mod_jk 1.2.4, Apache 1.3.26 and
> > Tomcat 3.3.1a on Solaris 8 JDK 1.4.1_03.
> >
> > Maybe a LB bug (Loadfactors do not recover after startup of new
> > tomcat/graceful Apache restart).
> >
> > Let me explain my scenario first:
> >
> > I want to gracefully upgrade our webapp without loosing
> sessions + have a
> > fail over scenario.
> > Therefor we have sticky sessions enabled.
> >
> > Setup:
> > 1 tomcat 01 running on Server A,
> > 0 tomcat 02 running on Server A,
> > 1 tomcat SB running on Server B
> >
> > 01 tomcat on Server A runs the application, SB tomcat on server B is
> > standby(fallback),
> > 02 tomcat is shutdown on Server A at the moment.
> >
> > All three Tomcats are in the same lb_worker:
> >
> >
> > worker.list=lb-worker
> >
> > worker.ajp13-01.port=11009
> > worker.ajp13-01.host=A
> > worker.ajp13-01.type=ajp13
> > worker.ajp13-01.lbfactor=1
> > worker.ajp13-01.local_worker=1
> >
> > worker.ajp13-02.port=11019
> > worker.ajp13-02.host=A
> > worker.ajp13-02.type=ajp13
> > worker.ajp13-02.lbfactor=1
> > worker.ajp13-02.local_worker=0
> >
> > worker.ajp13-sb.port=11015
> > worker.ajp13-sb.host=B
> > worker.ajp13-sb.type=ajp13
> > worker.ajp13-sb.lbfactor=0
> > worker.ajp13-sb.local_worker=1
> >
> > worker.lb-worker.type=lb
> > worker.lb-worker.balanced_workers=ajp13-01, ajp13-02, ajp13-sb
> > worker.lb-worker.local_worker_only=0
> >
> >
> > The worker List order should now be:
> >  1. worker.ajp13-01 lbfactor=1,local_worker=1  TC 01
> >  2. worker.ajp13-sb lbfactor=0,local_worker=1  TC SB
> >  3. worker.ajp13-02 lbfactor=1,local_worker=0) TC 02  (not running)
> >
> > Now all requests go to worker.ajp13-01 (TC 01), none to
> worker.ajp13-sb (TC
> > SB lbfactor=0),
> > none to worker.ajp13-02.port (TC 02 not running).
> >
> > If Server A crashes (TC 01) all new requests go to Server B (TC SB
> > worker.ajp13-sb)
> > since this is then the only running Tomcat FINE
> > This is our Fail-Over Solution (lost running sessions, but OK).
> >
> > Now the webapp update Scenario:
> >
> > 1.) shutdown TC SB on Server B, update the webapp, start tc SB
> and test via
> > a seperate HTTPConnector port without Apache.
> > 2.) this does not affect anything on production, since the
> lbfactor=0 on TC
> > SB
> > -> no sessions arrive on tc SB
> > 3.) When the test was successful, our Standby Tomcat SB is updated
> > 4.) Now upgrade the webapp on Server A TC 02, which is currently not
> > running.
> > 5.) Start up TC 02 on Server A with the new version of the webapp,
> > immediately exchange the worker.properties with a different version and
> > gracefully restart apache:
> >
> > worker.list=lb-worker
> >
> > worker.ajp13-01.port=11009
> > worker.ajp13-01.host=A
> > worker.ajp13-01.type=ajp13
> > worker.ajp13-01.lbfactor=1
> > worker.ajp13-01.local_worker=0     <---- put old webapp on TC 01 to the
> > foreign worker list
> >
> > worker.ajp13-02.port=11019
> > worker.ajp13-02.host=A
> > worker.ajp13-02.type=ajp13
> > worker.ajp13-02.lbfactor=1
> > worker.ajp13-02.local_worker=1     <---- put new webapp on TC
> 02 in front of
> > the local worker list
> >
> > worker.ajp13-sb.port=11015
> > worker.ajp13-sb.host=B
> > worker.ajp13-sb.type=ajp13
> > worker.ajp13-sb.lbfactor=0
> > worker.ajp13-sb.local_worker=1
> >
> > worker.lb-worker.type=lb
> > worker.lb-worker.balanced_workers=ajp13-01, ajp13-02, ajp13-sb
> > worker.lb-worker.local_worker_only=0
> >
> > Just the two lines marked above with <---- swap
> > (local_worker values of TC 01 and TC 02)
> >
> > 6.) now all 3 Tomcats are running. All existing sessions still
> go to TC 01
> > (sticky sessions; we do not loose running sessions)
> > 7.) What I expect:
> > TC 02 takes a while to startup.
> > The worker List order should now be:
> >  1. worker.ajp13-02 lbfactor=1,local_worker=1  TC 02
> >  2. worker.ajp13-sb lbfactor=0,local_worker=1  TC SB
> >  3. worker.ajp13-01 lbfactor=1,local_worker=0) TC 01  (old webapp)
> >
> > Since TC 02 needs 3 minutes to start up (filling caches etc.) it is not
> > immediately availlable.
> > During this time new sessions arrive at TC SB, since it is the
> next in the
> > worker list. OK fine this works.
> > Since these sessions are sticky as well, all users connecting
> during this
> > time stay on TC SB
> > during their whole session life. FINE
> >
> > 8.) As soon as TC 02 is up and running (finished all
> load-on-startup servlet
> > initialisition stuff)
> > I would expect that TC 02 gets all new Sessions (Number 1 in the worker
> > List).
> >
> > This is not the case! All new Sessions still arrive at TC SB.
> >
> > 9.) After a while (one hour) we shutdown TC 01. Since no new sessions
> > arrived there since our
> > graceful restart of Apache, all old Sessions should have expired.
> >
> > 10.) even now (only 2 Tomcats running TC 02  and TC SB) and even after a
> > graceful restart new Sessions
> > arrive at TC SB
> >
> >
> > Conclusion:
> > Now, do I misunderstand the supposed behaviour of lbfactor and
> local_worker
> > flag ?
> > I think that the behaviour in 8.) is wrong. 10.) is starange too.
> >
> > Thanks for any suggestion if I am completely wrong here
> > or further looking into this.
> >
> > Hans
> >
> >
> >
> >>-----Ursprungliche Nachricht-----
> >>Von: Glenn Nielsen [mailto:glenn@mail.more.net]
> >>Gesendet: Mittwoch, 9. Juli 2003 15:56
> >>An: Tomcat Developers List
> >>Betreff: Re: jk 1.2.25 release ?
> >>
> >>
> >>I was hoping to get it released this week.
> >>
> >>But I just noticed that under Apache 2 mod_jk piped logs there
> >>are two instances of the piped log program running for the same
> >>log file.  I want to track this down.
> >>
> >>I also just implemented load balancing this morning on a production
> >>server.  I noticed that when none of the workers for the load balancer
> >>were available an HTTP status code of 200 was being logged in mod_jk.log
> >>when request logging was enabled. So I want to look into this also.
> >>
> >>Hopefully now that I have load balancing in place with 2 tomcat servers
> >>instead of 1 the Missouri Lottery web site I administer will scale to
> >>handle the big spike in load tonight for the $240 PowerBall jackpot. :-)
> >>
> >>Regards,
> >>
> >>Glenn
> >>
> >>Henri Gomez wrote:
> >>
> >>>Any date ?
> >>>
> >>>
> >>>---------------------------------------------------------------------
> >>>To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
> >>>For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org
> >>>
> >>
> >>
> >>
> >>---------------------------------------------------------------------
> >>To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
> >>For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org
> >>
> >>
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org
> >
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org


Mime
View raw message