tomcat-dev mailing list archives

From jean-frederic clere <>
Subject Re: Feature request /Discussion: JK loadbalancer improvements for high load
Date Thu, 05 Jul 2007 08:20:14 GMT
Henri Gomez wrote:
> Something we should also check is the CPU load of the Tomcat instance.
> Maybe it would also be useful to let users/admins add their own
> counters to the load estimation.

If you want to add this to Tomcat, remember that such a feature needs a JNI 
module to collect information from the OS/hardware, and that is OS-dependent code.
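That said, since Java 6 a coarse CPU load figure is already exposed through the platform MXBeans, so a webapp could report it without writing its own JNI module. A minimal sketch (the class name is illustrative; getSystemLoadAverage() requires Java 6 and returns a negative value on platforms that do not support it):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class CpuLoadProbe {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // getSystemLoadAverage() (Java 6+) returns the recent system load
        // average, or a negative value if the platform does not support it.
        double load = os.getSystemLoadAverage();
        int cpus = os.getAvailableProcessors();
        System.out.println("cpus=" + cpus + " loadAvg=" + load);
    }
}
```

Anything finer-grained (per-process CPU time, for instance) would still need the OS-dependent JNI module described above.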



> For example, some admins may consider that load balancing should be based
> on HTTP requests or SQL accesses. If they already have these counters in
> their web applications, it would be useful to be able to fetch them from
> Tomcat and send them back to the jk balancer.
> It shouldn't be too hard, and it would be very welcome at many Tomcat sites.
> 2007/7/4, Rainer Jung <>:
>> Hi,
>> implementing a management communication channel between the lb and the
>> backend is on the roadmap for jk3. It is somewhat unlikely that this will
>> help in your situation, because while doing a GC the JVM will no longer
>> respond on the management channel. A traditional Mark-Sweep-Compact GC is
>> not distinguishable from the outside from a halt of the backend. Of
>> course we could think of a webapp trying to use the JMX info on memory
>> consumption to estimate GC activity in advance, but I doubt that this
>> would be a stable solution. There are notifications when GCs happen, but
>> at the moment I'm not sure whether such events exist before a GC, or only
>> after one.
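As a rough illustration of the "webapp reading JMX info" idea: the platform GC MXBeans expose cumulative collection counts and accumulated pause times, which a monitoring component could sample periodically and turn into a GC-activity estimate. A sketch under that assumption (class name is illustrative):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcActivityProbe {
    public static void main(String[] args) {
        // Cumulative collection counts and accumulated pause time per
        // collector; sampling the deltas at intervals gives a rough
        // GC-activity signal that could be fed back to a balancer.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
    }
}
```

Note that these counters only move after a collection has happened, which is exactly the limitation discussed above: they cannot predict an upcoming full GC.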
>> I think a first step (and a better solution) would be to use a modern GC
>> algorithm like Concurrent Mark Sweep, which will most of the time reduce
>> the GC stop times to some tens or hundreds of milliseconds (depending on
>> heap size). CMS comes at a cost, a little more memory and a little more
>> CPU, but the dramatically decreased stop times are worth it. It has also
>> been quite robust for the last one or two years.
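For reference, enabling CMS on the Sun JVMs of that era is a matter of startup flags along these lines; the occupancy fraction below is purely illustrative and needs tuning per heap:

```shell
# Illustrative CMS settings for a Sun HotSpot 5.0/6 JVM, e.g. in
# CATALINA_OPTS; the 70% initiating occupancy is an example value.
CATALINA_OPTS="$CATALINA_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"
```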
>> Other components dislike long GC pauses as well, for instance cluster
>> replication. There you configure the longest pause you will accept for
>> missing heartbeat packets before assuming a node is dead. A node being
>> declared dead because of a GC pause, and then suddenly resuming work
>> without having noticed itself that its outer world has changed, is a very
>> bad situation too.
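For illustration, in a Tomcat 5.5 cluster this timeout is the drop interval of the multicast membership service in server.xml; the values below are illustrative, and the drop time should simply exceed the worst GC pause you expect:

```xml
<!-- Illustrative Tomcat 5.5 membership snippet: a node missing multicast
     heartbeats for mcastDropTime ms is declared dead, so this value must
     be longer than the worst expected GC pause. -->
<Membership className="org.apache.catalina.cluster.mcast.McastService"
            mcastAddr="228.0.0.4"
            mcastPort="45564"
            mcastFrequency="500"
            mcastDropTime="30000"/>
```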
>> What we plan as a first step for jk3 is putting mod_jk on top of the
>> Apache APR libraries. Then we can relatively easily use our own
>> management threads to monitor the backend status and influence the
>> balancing decisions. As long as we do everything on top of the request
>> handling threads, we can't do complex things in a stable way.
>> Getting jk3 out of the door will take somewhat longer (maybe 6 to 12
>> months for a release). People willing to help are welcome.
>> Concerning the SLAs: it always makes sense to put a percentage limit on
>> the maximum response times and error rates. A "100% below some limit"
>> clause will always be too expensive. But of course, if you can't reduce
>> GC times and the GC runs too often, there will be no acceptable
>> percentage for long-running requests.
>> Thank you for sharing your experiences at Langen with us!
>> Regards,
>> Rainer
>> Yefym Dmukh wrote:
>> > Hi all,
>> > sorry for the stress, but it seems it is time to come back to the
>> > discussion about load balancing for JVMs (Tomcat).
>> >
>> > Prehistory:
>> > Recently we ran benchmark and smoke tests of our product at the Sun
>> > high tech centre in Langen (Germany).
>> >
>> > Apache 2.2.4 was used as the web server, 10x Tomcat 5.5.25 as the
>> > containers, and JK connector 1.2.23 with the busyness algorithm as the
>> > load balancer.
>> >
>> >         Under high load, strange behaviour was observed: some Tomcat
>> > workers temporarily got a non-proportional load, often 10 times higher
>> > than the others, for relatively long periods. As a result, response
>> > times that usually stay under 500 ms went up to 20+ seconds, which in
>> > turn made the overall test results almost two times worse than
>> > estimated.
>> >
>> >                 At the beginning we were quite confused, because we
>> > were sure it was not a problem of JVM configuration, and we supposed
>> > the reason lay in the LB logic of mod_jk; both suppositions were right.
>> >
>> > Actually the following was happening: the LB sends a request, the
>> > session becomes sticky, and the upcoming requests are continuously sent
>> > to the same cluster node. At a certain point the JVM started a major
>> > garbage collection (full GC) and spent the above-mentioned 20 seconds
>> > on it. At the same time jk continued to send new requests, including
>> > the sticky-to-node requests, which led to a situation where one node
>> > broke the SLA on response times.
>> >
>> > I've been searching the web for a while to find a load balancer
>> > implementation that takes GC activity into account and reduces the load
>> > accordingly when the JVM is close to a major collection, but found
>> > nothing.
>> >
>> > Once again, load balancing of JVMs under load is a real issue in
>> > production, and with optimally distributed load you are able not only
>> > to lower costs, but also to prevent a bad customer experience, not to
>> > mention broken SLAs.
>> >
>> > Feature request:
>> >
>> >         All lb algorithms should be extended with a bidirectional
>> > connection to the JVM:
>> >              JVM -> LB: old gen size and the current occupancy
>> >          LB -> JVM: prevent node overload and advise a GC, depending on
>> > a parameterized free old gen space in %.
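The "old gen size and current occupancy" figure asked for here can already be read inside the JVM via the memory pool MXBeans; only the transport to the balancer is missing. A hedged sketch (class and method names are invented for illustration; pool names vary by collector, so the match is deliberately loose):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class OldGenProbe {

    // Free space in the tenured/old generation as a percentage of its
    // maximum, or -1 if no matching pool (or no defined maximum) is found.
    // Pool names vary by collector ("Tenured Gen", "CMS Old Gen", ...),
    // so the match is deliberately loose.
    static double freeOldGenPercent() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName().toLowerCase();
            if (pool.getType() == MemoryType.HEAP
                    && (name.contains("old") || name.contains("tenured"))) {
                MemoryUsage u = pool.getUsage();
                long max = u.getMax();          // -1 when undefined
                if (max > 0) {
                    return 100.0 * (max - u.getUsed()) / max;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("free old gen %: " + freeOldGenPercent());
    }
}
```

A balancer-side component could then compare this percentage against the parameterized threshold and shift load away from (or advise a GC on) the node.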
>> >
>> >
>> > All the ideas and comments are appreciated.
>> >
>> > Regards,
>> > Yefym.
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
