tomcat-dev mailing list archives

From "Henri Gomez" <henri.go...@gmail.com>
Subject Re: Feature request /Discussion: JK loadbalancer improvements for high load
Date Thu, 05 Jul 2007 07:21:24 GMT
Something we should also check is the CPU load of the Tomcat instance.
Maybe it would also be useful to let users/admins add their own
counters to the load estimation.

For example, if some admins consider that load balancing should be based
on HTTP requests or SQL accesses, and they already have these counters
in their webapps, it would be useful to be able to read them from Tomcat
and send them back to the jk balancer.

It shouldn't be too hard, and it would be very welcome for many Tomcat sites.
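A minimal sketch of what such a user-supplied counter could look like. Everything here is hypothetical: the `WebappLoad` class, its `Counters` interface, and the `webapp:type=LoadCounters` name are not an existing Tomcat or jk API. The webapp would bump the counters from a servlet filter or DAO layer, and a management channel on the balancer side would poll the MBean:

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;

import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class WebappLoad implements WebappLoad.Counters {

    // Hypothetical management interface; neither Tomcat nor mod_jk defines this.
    public interface Counters {
        long getHttpRequestCount();
        long getSqlAccessCount();
    }

    private final AtomicLong httpRequests = new AtomicLong();
    private final AtomicLong sqlAccesses = new AtomicLong();

    // The webapp calls these, e.g. from a servlet filter or a DAO layer.
    public void countHttpRequest() { httpRequests.incrementAndGet(); }
    public void countSqlAccess()   { sqlAccesses.incrementAndGet(); }

    public long getHttpRequestCount() { return httpRequests.get(); }
    public long getSqlAccessCount()   { return sqlAccesses.get(); }

    // Expose the counters via JMX so a management channel could poll them.
    public static WebappLoad register() {
        try {
            WebappLoad load = new WebappLoad();
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(new StandardMBean(load, Counters.class),
                                 new ObjectName("webapp:type=LoadCounters"));
            return load;
        } catch (JMException e) {
            throw new IllegalStateException("JMX registration failed", e);
        }
    }
}
```

The jk side would then only have to fold the polled values into its existing per-worker load estimate.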

2007/7/4, Rainer Jung <rainer.jung@kippdata.de>:
> Hi,
>
> implementing a management communication between the lb and the backend
> is on the roadmap for jk3. It is somewhat unlikely that this will help
> in your situation, because while doing a GC the jvm will no longer
> respond on the management channel. From the outside, a traditional Mark
> Sweep Compact GC is indistinguishable from a halt in the backend. Of
> course we could think of a webapp trying to use the JMX info on memory
> consumption to estimate GC activity in advance, but I doubt that this
> would be a stable solution. There are notifications when GCs happen, but
> at the moment I'm not sure whether such events exist before a GC, or
> only after it.
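On the "before or only after" question: since Java 5, java.lang.management can at least raise a usage-threshold notification when a heap pool crosses a configured occupancy. That arrives as the heap fills, i.e. ahead of the next full collection, rather than as a true "GC about to start" event. A minimal sketch, where the 80% figure and the println are purely illustrative:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

import javax.management.NotificationEmitter;

public class GcEarlyWarning {

    // Find a heap pool that supports usage thresholds (typically the
    // tenured/old generation; pool names differ per collector).
    public static MemoryPoolMXBean findThresholdPool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP
                    && pool.isUsageThresholdSupported()) {
                return pool;
            }
        }
        return null;
    }

    // Ask the JVM to notify us once the pool crosses `fraction` of its max.
    public static void installWarning(double fraction) {
        MemoryPoolMXBean pool = findThresholdPool();
        if (pool == null) return;              // no suitable pool on this JVM
        long max = pool.getUsage().getMax();
        if (max <= 0) return;                  // max undefined for this pool
        pool.setUsageThreshold((long) (max * fraction));

        // The platform MemoryMXBean emits the threshold notifications.
        NotificationEmitter emitter =
                (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener((notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED
                    .equals(notification.getType())) {
                // Here a webapp could flag itself "busy" towards the balancer.
                System.out.println("Heap pool " + pool.getName()
                        + " is over " + (int) (fraction * 100) + "% full");
            }
        }, null, null);
    }
}
```

Whether this is stable enough to steer balancing decisions is exactly the open question above; the API only gives an occupancy signal, not a GC schedule.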
>
> I think a first step (and a better solution) would be to use modern GC
> algorithms like Concurrent Mark Sweep, which will most of the time
> reduce the GC pause times to some tens or hundreds of milliseconds
> (depending on heap size). CMS comes at a cost, a little more memory and
> a little more CPU, but the dramatically decreased pause times are
> worth it. It has also been quite robust for about 1-2 years now.
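For reference, switching to CMS is only a change to the JVM options Tomcat starts with, e.g. via CATALINA_OPTS. The flags below are the standard HotSpot ones; the heap size and occupancy fraction are illustrative values that need tuning per installation:

```
CATALINA_OPTS="-Xms1024m -Xmx1024m \
  -XX:+UseConcMarkSweepGC \
  -XX:+UseParNewGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly"
```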
>
> Other components do not like long GC pauses either, for instance
> cluster replication. There you configure the longest pause you accept
> for missing heartbeat packets before assuming a node is dead. Assuming a
> node is dead because of GC pauses, and then having the node suddenly work
> again without itself having noticed that its outside world has changed,
> is a very bad situation too.
>
> What we plan as a first step for jk3 is to base mod_jk on the Apache
> APR libraries. Then we can relatively easily use our own management
> threads to monitor the backend status and influence the balancing
> decisions. As long as we do everything on top of the request-handling
> threads, we can't do complex things in a stable way.
>
> Getting jk3 out of the door will take somewhat longer (maybe 6 to 12
> months for a release). People willing to help are welcome.
>
> Concerning the SLAs: it always makes sense to put a percentage limit on
> the maximum response times and error rates. A clause requiring 100% of
> requests below some limit will always be too expensive. But of course,
> if you can't reduce GC times and the GC runs too often, there will be
> no acceptable percentage for long-running requests.
>
> Thank you for sharing your experiences at Langen with us!
>
> Regards,
>
> Rainer
>
> Yefym Dmukh wrote:
> > Hi all,
> > sorry for the stress, but it seems it is time to come back to the
> > discussion on load balancing for the JVM (Tomcat).
> >
> > Prehistory:
> > Recently we ran benchmark and smoke tests of our product at the Sun
> > High Tech Centre in Langen (Germany).
> >
> > Apache 2.2.4 was used as the web server, 10x Tomcat 5.5.25 as the
> > container, and JK connector 1.2.23 with the busyness algorithm as the
> > load balancer.
> >
> > Under high load a strange behaviour was observed: some tomcat workers
> > temporarily got a non-proportional load, often 10 times higher than
> > the others, for relatively long periods. As a result, response times
> > that usually stay under 500 ms went up to 20+ seconds, which in turn
> > made the overall test results almost two times worse than estimated.
> >
> > At the beginning we were quite confused, because we were sure that it
> > was not a problem of JVM configuration and supposed that the reason
> > was in the LB logic of mod_jk; both assumptions turned out to be right.
> >
> > Actually the following was happening: the LB sends requests and the
> > session becomes sticky, so it keeps sending subsequent requests to the
> > same cluster node. At a certain point the JVM started a major garbage
> > collection (full gc) and spent the above-mentioned 20 seconds in it.
> > At the same time jk continued to send new requests, including the
> > sticky-to-node ones, which led to the situation where one node broke
> > the SLA on response times.
> >
> > I've been searching the web for a while for a LoadBalancer
> > implementation that takes GC activity into account and reduces the
> > load accordingly when the JVM is close to a major collection, but
> > found nothing.
> >
> > Once again, load balancing of JVMs under load is a real issue in
> > production; with an optimally distributed load you are not only able
> > to lower costs, but also to prevent a bad customer experience, not to
> > mention broken SLAs.
> >
> > Feature request:
> >
> >         All lb algorithms should be extended with a bidirectional
> > connection to the jvm:
> >              Jvm -> Lb: old gen size and current occupancy
> >              Lb -> Jvm: prevent node overload and advise a gc
> > depending on a parameterized free old gen space in %.
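The Jvm -> Lb half of this is already measurable with stock java.lang.management; a node-side agent (hypothetical, nothing in Tomcat or jk ships this) could compute the figure to report like so:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class OldGenReport {

    // Percentage of the old generation currently occupied, or -1 if no
    // old-gen pool with a defined maximum is found on this JVM.
    public static int oldGenOccupancyPercent() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Pool names differ per collector: "Tenured Gen", "CMS Old Gen",
            // "PS Old Gen", "G1 Old Gen", ...
            if (name.contains("Old Gen") || name.contains("Tenured")) {
                MemoryUsage usage = pool.getUsage();
                if (usage.getMax() > 0) {
                    return (int) (usage.getUsed() * 100 / usage.getMax());
                }
            }
        }
        return -1;
    }
}
```

How the value travels to the balancer (AJP extension, separate management channel, ...) is exactly what the jk3 discussion above is about.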
> >
> >
> > All the ideas and comments are appreciated.
> >
> > Regards,
> > Yefym.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: dev-help@tomcat.apache.org
>
>
