Message-ID: <3CD2408F.3080903@schlund.de>
Date: Fri, 03 May 2002 09:47:27 +0200
From: Bernd Koecke
Organization: Schlund+Partner
To: Tomcat Developers List
Subject: Re: PROPOSAL: mod_jk2: Group/Instance

Hi Costin,

Maybe I checked out the wrong repository. I checked out
jakarta-tomcat-connectors with
CVSROOT=:pserver:anoncvs@cvs.apache.org:/home/cvspublic

Now to the details, see below.

costinm@covalent.net wrote:
> On Thu, 2 May 2002, Bernd Koecke wrote:
>
>
>>misunderstood it. After you said that my patch is included, I had a closer look
>>at mod_jk. I can't see anything of my code, but I found the special meaning of
>>the zero lb_factor/lb_value.
>>It seems that I didn't understand it right the
>>first time. This could solve my problem, but after a closer look and some testing
>>I found another problem. When you set the lb_value in workers.properties to 1
>>for the local tomcat and 0 for the others, you get the desired behavior. But if
>>you switch off the local tomcat for a short time, you run into trouble. The
>>problem is the 0 for the other workers. The calculation in lb_worker transforms
>>the 0 into _inf_, because 1/0 for a double is _inf_. This is greater than any
>
>
> I think there is a piece that checks for 0 and sets it to DEFAULT_VALUE
> (==1 ) before doing 1/lb.

No, I think not :). I checked it yesterday. With some additional log
statements in the validate function of jk_lb_worker.c you get the value
_inf_ for the lb_factor and lb_value (lines 434-444). If it were set to 1,
my config wouldn't have worked, because I set the local worker to 1 and the
others to 0.

>
> While looking at the code - I'm not very sure this whole float is needed,
> I'll try to find a way to simplify it and use ints ( maybe 0..100 with
> some 'special' values for NEVER and ALWAYS, or some additional flags ).
>

This is possible, but then you must add a check for a value of 0, because
without it you compute 1/0 with an int, and that gives you an error.

> But the way it works ( or at least how I understand it ) is that if the
> main worker fails, then we look at all workers in error state and try the
> one with the oldest error. And the 'main' worker will be tried again when
> the timeout expires.
>

That's not the whole story. It's right that you will check the main worker
when it's back again, but you use it only once. When the request has been
handled successfully, rec->in_recovering is true (line 332 of
jk_lb_worker.c, service function). Then get_max_lb gets the value _inf_
from one of the other workers. Then the things happen which I described in
my prior mail.
> I haven't tested this too much, I just applied the patches ( that I
> understand :-), I'll add some more debugging for this process and maybe
> we can find a better solution.
>
> But this functionality is essential for the JNI worker and very important
> in general - so I really want to find the best solution. If you have any
> patch idea, let me know.
>
> To avoid further confusion and complexity in the lb-factor/value, I
> think we should add one more flag ( 'local_worker' ? ) and use it
> explicitly. Again, patches are welcome - it's always good to have
> different ( and more ) eyes looking at the code.
>

That was exactly what I did in the patch I sent; the additional
documentation followed a few days later. But my additions to the lb_worker
were a little too complex. You are right, we should get it working if we
use the flag only on the main worker and change the behavior after a
failure of this worker. But we still need the 0/_inf_ trick for the other
workers, because only with it are the other workers not asked when there
is no session and the main worker is up. I will try to build another patch
and send it. I think it could be possible without an additional flag.

Another thought about this: when you use double and we fix the handling
after an error, the main worker would never reach _inf_, because its
lb_factor is < 1 if its lb_value wasn't 0. After the worker is chosen,
this value is added to its lb_value. But with a high lb_value, the gap
between two adjacent representable doubles becomes greater than the
lb_factor. This is only interesting in theory; I think in the real world
we will restart Apache long before this happens :).

Bernd

> ( that can go in both jk1, but I can't see a release of jk2 without this
> functionality )
>
> Costin
>
>
>
>>other lb_value and greater than the lb_value of the local tomcat. But after a
>>failure of the local tomcat he is in error_state.
>>After some time it's set to
>>recovering, and if the local tomcat is back again, the function jk(2)_get_max_lb
>>gets the highest lb_value. This is _inf_ from one of the other workers.
>>Adding a value to _inf_ is meaningless. You end up with an lb_value of
>>_inf_ for the local worker. If this worker isn't the first in the worker list,
>>it will never be chosen again, because its lb_value will never be less than
>>another lb_value: all the other workers have _inf_ as their lb_values.
>>So every request without a session will be routed to the first of the other
>>tomcats.
>>
>>The only way out is a restart of the local apache after tomcat is up and
>>running. But I don't know when tomcat has finished loading all its contexts
>>and started the connectors.
>>
>>I didn't look very deeply into jk2, but I found the same
>>get_most_suitable_worker and get_max_lb functions. The jk2_get_max_lb function
>>will always return _inf_. In your answers to some other mails you said that
>>workers could be removed. Do I understand it right that if my local tomcat goes
>>down, its worker is removed from the list, and when it comes up again it is
>>added back to the worker list with a reset lb_value (only for mod_jk2)?
>>
>>Over the next few days I will look at the docs and code of jk2 and give it a
>>try. Maybe all my problems go away with the new module :).
>>
>>Sorry if I ask stupid questions, but I want to make it work for our new cluster.
>>
>>Thanks
>>
>>Bernd
>>
>>
>>>This is essential for jk2's JNI worker, which fits this case perfectly
>>>( you don't want to send via TCP when you have a tomcat instance in the
>>>same process ).
>>>
>>>
>>>
>>>
>>>>(2) Tomcat instances in standby or "soft shutdown" mode where they serve
>>>>requests bound by established sessions, and requests without a session only
>>>>if all non-standby instances have failed.
>>>
>>>
>>>That's what the SHM scoreboard is going to do ( among other things ).
>>>You can register tomcat instances ( which will be added automatically ),
>>>or unregister - in which case no new requests ( except the old sessions )
>>>will go to the unregistered tomcat.
>>>
>>>
>>>Costin
>>>
>>>
>>>
>>>>costinm@covalent.net wrote:
>>>>
>>>>
>>>>
>>>>>On Tue, 30 Apr 2002, Bernd Koecke wrote:
>>>>>
>>>>>
>>>>>
>>>>>>some weeks ago I sent a patch for mod_jk for a routing-only lb_worker. A few
>>>>>>days later I sent the docs. Henry Gomez said that it should be committed. But
>>>>>>I think it isn't in the repository. But it's the same with me here, too much
>>>>>>work for too little time :).
>>>>>
>>>>>I think it is in mod_jk, I remember seeing the commit.
>>>>>
>>>>>And I think I committed it in jk2 as well ( after some modifications ).
>>>>>
>>>>>
>>>>>
>>>>>>I need sticky sessions but no load balancing in the module. If a request without
>>>>>>a session comes in, it should be routed to the _local_ tomcat.
>>>>>
>>>>>Well, there is another use-case with the exact same behavior - Apache2
>>>>>with tomcat in JNI mode. All requests without session should be routed to
>>>>>the _jni_ channel ( i.e. in-process, minimal overhead ).
>>>>>
>>>>>It's exactly the same - so be sure I do my best to handle this case :-)
>>>>>
>>>>>Apache2 acts like a 'natural' load-balancer/fail-over, with the parent
>>>>>process monitoring for crashes, and it starts/stops children based on
>>>>>load.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>I think this could be possible with the associated instance of a channel (item
>>>>>>7). Then I have to configure all four nodes for the same group, because all
>>>>>>nodes will serve the same webapps, and associate the channel with this group.
>>>>>>But for this I need a non-balancing group. I don't see whether the default
>>>>>>behavior of a group is balancing and whether this can be switched off.
>>>>>>Is this right, or am I missing
>>>>>>something?
>>>>>
>>>>>The default is balancing, but you can tune this using weights ( and I
>>>>>think we use your code for making one instance 'top priority').
>>>>>
>>>>>Please check the code, take a look and send additional comments/patches.
>>>>>
>>>>>It's not yet completely done, of course.
>>>>>
>>>>>
>>>>>Thanks,
>>>>>Costin
>>>>
>>>>
>>>>--
>>>>To unsubscribe, e-mail:
>>>>For additional commands, e-mail:
>>>>
>>>
>>
>

--
Dipl.-Inform. Bernd Koecke
UNIX Development
Schlund+Partner AG
Phone: +49-721-91374-0
E-Mail: bk@schlund.de