From: holenoter <holenoter@me.com>
To: "Tkatcheva, Irina N (388D)" <irina.n.tkatcheva@jpl.nasa.gov>
Cc: dev@oodt.apache.org
Subject: Re: RE: Question about xmlrpc
Date: Tue, 21 Feb 2012 23:11:39 +0000 (GMT)
Message-id: <848cf01d-2db4-c0ef-8b5d-a67ef4f20a6d@me.com>
In-reply-to: <63E24D720DCE7246ABAA722B52D23CDF015D2C517597@ALTPHYEMBEVSP10.RES.AD.JPL>
hey irina, 

try increasing the number of retries to something like 100 or 200 and see if you get the same problems; basically, with your setup it will only retry for 10 minutes... if you launch a bunch of jobs, especially with a lot of conditions, they are going to keep the workflow manager overloaded for a while... if this doesn't fix the problem, it seems like there may be a synchronization bug in wengine

-brian
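
(Concretely: the settings quoted in Irina's reply below are 20 retries at a 30-second interval, i.e. 20 x 30 s = 600 s, which is where the ten-minute figure comes from. A minimal sketch of the change being suggested here, reusing the same two properties shown in Irina's reply, might look like:

  <property name="connectionRetries" value="200"/>
  <property name="connectionRetryIntervalSecs" value="30"/>

which stretches the retry window to 200 x 30 s = 6000 s, roughly 100 minutes.)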

On Feb 21, 2012, at 02:26 PM, "Tkatcheva, Irina N (388D)" <irina.n.tkatcheva@jpl.nasa.gov> wrote:

Hi Brian,

We have

  <property name="connectionRetries" value="20"/>
  <property name="connectionRetryIntervalSecs" value="30"/>

Irina
________________________________________
From: holenoter [holenoter@me.com]
Sent: Tuesday, February 21, 2012 2:21 PM
To: dev@oodt.apache.org
Cc: Tkatcheva, Irina N (388D)
Subject: Re: Question about xmlrpc

hey irina,

how many retries do you have set for each task and how long is your interval between retries?

-brian

On Feb 21, 2012, at 09:56 AM, "Tkatcheva, Irina N (388D)" <irina.n.tkatcheva@jpl.nasa.gov> wrote:

Hi Brian and all,

I have noticed that the system does recover after the "System overload: Maximum number of concurrent requests (100) exceeded" message, but usually some jobs stay in the 'Waiting on resource (executing)' state and never proceed further. I have seen it every time after the overload messages. I usually run a test that runs a bunch of jobs overnight. If there are no overload messages, all jobs complete; if there are overload messages, usually in the morning some jobs are stuck in the 'Waiting on resource (executing)' state. So it looks to me like the system does not recover completely.

Irina

On Feb 17, 2012, at 9:17 AM, Brian Foster wrote:

Hey Chris,

ya I'm in favor of adding the property but let's make it use 100 by default if the property is not set, and I would even say let's add it to the properties file but comment it out or something... that's a really advanced flag which only needs to be changed to get rid of that logging message... CAS works fine even when that message is being thrown... I think it prints to stdout, otherwise I would have just turned off the logging for that back when I added the client retry handlers that fixed the issue... oh and this is another thing you're probably gonna want to port to trunk workflow :)

-Brian
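
(A minimal sketch of the flag proposed above; the property name here is hypothetical, since the thread never settles on one. java.lang.Integer.getInteger reads a system property and falls back to the supplied default, which gives exactly the "100 unless set" behavior:

  // Hypothetical property name, for illustration only.
  int maxConcurrentRequests = Integer.getInteger(
      "org.apache.oodt.cas.xmlrpc.max.concurrent.requests", 100);

The commented-out entry in the properties file would then just document the flag without changing the default.)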

"Mattmann, Chris A (388J)" <chris.a.mattmann@jpl.nasa.gov> wrote:

Thanks Brian, I was thinking this too, +1, which is why I cautioned against any number greater than 256 in terms of thread count in my reply email too, since the risk is either that (a) you have to increase the ulimit (which extends the boundaries from devops oriented updates to sysops on the sysadmin side); and (b) the JVM will likely start thrashing unless there is an inordinate amount of RAM, or swap space, etc.

I think the best solution here is to simply make it a configurable property and then encourage projects to use a sensible default that's not too large...

Cheers,
Chris

On Feb 16, 2012, at 12:52 AM, Brian Foster wrote:

You have to be careful with the number you set that to because you are basically telling XML-RPC that it is now allowed to create 2000 threads in the same JVM... not a good practice... I don't remember the exact number but the JVM will crash if it creates a certain number of threads because there is a limit to the number of threads one process can create, and I believe this is restricted at the operating system level... and I believe this number is less than 2000... The trunk filemgr and wengine already have built-in client retry handling support and are configurable via java properties (i.e. org.apache.oodt.cas.filemgr.system.xmlrpc.connection.retries and o.a.o.c.filemgr.system.connection.retry.interval.seconds, and there are similar ones for wengine)... The message you are seeing is the XML-RPC server logging that it is already using all 100 worker threads... you will see this message if you create 100+ jobs in the RM (e.g. Workflow Conditions and Tasks) and they all start talking to the workflow manager or file manager at the same time... the client retry handlers will catch this error and just wait and retry again... you shouldn't be losing any data... the only inconvenience I guess is that the message is cluttering the logs

-Brian

On Feb 15, 2012, at 10:42 PM, "Cheng, Cecilia S (388K)" <cecilia.s.cheng@jpl.nasa.gov> wrote:

Hi Chris,

Sure we can discuss this in dev@oodt.apache.org.

If you feel comfortable w/ the 2000 number, of course I can push the patch upstream into Apache OODT. But what kind of tests, if any, should we do before we deliver the patch? Our projects are concerned that if we arbitrarily set a number, we don't know what other problems it might cause.

Thanks,
Cecilia

On 2/15/12 10:07 PM, "Mattmann, Chris A (388J)" <chris.a.mattmann@jpl.nasa.gov> wrote:

Hi Cecilia,

This is really good news!

A couple of questions:

1. Do you think you would be willing to push your XML-RPC patches upstream into Apache OODT so others in the community could benefit? This would involve filing corresponding JIRA issue(s) [1], and then letting the dev@oodt.apache.org list know.

2. Can we move this conversation onto dev@oodt.apache.org? I think others could benefit from the answers below.

Thanks and let me know. If you'd like to discuss more, that's fine too, but I'd urge us to move this onto the public Apache OODT lists.

Cheers,
Chris

[1] http://issues.apache.org/jira/browse/OODT

On Feb 15, 2012, at 2:31 PM, Cheng, Cecilia S (388K) wrote:

Hi Chris and Paul,

Just want to fill you in on where we are w/ the xmlrpc problem that we see on ACOS and PEATE and get your advice.

As you might recall, on both projects, and in all 3 components (FM, RM, and WEngine), we will periodically see the following message in the console:

java.lang.RuntimeException: System overload: Maximum number of concurrent requests (100) exceeded

when the system is very busy. Since upgrading to the newer version of xmlrpc seems to be quite involved, we thought that we would just download the source code, change the hardcoded number of 100 to something bigger, recompile the jar file, and use that in our system.
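
(To make the failure mode concrete: the message comes from the XML-RPC server's bounded worker pool. The sketch below is illustrative, with invented names rather than the real xmlrpc-2.x internals, but it shows the mechanism: once all workers are busy and the pool has hit its hardcoded cap, the server throws the RuntimeException quoted above, and raising that cap from 100 to 2000 is the recompile being described.

  // Illustrative bounded worker pool; names are invented.
  final class WorkerPool {
      private static final int MAX_THREADS = 100; // the hardcoded cap
      private int active = 0;

      synchronized void acquire() {
          if (active >= MAX_THREADS) {
              throw new RuntimeException(
                  "System overload: Maximum number of concurrent requests ("
                      + MAX_THREADS + ") exceeded");
          }
          active++;
      }

      synchronized void release() {
          active--;
      }
  }

)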

So I set the number to 2000 and had Lan, Michael and Irina try again. All 3 of them said that it solved their problems, but now that this works, we have other concerns:

[1] Will setting this number so high (2000 vs. 100) create other problems?
[2] How can we find out what is a "good" number to use?
[3] What are some ways I can monitor these concurrent requests as they run? netstat?

Would you please share your thoughts on this?

Thanks,
Cecilia


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
Phone: +1 (818) 354-8810
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++