tomcat-users mailing list archives

From Steve Nelson <Steve.Nel...@ided.state.ia.us>
Subject RE: tomcat 5.0.16 Replication
Date Thu, 08 Jan 2004 18:45:14 GMT


I was just about to try this, actually. Through googling I found a lot of
people having problems with select under 1.4 NIO on Red Hat 9. They were
actually experiencing crashes, though.

To verify your results I just put a Thread.sleep(1); where you suggested,
and I also see the jump in performance.

Something must have changed in ReplicationListener that causes this, because
the 5.0.16 version doesn't seem to have the problem. I'll see if I can figure
it out when I get back to where I can diff the files.

-Steve

-----Original Message-----
From: jean-philippe.belanger@cgi.com
[mailto:jean-philippe.belanger@cgi.com]
Sent: Thursday, January 08, 2004 12:25 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


More content for you Filip.

I've checked and followed the code of the listen event in 
ReplicationListener.java

Here's what's happening:

selector.select(timeout) -> returns immediately with one SelectionKey available.
That key is neither acceptable nor readable, so the code immediately skips
those IFs and loops back to the beginning.

I've put in traces, and this is executed once every millisecond, hence the
100% load on the server.
Just to make sure, I put a Thread.sleep(10) at the end of the loop; the CPU
dropped back to 0% and the replication still worked nicely, though probably
a little slower because of the 10 ms wait.
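
For readers following along, the loop and the workaround described above can be sketched roughly as follows. This is not the actual ReplicationListener code: the class name ListenLoopSketch, the listen method, and the 10 ms back-off are hypothetical, and only the java.nio calls are real. The sketch removes each key from the selected set and sleeps briefly only when a key came back that was neither acceptable nor readable.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Hypothetical sketch, not the Tomcat ReplicationListener source.
class ListenLoopSketch {

    // Runs a fixed number of select passes; returns how many keys were handled.
    static int listen(Selector selector, long timeoutMs, int passes)
            throws IOException, InterruptedException {
        int handled = 0;
        for (int i = 0; i < passes; i++) {
            selector.select(timeoutMs);
            boolean sawUnhandledKey = false;
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove(); // clear the key from the selected set
                if (key.isValid() && key.isAcceptable()) {
                    SocketChannel peer = ((ServerSocketChannel) key.channel()).accept();
                    if (peer != null) {
                        peer.configureBlocking(false);
                        peer.register(selector, SelectionKey.OP_READ);
                        handled++;
                    }
                } else if (key.isValid() && key.isReadable()) {
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    if (((SocketChannel) key.channel()).read(buf) < 0) {
                        key.channel().close(); // peer closed; closing also cancels the key
                    }
                    handled++;
                } else {
                    sawUnhandledKey = true; // the case described in this message
                }
            }
            if (sawUnhandledKey) {
                Thread.sleep(10); // assumed back-off so the loop cannot spin at 100% CPU
            }
        }
        return handled;
    }

    public static void main(String[] args) throws Exception {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.configureBlocking(false);
            server.socket().bind(new InetSocketAddress("127.0.0.1", 0));
            server.register(selector, SelectionKey.OP_ACCEPT);
            System.out.println("handled=" + listen(selector, 100, 3));
        }
    }
}
```

Sleeping only when an unexpected key shows up keeps the normal path free of the extra delay; an unconditional sleep at the end of the loop, as tried in this message, is simpler but slows every pass.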

I don't know much about those NIO packages, but it seems like the
select(timeout) method shouldn't return a SelectionKey in that state
without any waiting.
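
To check that expectation on a given JVM and OS, a small timing test can help. This is a minimal sketch, assuming an idle listener on an ephemeral loopback port; SelectTimeoutCheck and timeIdleSelect are hypothetical names. With nothing connecting, select(timeout) is expected to report no ready keys and to return only after roughly the timeout has elapsed; a near-zero elapsed time would point at the behavior described in this thread.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

// Hypothetical helper for timing one select(timeout) call on an idle listener.
class SelectTimeoutCheck {

    // Returns { number of keys reported ready, elapsed milliseconds }.
    static long[] timeIdleSelect(long timeoutMs) throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.configureBlocking(false);
            // Port 0 asks the OS for any free port; nobody connects to it.
            server.socket().bind(new InetSocketAddress("127.0.0.1", 0));
            server.register(selector, SelectionKey.OP_ACCEPT);

            long start = System.nanoTime();
            int ready = selector.select(timeoutMs);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            return new long[] { ready, elapsedMs };
        }
    }

    public static void main(String[] args) throws IOException {
        long[] result = timeIdleSelect(200);
        System.out.println("ready=" + result[0] + " elapsedMs=" + result[1]);
    }
}
```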

Let me know what you can dig from those.

Jean-Philippe Bélanger

jean-philippe.belanger@cgi.com wrote:

> Hi Filip.
>
> I did some profiling of 40 minutes of Tomcat with and without a 2nd node
> up. Here are the results with
> -Xrunhprof:cpu=samples,thread=y,file=/u01/portal/java.hprof.txt,depth=10:
>
> Those numbers are from cpu=times and not cpu=samples, since the latter
> freezes on my systems.
> So the list shows the time spent in each method.
>
> The major difference is the number of calls to the
> sun.nio.ch.PollArrayWrapper class. I don't know much about those NIO
> packages, but 819,000 calls in 40 minutes is a lot.
> The socket interface was called more than twice as often with 2 hosts as
> with a single one, which seems normal.
>
> Maybe this can help.
> If you need the complete hprof files, I can send them to you.
>
> 1 host in cluster:
> CPU TIME (ms) BEGIN (total = 19701) Thu Jan  8 10:00:59 2004
> rank   self  accum   count trace method
>   1 11.48% 11.48%      54    85 java.lang.Object.wait
>   2 11.46% 22.94%     117    86 java.lang.Object.wait
>   3 10.95% 33.89%    4115   215 java.net.PlainDatagramSocketImpl.receive
>   4 10.93% 44.81%    4114   224 java.lang.Thread.sleep
>   5 10.91% 55.73%   19005   214 sun.nio.ch.PollArrayWrapper.poll0
>   6  7.37% 63.09%      28   495 java.lang.Object.wait
>   7  7.24% 70.34%      10   576 java.lang.Object.wait
>   8  4.57% 74.90%      90   716 java.lang.Thread.sleep
>   9  4.48% 79.38%       1   909 java.lang.Object.wait
>  10  4.48% 83.86%       1   908 java.lang.Object.wait
>  11  4.48% 88.34%      15   810 java.lang.Object.wait
>  12  4.47% 92.81%       1   910 java.net.PlainSocketImpl.socketAccept
>  13  0.71% 93.52%       2   623 java.lang.Object.wait
>  14  0.56% 94.08%       2   706 java.lang.Object.wait
>  15  0.38% 94.46%       2   914 java.lang.Object.wait
>  16  0.24% 94.70%     775   913 java.lang.String.toCharArray
>  17  0.23% 94.93%       3   475 java.lang.Thread.sleep
>  18  0.16% 95.09%       2   472 java.lang.Object.wait
>  19  0.15% 95.24%       2   595 java.lang.Thread.sleep
>  20  0.15% 95.40%       2   586 java.lang.Thread.sleep
>  21  0.15% 95.55%       2   703 java.lang.Thread.sleep
>  22  0.15% 95.70%       2   476 java.lang.Thread.sleep
>  23  0.15% 95.85%       2   692 java.lang.Thread.sleep
>  24  0.12% 95.97%  218595   385 java.lang.CharacterDataLatin1.toLowerCase
>  25  0.12% 96.09%  218595   408 java.lang.Character.toLowerCase
>  26  0.11% 96.20%  218595   433 java.lang.CharacterDataLatin1.getProperties
>  27  0.10% 96.30%  210925   389 java.lang.String.equalsIgnoreCase
>  28  0.08% 96.38%  157259   387 java.lang.String.charAt
>  29  0.08% 96.46%       1   646 java.lang.Thread.sleep
>  30  0.08% 96.53%       1   634 java.lang.Thread.sleep
>  31  0.08% 96.61%       1   903 java.lang.Thread.sleep
>  32  0.08% 96.69%       1   714 java.lang.Thread.sleep
>  33  0.08% 96.76%       1   811 java.lang.Thread.sleep
>  34  0.08% 96.84%       1   715 java.lang.Thread.sleep
>
> 2 hosts:
> CPU TIME (ms) BEGIN (total = 37247) Thu Jan  8 11:01:28 2004
> rank   self  accum   count trace method
>   1  9.56%  9.56%      52    85 java.lang.Object.wait
>   2  9.56% 19.12%      29    86 java.lang.Object.wait
>   3  9.30% 28.43%       3   267 java.lang.Object.wait
>   4  9.25% 37.68%    6644   224 java.lang.Thread.sleep
>   5  9.23% 46.91%   13116   215 java.net.PlainDatagramSocketImpl.receive
>   6  7.67% 54.58%       3   266 java.lang.Object.wait
>   7  5.90% 60.47%      39   847 java.lang.Object.wait
>   8  5.76% 66.24%      12   503 java.lang.Object.wait
>   9  3.90% 70.14%     145   975 java.lang.Thread.sleep
>  10  3.90% 74.04%       1  1174 java.lang.Object.wait
>  11  3.90% 77.94%       1  1173 java.lang.Object.wait
>  12  3.90% 81.84%      25   973 java.lang.Object.wait
>  13  3.90% 85.74%       1  1175 java.net.PlainSocketImpl.socketAccept
>  14  3.88% 89.62%  819692   214 sun.nio.ch.PollArrayWrapper.poll0
>  15  0.75% 90.37%       2   958 java.lang.Object.wait
>  16  0.28% 90.65%       2   457 java.lang.Object.wait
>  17  0.26% 90.91%       2  1181 java.lang.Object.wait
>
> Filip Hanik wrote:
>
>> I'll try to get an instance going today. Will let you know how it goes.
>> Also, try asynchronous replication; does it still go to 100%?
>>
>> Filip
>>
>> -----Original Message-----
>> From: Steve Nelson [mailto:Steve.Nelson@ided.state.ia.us]
>> Sent: Wednesday, January 07, 2004 12:08 PM
>> To: 'Tomcat Users List'
>> Subject: RE: tomcat 5.0.16 Replication
>>
>>
>>
>>
>> Okay, did that and got this:
>>
>> BEGIN TO RECEIVE
>> SENT:Default 1
>> RECEIVED:Default 1 FROM /10.0.0.110:5555
>> SENT:Default 2
>> BEGIN TO RECEIVE
>> RECEIVED:Default 2 FROM /10.0.0.110:5555
>> SENT:Default 3
>> BEGIN TO RECEIVE
>> RECEIVED:Default 3 FROM /10.0.0.110:5555
>> SENT:Default 4
>> BEGIN TO RECEIVE
>> RECEIVED:Default 4 FROM /10.0.0.110:5555
>>
>> *shrug*
>>
>> BTW, it didn't go to 100% CPU utilization before I started using the
>> code from CVS. Of course, the Manager would almost always time out
>> before it would receive the message.
>>
>> Now it gets the message right away, but maxes my machine out.
>>
>>
>>
>>
>> -----Original Message-----
>> From: Filip Hanik [mailto:devlists@hanik.com]
>> Sent: Wednesday, January 07, 2004 1:58 PM
>> To: Tomcat Users List
>> Subject: RE: tomcat 5.0.16 Replication
>>
>>
>> 100% cpu can mean that you have a multicast problem, try to run
>>
>> java -cp tomcat-replication.jar MCaster
>>
>> download the jar from http://cvs.apache.org/~fhanik/
>>
>> Filip
>>
>> -----Original Message-----
>> From: Steve Nelson [mailto:Steve.Nelson@ided.state.ia.us]
>> Sent: Wednesday, January 07, 2004 6:51 AM
>> To: 'tomcat-user@jakarta.apache.org'
>> Subject: tomcat 5.0.16 Replication
>>
>>
>>
>> I was having random problems with clustering when starting up. Mostly
>> they had to do with timing out when the manager was starting up. I built
>> the CVS version and it solved that problem, but it has caused some
>> serious performance problems.
>>
>> First a little background.
>>
>> I have 2 servers, dual 300 MHz Compaq ProLiants, both running Red Hat 9
>> and Tomcat 5.0.16 (with catalina-cluster.jar built from CVS). The
>> multicast packets are restricted to a crossover link between the
>> servers. There are 3 hosts in the server.xml, all with clustering set
>> up. They all function just fine.
>>
>> But... the CPUs spike up to 100% if I start up both servers. I know this
>> didn't happen without the new catalina-cluster.jar. If I shut down 1
>> server (doesn't matter which), everything returns to normal. But when
>> both are running, both servers are at 100% CPU. I am trying to profile
>> it now, but I figured if someone has already experienced this they could
>> save me some time.
>>
>> Oh, and there isn't anything relevant in my logs. It's not throwing
>> millions of errors or something.
>>
>> -Steve Nelson
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: tomcat-user-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: tomcat-user-help@jakarta.apache.org
>


-- 
Jean-Philippe Bélanger
(514)228-8800 ext 3060
111 Duke
CGI

