jakarta-jcs-users mailing list archives

From Joshua Szmajda <j...@loki.ws>
Subject Re: JCS remote server
Date Fri, 11 Apr 2008 19:22:48 GMT
Got another one here:
http://loki.ws/~josh/restart-20080411.tar.bz2

The server should be reachable this time.

Niall Gallagher wrote:
> Hi Josh,
>
> I couldn't access your link: connection refused. I'll be out of the
> office until next Wednesday, so I hope you have some success by then.
>
> Kind regards,
> Niall
>
> On Thu, 2008-04-10 at 15:32 -0400, Joshua Szmajda wrote:
>
>   
>> Ok, caught one! The logs are pretty big, so I put them up here: 
>> http://loki.ws/~josh/restart-20080410.tar.bz2
>>
>> I'm really not sure what caused it. It seems to have happened a little
>> more quickly than usual.
>>
>> Does it seem to be the GC? I can't tell. I can try adding those GC
>> tuning options, but I don't want to jump the gun and change too many
>> variables at once. I'll at least add the tracing Al suggested, though.
>>
>> Thanks!
>> -Josh
>>
>> Joshua Szmajda wrote:
>>     
>>> I'd been deleting the logs, so I don't have one right now ><. I did 
>>> change my scripts to save them though. As soon as it happens again 
>>> I'll have some data. It seems to take about a week or so of running 
>>> from a fresh start before I start to get problems.
>>>
>>> Niall: thanks for the explanation. I figured they were probably byte
>>> arrays, but then I saw the Strings and that threw me off :).
>>>
>>> Anyway as soon as I get some real data I'll post it to the list.
>>>
>>> Thanks all!
>>> -Josh
>>>
>>> Aaron Smuts wrote:
>>>       
>>>> Do you have any of the cache logs when this is
>>>> happening?
>>>>
>>>> I would turn the memory shrinker off (set the property
>>>> to false), as a start.  I generally don't run with the
>>>> memory shrinker on.  But I'm shooting in the dark.
>>>>
>>>> Aaron
>>>>
>>>>
>>>> --- Joshua Szmajda <josh@loki.ws> wrote:
>>>>
>>>>  
>>>>         
>>>>> Ahh yes of course, it was the user requirement. Now
>>>>> I have a nice bunch of data. This is interesting, but I'm not sure what
>>>>> the [B class is:
>>>>>
>>>>>  num   #instances      #bytes  class name
>>>>> ------------------------------------------------------------------
>>>>>    1:       31419   284852480  [B
>>>>>    2:        2277    19760264  [I
>>>>>    3:       57834     3865240  [C
>>>>>    4:       29628     1896192  org.apache.jcs.engine.ElementAttributes
>>>>>    5:       57838     1388112  java.lang.String
>>>>> ...
>>>>>
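For reading the histogram above: `[B`, `[I` and `[C` are the JVM's internal names for `byte[]`, `int[]` and `char[]`. On JVMs of this era each String is backed by a `char[]`, which would be consistent with the nearly equal `[C` and `java.lang.String` instance counts. The following is only a sketch of a filter that rewrites those names; `decode_histo` is a hypothetical helper, not a JDK tool, and it handles one-dimensional arrays only.

```shell
#!/bin/sh
# decode_histo: hypothetical helper (not part of the JDK).
# Reads `jmap -histo` output on stdin and rewrites the JVM's internal
# array descriptors at the end of each line into Java source names.
# Only one-dimensional arrays are handled; other lines pass through.
decode_histo() {
    sed -e 's/ \[Z$/ boolean[]/' \
        -e 's/ \[B$/ byte[]/' \
        -e 's/ \[C$/ char[]/' \
        -e 's/ \[S$/ short[]/' \
        -e 's/ \[I$/ int[]/' \
        -e 's/ \[J$/ long[]/' \
        -e 's/ \[F$/ float[]/' \
        -e 's/ \[D$/ double[]/' \
        -e 's/ \[L\([^;]*\);$/ \1[]/'
}
```

It could be appended to the histogram step of the diagnostics script, for example `jmap -histo:live "$APP_PID" | head -n33 | decode_histo`.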
>>>>> Niall Gallagher wrote:
>>>>>> Hmm :D
>>>>>>
>>>>>> I just did a bit of digging. I've used this script on a few of our
>>>>>> servers in the past (32 and 64bit server VMs), but I just found a server
>>>>>> which gave me the exact same error message you got. That server, it turns
>>>>>> out, runs Java under a different user account to the one I was logged
>>>>>> into, however.
>>>>>>
>>>>>> Try running the script from the exact same user account the JVM process
>>>>>> is running from. Even running from root didn't work for me
>>>>>> on that server; it had to be the exact same user account, which is
>>>>>> surprising.
>>>>>>
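Niall's finding, that the tools have to run under the JVM's own account, could be scripted roughly as follows. This is a Linux-only sketch: `pid_owner` and `run_as_owner` are hypothetical helper names, and the use of `sudo` is an assumption (`su` would serve equally well where `sudo` is not set up).

```shell
#!/bin/sh
# Hypothetical helpers; not part of the JDK or of the script in this thread.

# Prints the account that owns process $1.
# Linux-only: relies on /proc/<pid> being owned by the process's user
# and on GNU stat's -c option.
pid_owner() {
    stat -c %U "/proc/$1"
}

# Runs the remaining arguments as a command under the account that owns
# process $1. sudo is skipped when the caller already is that account.
run_as_owner() {
    pid="$1"
    shift
    owner="$(pid_owner "$pid")" || return 1
    if [ "$owner" = "$(id -un)" ]; then
        "$@"
    else
        # Assumption: the caller is allowed to sudo to the owner account.
        sudo -u "$owner" "$@"
    fi
}
```

With these in place, a stack dump for pid 2007 would be requested as `run_as_owner 2007 /usr/java/default/bin/jstack 2007`.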
>>>>>> By the way those tools are documented here:
>>>>>>
>>>>>> http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jmap.html
>>>>>>
>>>>>> and
>>>>>>
>>>>>> http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jstack.html
>>>>>>
>>>>>> -basically they're supposed to work on most platforms except Windows and
>>>>>> Linux Itanium so unless you've got Itanium cpus it should work for you.
>>>>>>
>>>>>> On Wed, 2008-04-09 at 14:44 -0400, Joshua Szmajda wrote:
>>>>>>> Hey Niall,
>>>>>>>
>>>>>>> Thanks for your script, but I'm getting these errors:
>>>>>>>
>>>>>>> ./capture-diagnostics.sh RemoteCacheServerFactory
>>>>>>> Capturing diagnostics for Java process "RemoteCacheServerFactory" (pid 2007)...
>>>>>>> 2007: Unable to open socket file: target process not responding or HotSpot VM not loaded
>>>>>>> The -F option can be used when the target process is not responding
>>>>>>> 2007: Unable to open socket file: target process not responding or HotSpot VM not loaded
>>>>>>> The -F option can be used when the target process is not responding
>>>>>>> Saved diagnostics for "RemoteCacheServerFactory" to "RemoteCacheServerFactory-diagnostics.txt"
>>>>>>>
>>>>>>> There must be something I'm missing when I'm running the cache server. I
>>>>>>> noticed it uses the 'server' VM by default, maybe these debug commands
>>>>>>> are only good for the client VM?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> -Josh
>>>>>>>
>>>>>>> Niall Gallagher wrote:
>>>>>>>> Hi Josh,
>>>>>>>>
>>>>>>>> Can you modify your cron job to capture diagnostics before it restarts
>>>>>>>> the cache server?
>>>>>>>>
>>>>>>>> Then you can post the diagnostics next time it happens. The script below
>>>>>>>> will capture diagnostics for you. We use something like this in-house
>>>>>>>> for troubleshooting (not specifically for JCS).
>>>>>>>>
>>>>>>>> You'll first have to run the JDK 'jps' command from either root, or the
>>>>>>>> user account which runs your cache server instance. This gives you the
>>>>>>>> "name" of your cache server JVM process, which you need to supply to the
>>>>>>>> diagnostics script as command-line parameter. The script uses the name
>>>>>>>> to attach to the relevant JVM process.
>>>>>>>>
>>>>>>>> I don't know what might be causing the problem for you. It could be a
>>>>>>>> bug in JCS, or it could be a memory issue. The diagnostics will help
>>>>>>>> identify the problem.
>>>>>>>>
>>>>>>>> Save this as "capture-diagnostics.sh"...
>>>>>>>> -------
>>>>>>>> #!/bin/sh
>>>>>>>> # Saves the stack traces and class memory usage information for a
>>>>>>>> # Java process running on the machine to a diagnostics file.
>>>>>>>> #
>>>>>>>> # This script expects the name of the relevant Java process to be
>>>>>>>> # specified as a parameter. The name specified should match a Java
>>>>>>>> # process name as listed by running the JDK 'jps' command.
>>>>>>>> #
>>>>>>>> # Usage: sh capture-diagnostics.sh <name of process>
>>>>>>>> APP_NAME="$1"
>>>>>>>> JDK_LOCATION="/usr/java/default"
>>>>>>>> DUMP_FILE="$APP_NAME-diagnostics.txt"
>>>>>>>>
>>>>>>>> APP_PID="`$JDK_LOCATION/bin/jps|grep $APP_NAME 2> /dev/null|cut -d' ' -f1`"
>>>>>>>> if [ "$APP_PID" = "" ]; then
>>>>>>>>   echo "ERROR: Can't determine pid of Java process name specified \"$APP_NAME\""
>>>>>>>>   echo "Usage: sh capture-diagnostics.sh <name of process as listed by jps command>"
>>>>>>>>   exit 20
>>>>>>>> fi
>>>>>>>> echo "Capturing diagnostics for Java process \"$APP_NAME\" (pid $APP_PID)..."
>>>>>>>> echo -e "Diagnostics for Java process \"$APP_NAME\" (pid $APP_PID) as at `date`:-" >> $DUMP_FILE
>>>>>>>> echo -e "\nTop 30 memory-consuming classes:-" >> $DUMP_FILE
>>>>>>>> $JDK_LOCATION/bin/jmap -histo:live $APP_PID |head -n33 >> $DUMP_FILE
>>>>>>>> echo -e "\nThread stack traces:-" >> $DUMP_FILE
>>>>>>>> $JDK_LOCATION/bin/jstack $APP_PID >> $DUMP_FILE
>>>>>>>> echo -e "\n" >> $DUMP_FILE
>>>>>>>> echo "Saved diagnostics for \"$APP_NAME\" to \"$DUMP_FILE\""
>>>>>>>> -------
>>>>>>>>
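The suggestion to capture diagnostics before the cron job restarts the server could be wired up along these lines. This is a sketch under stated assumptions, not the cron job from the thread: `RESTART_CMD`, `CPU_LIMIT`, `cpu_exceeds`, `capture_then_restart` and `watchdog` are hypothetical names, and the `ps` reading is only a stand-in for however the real job measures CPU (on Linux, `ps` reports CPU averaged over the process lifetime, so a real check may need a different sample).

```shell
#!/bin/sh
# Sketch of a cron-driven watchdog: capture diagnostics first, restart second.
# ASSUMPTIONS: every name and default below is illustrative only.
CAPTURE_SCRIPT="${CAPTURE_SCRIPT:-./capture-diagnostics.sh}"
RESTART_CMD="${RESTART_CMD:-true}"   # placeholder; set to the real restart command
CPU_LIMIT="${CPU_LIMIT:-90}"

# Succeeds (exit status 0) when the CPU reading $1 is greater than limit $2.
# awk does the comparison because POSIX sh has no floating-point arithmetic.
cpu_exceeds() {
    awk -v cpu="$1" -v limit="$2" 'BEGIN { exit !(cpu > limit) }'
}

# Captures diagnostics for the jps name $1, then runs the restart command.
# The capture runs first so the evidence is saved before the JVM goes away.
capture_then_restart() {
    sh "$CAPTURE_SCRIPT" "$1"
    $RESTART_CMD
}

# watchdog <jps name> <pid>: acts only when the CPU reading is over the limit.
watchdog() {
    cpu="$(ps -o pcpu= -p "$2")"
    if cpu_exceeds "$cpu" "$CPU_LIMIT"; then
        capture_then_restart "$1"
    fi
}
```

A cron entry would then source this file and call something like `watchdog RemoteCacheServerFactory "$pid"`, with the pid looked up the same way the diagnostics script does it.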
>>>>>>>>
>>>>>>>> On Wed, 2008-04-09 at 10:11 -0400, Joshua Szmajda wrote:
>>>>>>>>> Hey all,
>>>>>>>>>
>>>>>>>>> I've got a JCS remote cache server running on a machine and every now
>>>>>>>>> and then it will spiral out of control and lock the machine. I have no
>>>>>>>>> idea yet what's causing this, I've just put some extra measures in place
>>>>>>>>> to capture the logs from when it happens. My solution at this point is a
>>>>>>>>> cron job that checks now and then for excessive cpu usage and restarts
>>>>>>>>> the cache server. I'd like to be able to not worry about it, though :).
>>>>>>>>>
>>>>>>>>> Any suggestions?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> -Josh
>>>>>>>>>
>>>>>>>>> P.S. it's running on ubuntu-server (kernel 2.6.22-14-server).
>>>>>>>>> I have up to 16 remote listeners connecting to any given region.
>>>>>>>>> (probably 20 application instances in all).
>>>>>>>>> Puts grow at a rate of about 400 per second.
>>>>>>>>> I pass these options to java: "-Xms128m -Xmx2000m"
>>>>>>>>> And here's my simple remote.cache.ccf:
>>>> === message truncated ===
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: jcs-users-unsubscribe@jakarta.apache.org
>>>> For additional commands, e-mail: jcs-users-help@jakarta.apache.org
>>>>
