jakarta-jcs-users mailing list archives

From Niall Gallagher <ni...@switchfire.com>
Subject Re: JCS remote server
Date Fri, 11 Apr 2008 17:31:51 GMT
Hi Josh,

I couldn't access your link (connection refused). I'll be out of the
office until next Wednesday, so I hope you have some success by then.

Kind regards,
Niall

On Thu, 2008-04-10 at 15:32 -0400, Joshua Szmajda wrote:

> Ok, caught one! The logs are pretty big, so I put them up here: 
> http://loki.ws/~josh/restart-20080410.tar.bz2
> 
> I'm really not sure what caused it, it seems to have happened a little 
> more quickly than usual.
> 
> Does it seem to be the GC? I can't tell, I can try adding in those GC 
> tuning things but I don't want to jump the gun and change too many 
> variables at once. I'll add in the tracing Al suggested though at least.
> 
> Thanks!
> -Josh
> 
> Joshua Szmajda wrote:
> > I'd been deleting the logs, so I don't have one right now ><. I did 
> > change my scripts to save them though. As soon as it happens again 
> > I'll have some data. It seems to take about a week or so of running 
> > from a fresh start before I start to get problems.
> >
> > Niall: thanks for the explanation. I figured they were probably Byte 
> > arrays, but then I saw the Strings and that threw me off :).
> >
> > Anyway as soon as I get some real data I'll post it to the list.
> >
> > Thanks all!
> > -Josh
> >
> > Aaron Smuts wrote:
> >> Do you have any of the cache logs when this is
> >> happening?
> >>
> >> I would turn the memory shrinker off (set the property
> >> to false), as a start.  I generally don't run with the
> >> memory shrinker on.  But I'm shooting in the dark.
> >>
> >> Aaron
> >>
> >>
> >> --- Joshua Szmajda <josh@loki.ws> wrote:
> >>
> >>  
> >>> Ahh yes of course, it was the user requirement. Now
> >>> I have a nice bunch of data. This is interesting, but I'm not sure what
> >>> the [B class is:
> >>>
> >>> num   #instances    #bytes  class name
> >>> --------------------------------------
> >>>   1:     31419   284852480  [B
> >>>   2:      2277    19760264  [I
> >>>   3:     57834     3865240  [C
> >>>   4:     29628     1896192 org.apache.jcs.engine.ElementAttributes
> >>>   5:     57838     1388112  java.lang.String
> >>> ...
> >>>
> >>> Niall Gallagher wrote:
> >>>
> >>>> Hmm :D
> >>>>
> >>>> I just did a bit of digging. I've used this script on a few of our
> >>>> servers in the past (32 and 64bit server VMs), but I just found a server
> >>>> which gave me the exact same error message you got. That server, it turns
> >>>> out, runs Java under a different user account to the one I was logged
> >>>> into, however.
> >>>>
> >>>> Try running the script from the exact same user account the JVM process
> >>>> is running from. Even running from root didn't work for me
> >>>> on that server, it had to be the exact same user account, which is
> >>>> surprising.
> >>>>
> >>>> By the way those tools are documented here:
> >>>>
> >>>> http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jmap.html
> >>>>
> >>>> and
> >>>>
> >>>> http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jstack.html
> >>>>
> >>>> -basically they're supposed to work on most platforms except Windows and
> >>>> Linux Itanium so unless you've got Itanium cpus it should work for you.
> >>>>
> >>>> On Wed, 2008-04-09 at 14:44 -0400, Joshua Szmajda wrote:
> >>>>
> >>>>> Hey Niall,
> >>>>>
> >>>>> Thanks for your script, but I'm getting these errors:
> >>>>> ./capture-diagnostics.sh RemoteCacheServerFactory
> >>>>> Capturing diagnostics for Java process "RemoteCacheServerFactory" (pid 2007)...
> >>>>> 2007: Unable to open socket file: target process not responding or HotSpot VM not loaded
> >>>>> The -F option can be used when the target process is not responding
> >>>>> 2007: Unable to open socket file: target process not responding or HotSpot VM not loaded
> >>>>> The -F option can be used when the target process is not responding
> >>>>> Saved diagnostics for "RemoteCacheServerFactory" to "RemoteCacheServerFactory-diagnostics.txt"
> >>>>>
> >>>>> There must be something I'm missing when I'm running the cache server. I
> >>>>> noticed it uses the 'server' VM by default, maybe these debug commands
> >>>>> are only good for the client VM?
> >>>>>
> >>>>> Thanks!
> >>>>> -Josh
> >>>>>
> >>>>> Niall Gallagher wrote:
> >>>>>            
> >>>>>> Hi Josh,
> >>>>>>
> >>>>>> Can you modify your cron job to capture diagnostics before it restarts
> >>>>>> the cache server?
> >>>>>>
> >>>>>> Then you can post the diagnostics next time it happens. The script below
> >>>>>> will capture diagnostics for you. We use something like this in-house
> >>>>>> for troubleshooting (not specifically for JCS).
> >>>>>>
> >>>>>> You'll first have to run the JDK 'jps' command from either root, or the
> >>>>>> user account which runs your cache server instance. This gives you the
> >>>>>> "name" of your cache server JVM process, which you need to supply to the
> >>>>>> diagnostics script as command-line parameter. The script uses the name
> >>>>>> to attach to the relevant JVM process.
> >>>>>>
> >>>>>> I don't know what might be causing the problem for you. It could be a
> >>>>>> bug in JCS, or it could be a memory issue. The diagnostics will help
> >>>>>> identify the problem.
> >>>>>>
> >>>>>> Save this as "capture-diagnostics.sh"...
> >>>>>> -------
> >>>>>> #!/bin/sh
> >>>>>> # Saves the stack traces and class memory usage information for a
> >>>>>> # Java process running on the machine to a diagnostics file.
> >>>>>> #
> >>>>>> # This script expects the name of the relevant Java process to be
> >>>>>> # specified as a parameter. The name specified should match a Java
> >>>>>> # process name as listed by running the JDK 'jps' command.
> >>>>>> #
> >>>>>> # Usage: sh capture-diagnostics.sh <name of process>
> >>>>>> APP_NAME="$1"
> >>>>>> JDK_LOCATION="/usr/java/default"
> >>>>>> DUMP_FILE="$APP_NAME-diagnostics.txt"
> >>>>>>
> >>>>>> APP_PID="`$JDK_LOCATION/bin/jps|grep $APP_NAME 2> /dev/null|cut -d' ' -f1`"
> >>>>>> if [ "$APP_PID" = "" ]; then
> >>>>>> echo "ERROR: Can't determine pid of Java process name specified \"$APP_NAME\""
> >>>>>> echo "Usage: sh capture-diagnostics.sh <name of process as listed by jps command>"
> >>>>>> exit 20
> >>>>>> fi
> >>>>>> echo "Capturing diagnostics for Java process \"$APP_NAME\" (pid $APP_PID)..."
> >>>>>> echo -e "Diagnostics for Java process \"$APP_NAME\" (pid $APP_PID) as at `date`:-" >> $DUMP_FILE
> >>>>>> echo -e "\nTop 30 memory-consuming classes:-" >> $DUMP_FILE
> >>>>>> $JDK_LOCATION/bin/jmap -histo:live $APP_PID |head -n33 >> $DUMP_FILE
> >>>>>> echo -e "\nThread stack traces:-" >> $DUMP_FILE
> >>>>>> $JDK_LOCATION/bin/jstack $APP_PID >> $DUMP_FILE
> >>>>>> echo -e "\n" >> $DUMP_FILE
> >>>>>> echo "Saved diagnostics for \"$APP_NAME\" to \"$DUMP_FILE\""
> >>>>>> -------
> >>>>>>
> >>>>>>
> >>>>>> On Wed, 2008-04-09 at 10:11 -0400, Joshua Szmajda wrote:
> >>>>>>
> >>>>>>> Hey all,
> >>>>>>>
> >>>>>>> I've got a JCS remote cache server running on a machine and every now
> >>>>>>> and then it will spiral out of control and lock the machine. I have no
> >>>>>>> idea yet what's causing this, I've just put some extra measures in place
> >>>>>>> to capture the logs from when it happens. My solution at this point is a
> >>>>>>> cron job that checks now and then for excessive cpu usage and restarts
> >>>>>>> the cache server. I'd like to be able to not worry about it, though :).
> >>>>>>> Any suggestions?
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>> -Josh
> >>>>>>>
> >>>>>>> P.S. it's running on ubuntu-server (kernel 2.6.22-14-server).
> >>>>>>> I have up to 16 remote listeners connecting to any given region.
> >>>>>>> (probably 20 application instances in all).
> >>>>>>> Puts grow at a rate of about 400 per second.
> >>>>>>> I pass these options to java: "-Xms128m -Xmx2000m"
> >>>>>>> And here's my simple remote.cache.ccf:
> >> === message truncated ===
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: jcs-users-unsubscribe@jakarta.apache.org
> >> For additional commands, e-mail: jcs-users-help@jakarta.apache.org
> >>
> >>   
> >
> >
> 
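
The quoted advice to run the diagnostics from the exact same user account as the JVM can be sketched as a small wrapper. This is only an illustration, not part of the script posted in the thread: the function names `pid_owner_uid` and `run_as_pid_owner` are made up here, it assumes a Linux `/proc` filesystem and a `stat` that supports `-c`, and it assumes `sudo` is available whenever the caller is not already the owning user.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread): run a command as the user
# that owns a given process, so tools like jmap and jstack can attach.

# Print the numeric uid that owns the given pid.
# Assumption: Linux, where /proc/<pid> is owned by the process's user.
pid_owner_uid() {
    stat -c %u "/proc/$1" 2>/dev/null
}

# Run the remaining arguments as the owner of the given pid.
run_as_pid_owner() {
    target_pid="$1"
    shift
    owner_uid="$(pid_owner_uid "$target_pid")"
    if [ -z "$owner_uid" ]; then
        echo "ERROR: no such process: $target_pid" >&2
        return 1
    fi
    if [ "$owner_uid" = "$(id -u)" ]; then
        # Already the right user: run the command directly.
        "$@"
    else
        # Assumption: sudo is installed and permits switching to this uid.
        sudo -u "#$owner_uid" "$@"
    fi
}
```

With the pid reported in the thread, a call might look like `run_as_pid_owner 2007 sh capture-diagnostics.sh RemoteCacheServerFactory`. Note that the script locates the process through `jps`, which normally lists only the JVMs the calling user can see, so running the whole script under the owning account keeps both steps consistent.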

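The `[B`, `[I` and `[C` entries in the quoted `jmap -histo` output are the JVM's internal names for primitive arrays (`[B` is a byte array, as noted in the thread). As a reading aid, here is a minimal sketch that rewrites those names into Java source style. The function names `decode_jvm_class` and `annotate_histogram` are invented for this example, only one-dimensional primitive arrays are translated, and the output columns are not aligned the way jmap prints them.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread) for reading jmap histograms.

# Translate a JVM primitive-array descriptor into a Java type name.
# Any other class name is printed unchanged.
decode_jvm_class() {
    case "$1" in
        "[B") echo "byte[]" ;;
        "[C") echo "char[]" ;;
        "[I") echo "int[]" ;;
        "[J") echo "long[]" ;;
        "[S") echo "short[]" ;;
        "[Z") echo "boolean[]" ;;
        "[F") echo "float[]" ;;
        "[D") echo "double[]" ;;
        *) echo "$1" ;;
    esac
}

# Read histogram text on stdin and print only the data rows
# (those starting with "<rank>:"), with the class name decoded.
annotate_histogram() {
    while read -r num instances bytes class; do
        case "$num" in
            [0-9]*:)
                printf '%s %s %s %s\n' "$num" "$instances" "$bytes" \
                    "$(decode_jvm_class "$class")"
                ;;
        esac
    done
}
```

For example, `annotate_histogram < RemoteCacheServerFactory-diagnostics.txt` would print the histogram rows from a captured diagnostics file with the array names spelled out, skipping the header, separator and stack trace lines.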

