incubator-cassandra-user mailing list archives

From "Hiller, Dean" <Dean.Hil...@nrel.gov>
Subject how to debug slowdowns from these log snippets(we know the keys being accessed as well)
Date Wed, 20 Feb 2013 14:10:22 GMT
Cassandra version 1.1.4

I captured all the logs from the node causing timeouts (in a 6-node cluster).  We get these
slowdowns every once in a while, and they make our whole website 10 times slower.  Since
PlayOrm logs the rows being accessed, we know exactly which row the timeout occurred
on, and Astyanax logged the node, which was nice.

Each time the website gets 10 times slower, this shows up in the logs:

 WARN [ScheduledTasks:1] 2013-02-20 06:52:15,929 GCInspector.java (line 145) Heap is 0.8532751700072093 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2013-02-20 06:52:15,929 StorageService.java (line 2855) Flushing CFS(Keyspace='databus5', ColumnFamily='IntegerIndice') to relieve memory pressure
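
For reference, I believe these are the knobs that warning points at in our cassandra.yaml (still at what I think are the 1.1 defaults, so correct me if I have these wrong):

flush_largest_memtables_at: 0.75   # emergency-flush the largest memtables when heap passes this fraction
reduce_cache_sizes_at: 0.85        # shrink key/row caches when heap passes this fraction
reduce_cache_capacity_to: 0.6      # ...down to this fraction of their capacity

So at 0.85 heap we are well past the 0.75 flush threshold, if I am reading it right.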

I seem to remember reading that we should turn off swap, which I have not gotten around to
yet :(….  Will that help here, or does swapping mostly matter when a node gets taken out of
the cluster?  As you can see from top below, this node has supposedly been running for
5,000,000 hours (what's up with that….maybe the h does not actually mean hours), though I
know it has only been up for months.
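
If turning off swap is the fix, I assume it is something like the following (as root; I have not actually run these yet, so treat this as a sketch):

swapoff -a                  # turn off all swap devices immediately
sysctl -w vm.swappiness=0   # tell the kernel to avoid swapping, until reboot
# ...and comment out the swap line in /etc/fstab so it stays off after a reboot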

Here is the top output….

top - 07:02:37 up 148 days, 21:11,  1 user,  load average: 1.10, 1.09, 1.09
Tasks: 379 total,   2 running, 377 sleeping,   0 stopped,   0 zombie
Cpu(s): 15.9%us,  1.0%sy,  2.9%ni, 79.9%id,  0.1%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  32854680k total, 32582688k used,   271992k free,       44k buffers
Swap: 33554424k total,    52180k used, 33502244k free, 22653528k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  701 cassandr  20   0 63.3g  10g 1.9g S 838.4 32.2  5125794h java

What can I dump to see why we need to "relieve memory pressure"?
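
So far the only things I know to check are the nodetool views (assuming these are the right ones to look at for memory pressure):

nodetool -h localhost info      # heap used vs. total for this node
nodetool -h localhost tpstats   # pending/blocked stages, e.g. FlushWriter backing up
nodetool -h localhost cfstats   # per-column-family memtable sizes and counts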

I ran jmap like so, but it can't attach for some reason:

[cassandra@a4 ~]$ jmap 701 -heap:format=b
Attaching to core -heap:format=b from executable 701, please wait...
Error attaching to core file: Can't attach to the core file
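
I suspect I just have the argument order backwards: jmap wants the option before the pid, so it parsed "701" as an executable and "-heap:format=b" as a core file.  Presumably it should be one of:

jmap -heap 701                            # heap configuration and usage summary
jmap -dump:format=b,file=heap.hprof 701   # full binary heap dump for a profiler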

jstack is not helping much either….

[cassandra@a4 ~]$ jstack -l 701 > threads.txt
701: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
[cassandra@a4 ~]$ jstack -F -l 701 > threads.txt
Attaching to process ID 701, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.7-b02
java.lang.NullPointerException
at sun.jvm.hotspot.oops.InstanceKlass.computeSubtypeOf(InstanceKlass.java:426)
at sun.jvm.hotspot.oops.Klass.isSubtypeOf(Klass.java:137)
at sun.jvm.hotspot.oops.Oop.isA(Oop.java:100)
at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:93)
at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39)
at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:52)
at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at sun.tools.jstack.JStack.runJStackTool(JStack.java:118)
at sun.tools.jstack.JStack.main(JStack.java:84)
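
Next I will probably just send the JVM a SIGQUIT to get a thread dump, assuming we capture Cassandra's stdout somewhere:

kill -3 701   # HotSpot prints a full thread dump to its stdout; the process keeps running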

Thanks,
Dean
