incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hiller, Dean" <Dean.Hil...@nrel.gov>
Subject Re: how to debug slowdowns from these log snippets-more info 2
Date Wed, 20 Feb 2013 15:35:51 GMT
Oh, and my startup command that cassandra logged was

a2.bigde.nrel.gov: xss =  -ea -javaagent:/opt/cassandra/lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8021M -Xmx8021M
-Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k

And I remember from docs you don't want to go above 8G or java GC doesn't
work out so well.  I am not sure why this is not working out though.

Dean

On 2/20/13 7:16 AM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:

>Here is the printout before that log which is probably important as
>wellŠ..
>
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,375 GCInspector.java (line
>122) GC for ConcurrentMarkSweep: 3618 ms for 2 collections, 7038159096
>used; max is 8243904512
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,375 StatusLogger.java (line
>57) Pool Name                    Active   Pending   Blocked
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,375 StatusLogger.java (line
>72) ReadStage                        11       264         0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
>72) RequestResponseStage              0         0         0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
>72) ReadRepairStage                   0         0         0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
>72) MutationStage                    12        88         0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
>72) ReplicateOnWriteStage             0         0         0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
>72) GossipStage                       1         7         0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
>72) AntiEntropyStage                  0         0         0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
>72) MigrationStage                    0         0         0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
>72) StreamStage                       0         0         0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
>72) MemtablePostFlusher               0         0         0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
>72) FlushWriter                       0         0         0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
>72) MiscStage                         0         0         0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
>72) commitlog_archiver                0         0         0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
>72) InternalResponseStage             0         0         0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
>72) HintedHandoff                     0         0         0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
>77) CompactionManager                 4         5
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
>89) MessagingService                n/a    10,127
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
>99) Cache Type                     Size                 Capacity
>    KeysToSave    
>Provider
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
>100) KeyCache                    1310719                  1310719
>            all
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
>106) RowCache                          0                        0
>            all   
>org.apache.cassandra.cache.SerializingCacheProvider
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
>113) ColumnFamily                Memtable ops,data
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
>116) databus5.ServersThatLog                   0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
>116) databus5.DecimalIndice             971,885311
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
>116) databus5.EntityGroupXref                  0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
>116) databus5.IntegerIndice            3256,620947
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
>116) databus5.KeyToTableName                   0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
>116) databus5.SdiTable                         0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
>116) databus5.DboTableMeta               339,41537
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
>116) databus5.nreldata                4528,3261763
> INFO [GossipTasks:1] 2013-02-20 07:14:00,375 Gossiper.java (line 830)
>InetAddress /10.10.43.5 is now dead.
> INFO [GossipTasks:1] 2013-02-20 07:14:00,380 Gossiper.java (line 830)
>InetAddress /10.10.43.8 is now dead.
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,380 StatusLogger.java (line
>116) databus5.DboColumnMeta             809,112672
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,380 StatusLogger.java (line
>116) databus5.DboDatabaseMeta                  0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line
>116) databus5.WebNodeDbo                 2310,4470
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line
>116) databus5.User                             0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line
>116) databus5.RoleMapping                      0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line
>116) databus5.LogEvent                         0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line
>116) databus5.SecureResourceGroupXref                 0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
>116) databus5.SecurityGroup                    0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
>116) databus5.StreamAggregation                 0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
>116) databus5.AppProperty                      0,0
> INFO [GossipTasks:1] 2013-02-20 07:14:00,382 Gossiper.java (line 830)
>InetAddress /10.10.43.3 is now dead.
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
>116) databus5.StringIndice               290,27231
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
>116) databus5.MonitorDbo                   75,2983
> INFO [GossipTasks:1] 2013-02-20 07:14:00,382 Gossiper.java (line 830)
>InetAddress /10.10.43.7 is now dead.
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
>116) databus5.SdiColumn                        0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
>116) system.NodeIdInfo                         0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
>116) system.IndexInfo                          0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
>116) system.LocationInfo                       0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
>116) system.Versions                           0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
>116) system.schema_keyspaces                   0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
>116) system.Migrations                         0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
>116) system.schema_columnfamilies                 0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
>116) system.schema_columns                     0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,384 StatusLogger.java (line
>116) system.HintsColumnFamily                  0,0
> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,384 StatusLogger.java (line
>116) system.Schema                             0,0
> WARN [ScheduledTasks:1] 2013-02-20 07:14:00,384 GCInspector.java (line
>145) Heap is 0.8537409774403752 full.  You may need to reduce memtable
>and/or cache sizes.  Cassandra will now flush up to the two largest
>memtables to free up memory.  Adjust flush_largest_memtables_at threshold
>in cassandra.yaml if you don't want Cassandra to do this automatically
> WARN [ScheduledTasks:1] 2013-02-20 07:14:00,384 StorageService.java (line
>2855) Flushing CFS(Keyspace='databus5', ColumnFamily='nreldata') to
>relieve memory pressure
>
>
>
>On 2/20/13 7:10 AM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:
>
>>Cassandra version 1.1.4
>>
>>I captured all the logs of node causing timeouts (in a 6 node cluster).
>>We seem to get these slowdowns every once in a while and it causes our
>>whole website to be 10 times slower.  Since PlayOrm actually logs the
>>rows being accessed we know exactly which row the timeout occurred on and
>>asytanax logged the node which was nice.
>>
>>Each time we get this website being 10 times slower, this is in the logs
>>
>> WARN [ScheduledTasks:1] 2013-02-20 06:52:15,929 GCInspector.java (line
>>145) Heap is 0.8532751700072093 full.  You may need to reduce memtable
>>and/or cache sizes.  Cassandra will now flush up to the two largest
>>memtables to free up memory.  Adjust flush_largest_memtables_at threshold
>>in cassandra.yaml if you don't want Cassandra to do this automatically
>> WARN [ScheduledTasks:1] 2013-02-20 06:52:15,929 StorageService.java
>>(line 2855) Flushing CFS(Keyspace='databus5',
>>ColumnFamily='IntegerIndice') to relieve memory pressure
>>
>>I seem to remember reading about we should turn off swap which I have not
>>gotten around to at this point :(Šwill that help in that the node is
>>taken out of the cluster or?  As you can see from top, this node has been
>>running a 5,000,000 hours below(What's up with thatŠ.maybe the h does not
>>actually mean hours) though I know it has been running for months.
>>
>>Here is a top commandŠ.
>>
>>top - 07:02:37 up 148 days, 21:11,  1 user,  load average: 1.10, 1.09,
>>1.09
>>Tasks: 379 total,   2 running, 377 sleeping,   0 stopped,   0 zombie
>>Cpu(s): 15.9%us,  1.0%sy,  2.9%ni, 79.9%id,  0.1%wa,  0.0%hi,  0.2%si,
>>0.0%st
>>Mem:  32854680k total, 32582688k used,   271992k free,       44k buffers
>>Swap: 33554424k total,    52180k used, 33502244k free, 22653528k cached
>>
>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  701 cassandr  20   0 63.3g  10g 1.9g S 838.4 32.2  5125794h java
>>
>>What can I dump to see why we need to "relieve memory pressure"?
>>
>>I ran jmap like so but it can't attach for some reason
>>
>>[cassandra@a4 ~]$ jmap 701 -heap:format=b
>>Attaching to core -heap:format=b from executable 701, please wait...
>>Error attaching to core file: Can't attach to the core file
>>
>>Jstack is not helping much either Š.
>>
>>[cassandra@a4 ~]$ jstack -l 701 > threads.txt
>>701: Unable to open socket file: target process not responding or HotSpot
>>VM not loaded
>>The -F option can be used when the target process is not responding
>>[cassandra@a4 ~]$ jstack -F -l 701 > threads.txt
>>Attaching to process ID 701, please wait...
>>Debugger attached successfully.
>>Server compiler detected.
>>JVM version is 20.7-b02
>>java.lang.NullPointerException
>>at 
>>sun.jvm.hotspot.oops.InstanceKlass.computeSubtypeOf(InstanceKlass.java:42
>>6
>>)
>>at sun.jvm.hotspot.oops.Klass.isSubtypeOf(Klass.java:137)
>>at sun.jvm.hotspot.oops.Oop.isA(Oop.java:100)
>>at 
>>sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:93)
>>at 
>>sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39)
>>at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:52)
>>at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
>>at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
>>at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
>>at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
>>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>at 
>>sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java
>>:
>>39)
>>at 
>>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorI
>>m
>>pl.java:25)
>>at java.lang.reflect.Method.invoke(Method.java:597)
>>at sun.tools.jstack.JStack.runJStackTool(JStack.java:118)
>>at sun.tools.jstack.JStack.main(JStack.java:84)
>>
>>Thanks,
>>Dean
>


Mime
View raw message