cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jonathan Ellis <jbel...@gmail.com>
Subject Re: GC Exceptions and cluster nodes are dying
Date Wed, 08 Dec 2010 15:24:12 GMT
Cut your memtable thresholds (throughput and ops) in half.  See "describe
keyspaces" and "update keyspace" in the cli.

On Wed, Dec 8, 2010 at 9:18 AM, Amin Sakka, Novapost <amin.sakka@novapost.fr
> wrote:

> Thanks for your answer Aaron,
>
> I'm now on the RC1, I have no longer the ActiveCount error, however my
> nodes still dying under bulk insertion.
>
> I have modified my nodes configuration (all of them has now 2GB Heap size).
> The nodes still under heavy pressure and they dies after a random timeout
> (sometimes after 10 minutes of insertion
> and sometimes after 50 minutes).
> I want to point that I'm inserting rows in 4 different columns families at
> the same time and that the rows size is too little (few KiloBytes).
> I've attached here my cassandra.yaml configuration file.
> Can you help me please to solve this issue?
>
> Thanks!
>
> Here is some of my log output:
>
> DEBUG [MutationStage:27] 2010-12-08 15:31:19,214
> RowMutationVerbHandler.java (line 78) RowMutation(keyspace='SAE',
> key='6163636f756e743936353a726566393230',
> modifications=[ColumnFamily(Document
> [6465736372697074696f6e:false:49@1291818633293000
> ,646f63756d656e744964:false:36@1291818633293000
> ,7265666572656e6365:false:17@1291818633293000,])]) applied.  Sending
> response to 714546@/10.0.100.94
> DEBUG [MutationStage:7] 2010-12-08 15:31:19,214 RowMutationVerbHandler.java
> (line 54) Applying RowMutation(keyspace='SAE',
> key='6163636f756e743938373a726566313835',
> modifications=[ColumnFamily(Document
> [6465736372697074696f6e:false:49@1291818633524000
> ,646f63756d656e744964:false:36@1291818633524000
> ,7265666572656e6365:false:17@1291818633524000,])])
> DEBUG [MutationStage:7] 2010-12-08 15:31:19,214 Table.java (line 378)
> applying mutation of row 6163636f756e743938373a726566313835
> DEBUG [MutationStage:7] 2010-12-08 15:31:19,214 RowMutationVerbHandler.java
> (line 78) RowMutation(keyspace='SAE',
> key='6163636f756e743938373a726566313835',
> modifications=[ColumnFamily(Document
> [6465736372697074696f6e:false:49@1291818633524000
> ,646f63756d656e744964:false:36@1291818633524000
> ,7265666572656e6365:false:17@1291818633524000,])]) applied.  Sending
> response to 714547@/10.0.100.94
> *DEBUG [ScheduledTasks:1] 2010-12-08 15:31:19,373 GCInspector.java (line
> 135) GC for ParNew: 13 ms, 34456296 reclaimed leaving 1465917704 used; max
> is 2256404480*
>
>
>
>
>
> 2010/12/1 Aaron Morton <aaron@thelastpickle.com>
>
> Running nodes with different JVM heap sizes would not be recommended
>> practice, for many reasons. Nor would I recommend running them with all the
>> memory the machine has, it will just lead to the OS swapping the JVM out to
>> disk and considerable slow things down.
>>
>> I would suggest a heap size of 1.5 or 2.0 GB for each node, and have a
>> read of the JVM Heap Size section here
>> http://wiki.apache.org/cassandra/MemtableThresholds . AFAIK the logs are
>> showing your cluster was under heavy GC pressure.
>>
>> Finally, the ActiveCount error message was a known issue in beta 2. Treat
>> yourself and try RC1 :)
>> http://www.mail-archive.com/user@cassandra.apache.org/msg06298.html
>>
>> Aaron
>>
>>
>>
>> On 02 Dec, 2010,at 12:33 AM, asakka <amin.sakka@novapost.fr> wrote:
>>
>>
>> Hello,
>>
>> I'm making some tests on a data model with 3 CF and 1 SCF, I want to start
>> by inserting 1 million rows (my target is to have 1billion rows) .
>> I have three nodes cluster (I'm using the same machines with 3GB of RAM
>> each , intel core2 duo 1,6GHZ), RF = 2, CL = 1, HEAPSIZE of the seed = 3GO
>> (it was 1.5GO, I've doubled it to avoid the heap size exception I had) ,
>> the
>> other two nodes are 1.5GO.
>>
>> I am using cassandra (V0.7.0-beta2) and Hector (V0.7.0.18) . I'm making
>> insertion in batch mode using hector Mutator.
>> My disk_access_mode is standard.
>> I reduced also my memtable_throughput_in_mb to 64, but the problem
>> persists
>> and I have the following exception :
>> I want to know if it is a configuration or hardware problem ?
>>
>> INFO [Timer-0] 2010-12-01 10:34:42,124 Gossiper.java (line 196)
>> InetAddress
>> /10.0.100.215 is now dead.
>> INFO [GOSSIP_STAGE:1] 2010-12-01 10:34:44,188 Gossiper.java (line 594)
>> Node
>> /10.0.100.215 has restarted, now UP again
>> INFO [GOSSIP_STAGE:1] 2010-12-01 10:34:44,189 StorageService.java (line
>> 643) Node /10.0.100.215 state jump to normal
>> INFO [GOSSIP_STAGE:1] 2010-12-01 10:34:44,189 StorageService.java (line
>> 650) Will not change my token ownership to /10.0.100.215
>> INFO [HINTED-HANDOFF-POOL:1] 2010-12-01 10:34:44,189
>> HintedHandOffManager.java (line 196) Started hinted handoff for endpoint
>> /10.0.100.215
>> INFO [HINTED-HANDOFF-POOL:1] 2010-12-01 10:34:44,189
>> HintedHandOffManager.java (line 252) Finished hinted handoff of 0 rows to
>> endpoint /10.0.100.215
>> INFO [GC inspection] 2010-12-01 10:40:29,141 GCInspector.java (line 129)
>> GC
>> for ParNew: 750 ms, 14693208 reclaimed leaving 2140055192 used; max is
>> 3355312128
>> INFO [GC inspection] 2010-12-01 10:40:30,280 GCInspector.java (line 129)
>> GC
>> for ParNew: 445 ms, 17042288 reclaimed leaving 2178211008 used; max is
>> 3355312128
>> INFO [WRITE-/10.0.100.214] 2010-12-01 10:40:31,552
>> OutboundTcpConnection.java (line 115) error writing to /10.0.100.214
>> INFO [GC inspection] 2010-12-01 10:40:32,280 GCInspector.java (line 129)
>> GC
>> for ParNew: 211 ms, 25550568 reclaimed leaving 2235227312 used; max is
>> 3355312128
>> INFO [GC inspection] 2010-12-01 10:40:34,320 GCInspector.java (line 129)
>> GC
>> for ParNew: 290 ms, 26512896 reclaimed leaving 2277013184 used; max is
>> 3355312128
>> INFO [GC inspection] 2010-12-01 10:40:35,950 GCInspectorjava (line 129) GC
>>
>> for ParNew: 506 ms, 24319976 reclaimed leaving 2303739672 used; max is
>> 3355312128
>> INFO [GC inspection] 2010-12-01 10:40:37,202 GCInspector.java (line 129)
>> GC
>> for ParNew: 462 ms, 31759008 reclaimed leaving 2306914712 used; max is
>> 3355312128
>> INFO [GC inspection] 2010-12-01 10:40:42,629 GCInspector.java (line 129)
>> GC
>> for ParNew: 445 ms, 14769312 reclaimed leaving 2327064920 used; max is
>> 3355312128
>> INFO [GC inspection] 2010-12-01 10:40:43,969 GCInspector.java (line 129)
>> GC
>> for ParNew: 720 ms, 14804208 reclaimed leaving 2366434112 used; max is
>> 3355312128
>> INFO [GC inspection] 2010-12-01 10:40:45,372 GCInspector.java (line 129)
>> GC
>> for ParNew: 325 ms, 23112128 reclaimed leaving 2421032952 used; max is
>> 3355312128
>> INFO [GC inspection] 2010-12-01 10:40:47,843 GCInspector.java (line 129)
>> GC
>> for ParNew: 801 ms, 26014296 reclaimed leaving 2474278880 used; max is
>> 3355312128
>> INFO [Timer-0] 2010-12-01 10:41:18,451 Gossiper.java (line 196)
>> InetAddress
>> /10.0.100.215 is now dead.
>> INFO [HINTED-HANDOFF-POOL:1] 2010-12-01 10:41:19,362
>> HintedHandOffManager.java (line 196) Started hinted handoff for endpoint
>> /10.0.100.215
>> INFO [HINTED-HANDOFF-POOL:1] 2010-12-01 10:41:19,975
>> HintedHandOffManager.java (line 252) Finished hinted handoff of 0 rows to
>> endpoint /10.0.100.215
>> INFO [GOSSIP_STAGE:1] 2010-12-01 10:41:19,506 Gossiper.java (line 580)
>> InetAddress /10.0.100.215 is now UP
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:28,873 SSTablejava (line
>>
>> 145) Deleted /var/lib/cassandra/data/SAE4/Document-e-20-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:28,952 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/system/LocationInfo-e-148-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,053 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/SAE4/Account-e-7-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,163 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/SAE4/Account-e-12-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,274 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/SAE4/Document-e-13-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,407 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/SAE4/Account-e-13-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,513 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/SAE4/Account-e-17-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,545 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/SAE4/Document-e-7-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,577 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/system/LocationInfo-e-146-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,776 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/SAE4/Document-e-2-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,784 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/system/LocationInfo-e-145-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,882 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/SAE4/Document-e-19-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,883 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/system/LocationInfo-e-147-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,884 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/SAE4/Account-e-14-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,886 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/SAE4/DocumentByFolder-e-5-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,887 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/SAE4/DocumentByFolder-e-7-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,887 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/SAE4/DocumentByFolder-e-8-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,888 SSTablejava (line
>>
>> 145) Deleted /var/lib/cassandra/data/SAE4/Account-e-16-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,917 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/SAE4/Account-e-3-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,918 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/SAE4/DocumentByFolder-e-6-<>
>> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,918 SSTable.java (line
>> 145) Deleted /var/lib/cassandra/data/SAE4/Account-e-15-<>
>> INFO [GC inspection] 2010-12-01 10:51:07,737 GCInspector.java (line 129)
>> GC
>> for ParNew: 496 ms, 36105328 reclaimed leaving 274154728 used; max is
>> 3355312128
>> INFO [GC inspection] 2010-12-01 10:51:43,742 GCInspector.java (line 129)
>> GC
>> for ParNew: 3386 ms, 12099384 reclaimed leaving 297145056 used; max is
>> 3355312128
>> INFO [GC inspection] 2010-12-01 10:51:45,487 GCInspector.java (line 150)
>> Pool Name Active Pending
>> INFO [GC inspection] 2010-12-01 10:51:45,706 GCInspector.java (line 156)
>> MIGRATION_STAGE 0 0
>> INFO [GC inspection] 2010-12-01 10:51:45,716 GCInspector.java (line 156)
>> GOSSIP_STAGE 0 0
>> ERROR [GC inspection] 2010-12-01 10:51:46,313 AbstractCassandraDaemon.java
>> (line 88) Fatal exception in thread Thread[GC inspection,5,main]
>> java.lang.reflect.UndeclaredThrowableException
>> at $Proxy1.getActiveCount(Unknown Source)
>> at
>>
>> org.apache.cassandra.service.GCInspector.logThreadPoolStats(GCInspector.java:156)
>> at
>>
>> org.apache.cassandra.service.GCInspector.logIntervalGCStats(GCInspector.java:136)
>> at
>> org.apache.cassandra.service.GCInspector.access$000(GCInspector.java:39)
>> at org.apache.cassandra.service.GCInspector$1.run(GCInspector.java:93)
>> at java.util.TimerThread.mainLoop(Timer.java:512)
>> at java.util.TimerThread.run(Timer.java:462)
>> Caused by: javax.management.AttributeNotFoundException: No such attribute:
>> ActiveCount
>> at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:63)
>> at
>> com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
>> at
>>
>> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
>> at
>>
>> com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
>> at
>>
>> javax.managementMBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:263)
>>
>> ... 7 more
>>
>> --
>> View this message in context:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/GC-Exceptions-and-cluster-nodes-are-dying-tp5791496p5791496.html
>> Sent from the cassandra-user@incubator.apache.org mailing list archive at
>> Nabble.com.
>>
>>
>
>
> --
>
> Amin SAKKA
> Research and Development Engineer
> 32 rue de Paradis, 75010 Paris
> *Tel:* +33 (0)6 34 14 19 25
> *Mail:* amin.sakka@novapost.fr
> *Web:* www.novapost.fr / www.novapost-rh.fr
>
>
>
>
>


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Mime
View raw message