cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Amin Sakka, Novapost" <amin.sa...@novapost.fr>
Subject Re: GC Exceptions and cluster nodes are dying
Date Wed, 08 Dec 2010 15:18:52 GMT
Thanks for your answer Aaron,

I'm now on the RC1, I have no longer the ActiveCount error, however my nodes
still dying under bulk insertion.

I have modified my nodes configuration (all of them has now 2GB Heap size).
The nodes still under heavy pressure and they dies after a random timeout
(sometimes after 10 minutes of insertion
and sometimes after 50 minutes).
I want to point that I'm inserting rows in 4 different columns families at
the same time and that the rows size is too little (few KiloBytes).
I've attached here my cassandra.yaml configuration file.
Can you help me please to solve this issue?

Thanks!

Here is some of my log output:

DEBUG [MutationStage:27] 2010-12-08 15:31:19,214 RowMutationVerbHandler.java
(line 78) RowMutation(keyspace='SAE',
key='6163636f756e743936353a726566393230',
modifications=[ColumnFamily(Document
[6465736372697074696f6e:false:49@1291818633293000
,646f63756d656e744964:false:36@1291818633293000
,7265666572656e6365:false:17@1291818633293000,])]) applied.  Sending
response to 714546@/10.0.100.94
DEBUG [MutationStage:7] 2010-12-08 15:31:19,214 RowMutationVerbHandler.java
(line 54) Applying RowMutation(keyspace='SAE',
key='6163636f756e743938373a726566313835',
modifications=[ColumnFamily(Document
[6465736372697074696f6e:false:49@1291818633524000
,646f63756d656e744964:false:36@1291818633524000
,7265666572656e6365:false:17@1291818633524000,])])
DEBUG [MutationStage:7] 2010-12-08 15:31:19,214 Table.java (line 378)
applying mutation of row 6163636f756e743938373a726566313835
DEBUG [MutationStage:7] 2010-12-08 15:31:19,214 RowMutationVerbHandler.java
(line 78) RowMutation(keyspace='SAE',
key='6163636f756e743938373a726566313835',
modifications=[ColumnFamily(Document
[6465736372697074696f6e:false:49@1291818633524000
,646f63756d656e744964:false:36@1291818633524000
,7265666572656e6365:false:17@1291818633524000,])]) applied.  Sending
response to 714547@/10.0.100.94
*DEBUG [ScheduledTasks:1] 2010-12-08 15:31:19,373 GCInspector.java (line
135) GC for ParNew: 13 ms, 34456296 reclaimed leaving 1465917704 used; max
is 2256404480*





2010/12/1 Aaron Morton <aaron@thelastpickle.com>

> Running nodes with different JVM heap sizes would not be recommended
> practice, for many reasons. Nor would I recommend running them with all the
> memory the machine has, it will just lead to the OS swapping the JVM out to
> disk and considerable slow things down.
>
> I would suggest a heap size of 1.5 or 2.0 GB for each node, and have a read
> of the JVM Heap Size section here
> http://wiki.apache.org/cassandra/MemtableThresholds . AFAIK the logs are
> showing your cluster was under heavy GC pressure.
>
> Finally, the ActiveCount error message was a known issue in beta 2. Treat
> yourself and try RC1 :)
> http://www.mail-archive.com/user@cassandra.apache.org/msg06298.html
>
> Aaron
>
>
>
> On 02 Dec, 2010,at 12:33 AM, asakka <amin.sakka@novapost.fr> wrote:
>
>
> Hello,
>
> I'm making some tests on a data model with 3 CF and 1 SCF, I want to start
> by inserting 1 million rows (my target is to have 1billion rows) .
> I have three nodes cluster (I'm using the same machines with 3GB of RAM
> each , intel core2 duo 1,6GHZ), RF = 2, CL = 1, HEAPSIZE of the seed = 3GO
> (it was 1.5GO, I've doubled it to avoid the heap size exception I had) ,
> the
> other two nodes are 1.5GO.
>
> I am using cassandra (V0.7.0-beta2) and Hector (V0.7.0.18) . I'm making
> insertion in batch mode using hector Mutator.
> My disk_access_mode is standard.
> I reduced also my memtable_throughput_in_mb to 64, but the problem persists
> and I have the following exception :
> I want to know if it is a configuration or hardware problem ?
>
> INFO [Timer-0] 2010-12-01 10:34:42,124 Gossiper.java (line 196) InetAddress
> /10.0.100.215 is now dead.
> INFO [GOSSIP_STAGE:1] 2010-12-01 10:34:44,188 Gossiper.java (line 594) Node
> /10.0.100.215 has restarted, now UP again
> INFO [GOSSIP_STAGE:1] 2010-12-01 10:34:44,189 StorageService.java (line
> 643) Node /10.0.100.215 state jump to normal
> INFO [GOSSIP_STAGE:1] 2010-12-01 10:34:44,189 StorageService.java (line
> 650) Will not change my token ownership to /10.0.100.215
> INFO [HINTED-HANDOFF-POOL:1] 2010-12-01 10:34:44,189
> HintedHandOffManager.java (line 196) Started hinted handoff for endpoint
> /10.0.100.215
> INFO [HINTED-HANDOFF-POOL:1] 2010-12-01 10:34:44,189
> HintedHandOffManager.java (line 252) Finished hinted handoff of 0 rows to
> endpoint /10.0.100.215
> INFO [GC inspection] 2010-12-01 10:40:29,141 GCInspector.java (line 129) GC
> for ParNew: 750 ms, 14693208 reclaimed leaving 2140055192 used; max is
> 3355312128
> INFO [GC inspection] 2010-12-01 10:40:30,280 GCInspector.java (line 129) GC
> for ParNew: 445 ms, 17042288 reclaimed leaving 2178211008 used; max is
> 3355312128
> INFO [WRITE-/10.0.100.214] 2010-12-01 10:40:31,552
> OutboundTcpConnection.java (line 115) error writing to /10.0.100.214
> INFO [GC inspection] 2010-12-01 10:40:32,280 GCInspector.java (line 129) GC
> for ParNew: 211 ms, 25550568 reclaimed leaving 2235227312 used; max is
> 3355312128
> INFO [GC inspection] 2010-12-01 10:40:34,320 GCInspector.java (line 129) GC
> for ParNew: 290 ms, 26512896 reclaimed leaving 2277013184 used; max is
> 3355312128
> INFO [GC inspection] 2010-12-01 10:40:35,950 GCInspectorjava (line 129) GC
>
> for ParNew: 506 ms, 24319976 reclaimed leaving 2303739672 used; max is
> 3355312128
> INFO [GC inspection] 2010-12-01 10:40:37,202 GCInspector.java (line 129) GC
> for ParNew: 462 ms, 31759008 reclaimed leaving 2306914712 used; max is
> 3355312128
> INFO [GC inspection] 2010-12-01 10:40:42,629 GCInspector.java (line 129) GC
> for ParNew: 445 ms, 14769312 reclaimed leaving 2327064920 used; max is
> 3355312128
> INFO [GC inspection] 2010-12-01 10:40:43,969 GCInspector.java (line 129) GC
> for ParNew: 720 ms, 14804208 reclaimed leaving 2366434112 used; max is
> 3355312128
> INFO [GC inspection] 2010-12-01 10:40:45,372 GCInspector.java (line 129) GC
> for ParNew: 325 ms, 23112128 reclaimed leaving 2421032952 used; max is
> 3355312128
> INFO [GC inspection] 2010-12-01 10:40:47,843 GCInspector.java (line 129) GC
> for ParNew: 801 ms, 26014296 reclaimed leaving 2474278880 used; max is
> 3355312128
> INFO [Timer-0] 2010-12-01 10:41:18,451 Gossiper.java (line 196) InetAddress
> /10.0.100.215 is now dead.
> INFO [HINTED-HANDOFF-POOL:1] 2010-12-01 10:41:19,362
> HintedHandOffManager.java (line 196) Started hinted handoff for endpoint
> /10.0.100.215
> INFO [HINTED-HANDOFF-POOL:1] 2010-12-01 10:41:19,975
> HintedHandOffManager.java (line 252) Finished hinted handoff of 0 rows to
> endpoint /10.0.100.215
> INFO [GOSSIP_STAGE:1] 2010-12-01 10:41:19,506 Gossiper.java (line 580)
> InetAddress /10.0.100.215 is now UP
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:28,873 SSTablejava (line
>
> 145) Deleted /var/lib/cassandra/data/SAE4/Document-e-20-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:28,952 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/system/LocationInfo-e-148-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,053 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/SAE4/Account-e-7-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,163 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/SAE4/Account-e-12-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,274 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/SAE4/Document-e-13-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,407 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/SAE4/Account-e-13-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,513 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/SAE4/Account-e-17-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,545 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/SAE4/Document-e-7-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,577 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/system/LocationInfo-e-146-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,776 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/SAE4/Document-e-2-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,784 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/system/LocationInfo-e-145-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,882 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/SAE4/Document-e-19-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,883 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/system/LocationInfo-e-147-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,884 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/SAE4/Account-e-14-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,886 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/SAE4/DocumentByFolder-e-5-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,887 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/SAE4/DocumentByFolder-e-7-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,887 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/SAE4/DocumentByFolder-e-8-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,888 SSTablejava (line
>
> 145) Deleted /var/lib/cassandra/data/SAE4/Account-e-16-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,917 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/SAE4/Account-e-3-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,918 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/SAE4/DocumentByFolder-e-6-<>
> INFO [SSTABLE-CLEANUP-TIMER] 2010-12-01 10:41:29,918 SSTable.java (line
> 145) Deleted /var/lib/cassandra/data/SAE4/Account-e-15-<>
> INFO [GC inspection] 2010-12-01 10:51:07,737 GCInspector.java (line 129) GC
> for ParNew: 496 ms, 36105328 reclaimed leaving 274154728 used; max is
> 3355312128
> INFO [GC inspection] 2010-12-01 10:51:43,742 GCInspector.java (line 129) GC
> for ParNew: 3386 ms, 12099384 reclaimed leaving 297145056 used; max is
> 3355312128
> INFO [GC inspection] 2010-12-01 10:51:45,487 GCInspector.java (line 150)
> Pool Name Active Pending
> INFO [GC inspection] 2010-12-01 10:51:45,706 GCInspector.java (line 156)
> MIGRATION_STAGE 0 0
> INFO [GC inspection] 2010-12-01 10:51:45,716 GCInspector.java (line 156)
> GOSSIP_STAGE 0 0
> ERROR [GC inspection] 2010-12-01 10:51:46,313 AbstractCassandraDaemon.java
> (line 88) Fatal exception in thread Thread[GC inspection,5,main]
> java.lang.reflect.UndeclaredThrowableException
> at $Proxy1.getActiveCount(Unknown Source)
> at
>
> org.apache.cassandra.service.GCInspector.logThreadPoolStats(GCInspector.java:156)
> at
>
> org.apache.cassandra.service.GCInspector.logIntervalGCStats(GCInspector.java:136)
> at org.apache.cassandra.service.GCInspector.access$000(GCInspector.java:39)
> at org.apache.cassandra.service.GCInspector$1.run(GCInspector.java:93)
> at java.util.TimerThread.mainLoop(Timer.java:512)
> at java.util.TimerThread.run(Timer.java:462)
> Caused by: javax.management.AttributeNotFoundException: No such attribute:
> ActiveCount
> at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:63)
> at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
> at
>
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
> at
>
> com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
> at
>
> javax.managementMBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:263)
>
> ... 7 more
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/GC-Exceptions-and-cluster-nodes-are-dying-tp5791496p5791496.html
> Sent from the cassandra-user@incubator.apache.org mailing list archive at
> Nabble.com.
>
>


-- 

Amin SAKKA
Research and Development Engineer
32 rue de Paradis, 75010 Paris
*Tel:* +33 (0)6 34 14 19 25
*Mail:* amin.sakka@novapost.fr
*Web:* www.novapost.fr / www.novapost-rh.fr

Mime
View raw message