incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Stream fails during repair, two nodes out-of-memory
Date Thu, 21 Mar 2013 17:28:19 GMT
> A heap of 1867M is kind of small. According to the discussion on this list, it's advisable
> to have m1.xlarge.
+1

In cassandra-env.sh, set MAX_HEAP_SIZE to 4G and HEAP_NEWSIZE to 400M.

In cassandra.yaml, set:

in_memory_compaction_limit_in_mb: 32
compaction_throughput_mb_per_sec: 8
concurrent_compactors: 2

This will slow down compaction a lot. You may want to restore some of these settings once
you have things stable. 
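If you have to make the three cassandra.yaml changes on several nodes, a small script avoids hand-editing each file. The following is a minimal sketch under stated assumptions: it treats cassandra.yaml as plain text (no YAML library), assumes the three settings are top-level `key: value` lines, and uses hypothetical names (`SUGGESTED`, `apply_overrides`). Uncommented lines for those keys are rewritten in place; a key that is missing or only present as a comment is appended at the end.

```python
import re
import sys

# Temporary values suggested in this thread to slow compaction down.
SUGGESTED = {
    "in_memory_compaction_limit_in_mb": "32",
    "compaction_throughput_mb_per_sec": "8",
    "concurrent_compactors": "2",
}

# An uncommented, top-level "key:" line (no leading whitespace or '#').
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*):")


def apply_overrides(yaml_text, overrides):
    """Return yaml_text with each key in `overrides` set to its new value.

    Existing uncommented lines are rewritten in place (any inline comment on
    them is dropped); keys not found uncommented are appended at the end.
    """
    seen = set()
    out = []
    for line in yaml_text.splitlines():
        m = TOP_LEVEL_KEY.match(line)
        if m and m.group(1) in overrides:
            key = m.group(1)
            out.append(f"{key}: {overrides[key]}")
            seen.add(key)
        else:
            out.append(line)
    for key, value in overrides.items():
        if key not in seen:
            out.append(f"{key}: {value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    path = sys.argv[1]  # path to the node's cassandra.yaml
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(apply_overrides(text, SUGGESTED))
```

Back up cassandra.yaml before running it (e.g. `python3 set_compaction.py /etc/cassandra/cassandra.yaml`, where the script name and path are placeholders). Cassandra reads the file at startup, so each node needs a restart to pick up the new values.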

You have an underpowered box for what you are trying to do.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/03/2013, at 4:47 PM, Wei Zhu <wz1975@yahoo.com> wrote:

> It's clear you are out of memory. How big is your data set?
> A heap of 1867M is kind of small. According to the discussion on this list, it's advisable
> to have m1.xlarge.
> 
> Attached please find the related thread.
> 
> -Wei
> 
> ----- Original Message -----
> From: "Dane Miller" <dane@optimalsocial.com>
> To: user@cassandra.apache.org
> Sent: Wednesday, March 20, 2013 7:13:44 PM
> Subject: Stream fails during repair, two nodes out-of-memory
> 
> After having just solved one repair problem, I immediately hit
> another.  Again, much appreciation for suggestions...
> 
> I'm having problems repairing a CF, and the failure consistently brings
> down 2 of the 6 nodes in the cluster.  I'm running "repair -pr" on a
> single CF on node2, the repair starts streaming, and after about 60
> seconds both node2 and node4 crash with java.lang.OutOfMemoryError.
> The keyspace has rf=3 and is being actively written to by our
> application.
> 
> The abbreviated logs below show the pattern, after which I kill -9
> and restart cassandra on the two nodes.  What extra info should I
> include?  I'm kind of overwhelmed by the volume of logs being
> generated and not sure what is signal vs noise.  I'm especially seeing
> big repeating sections of StatusLogger and FlushWriter/Memtable.
> 
> Details:
> 6 node cluster
> Cassandra 1.2.2, single token per node
> RandomPartitioner, Ec2Snitch
> Replication: SimpleStrategy, rf=3
> Ubuntu 10.10 x86_64
> EC2 m1.large
> Cassandra max heap: 1867M
> 
> 
> node2 (abbreviated logs)
> 
> ERROR 21:11:22 AbstractStreamSession.java Stream failed because [node4] died
> GC for ConcurrentMarkSweep: 2365 ms for 2 collections, 1913603168 used; max is 1937768448
> Pool Name                    Active   Pending   Blocked
> ReadStage                         7         7         0
> RequestResponseStage              0         0         0
> ReadRepairStage                   0         0         0
> MutationStage                    32      4707         0
> ReplicateOnWriteStage             0         0         0
> GossipStage                       0         0         0
> AntiEntropyStage                  0         0         0
> MigrationStage                    0         0         0
> MemtablePostFlusher               1         1         0
> FlushWriter                       1         1         0
> MiscStage                         0         0         0
> commitlog_archiver                0         0         0
> InternalResponseStage             0         0         0
> AntiEntropySessions               1         1         0
> HintedHandoff                     0         0         0
> CompactionManager                 1        21
> MessagingService                n/a    291,35
> WARN  21:12:52 GCInspector.java Heap is 0.9875293252788064 full
> INFO  21:12:52 Gossiper.java InetAddress [node5] is now dead.
> INFO  21:12:52 Gossiper.java InetAddress [node1] is now dead.
> INFO  21:12:52 Gossiper.java InetAddress [node6] is now dead.
> INFO  21:12:52 ColumnFamilyStore.java Enqueuing flush of Memtable-[MyCF]@...
> INFO  21:12:52 MessagingService.java 4415 MUTATION messages dropped in last 5000ms
> INFO  21:12:52 Gossiper.java InetAddress [node5] is now UP
> INFO  21:12:52 Gossiper.java InetAddress [node1] is now UP
> INFO  21:12:52 Gossiper.java InetAddress [node6] is now UP
> INFO  21:12:52 HintedHandOffManager.java Started hinted handoff for host: [node5]
> INFO  21:12:52 HintedHandOffManager.java Started hinted handoff for host: [node1]
> ERROR 21:12:56 CassandraDaemon.java java.lang.OutOfMemoryError: Java heap space
> (full OutOfMemory stack trace is included at bottom)
> 
> node4 (abbreviated logs)
> 
> INFO 21:10:05 StreamOutSession.java Streaming to [node2]
> INFO 21:10:14 CompactionTask.java Compacted 4 sstables to [MyCF-ib-17665]
> INFO 21:10:24 StreamReplyVerbHandler.java Successfully sent [MyCF]-ib-17647-Data.db to [node2]
> INFO 21:10:24 GCInspector.java GC for ConcurrentMarkSweep
> GC for ConcurrentMarkSweep: 764 ms for 3 collections, 1408393640 used; max is 1937768448
> GC for ConcurrentMarkSweep: 2198 ms for 2 collections, 1882942392 used; max is 1937768448
> Pool Name                    Active   Pending   Blocked
> ReadStage                         5         5         0
> RequestResponseStage              0        20         0
> ReadRepairStage                   0         0         0
> MutationStage                     0         0         0
> ReplicateOnWriteStage             0         0         0
> GossipStage                       0         8         0
> AntiEntropyStage                  0         0         0
> MigrationStage                    0         0         0
> MemtablePostFlusher               0         0         0
> FlushWriter                       0         0         0
> MiscStage                         0         0         0
> commitlog_archiver                0         0         0
> InternalResponseStage             0         0         0
> AntiEntropySessions               0         0         0
> HintedHandoff                     1         1         0
> CompactionManager                 0         6
> MessagingService                n/a     10,15
> INFO 21:11:35 Gossiper.java InetAddress [node5] is now dead.
> INFO 21:11:35 Gossiper.java InetAddress [node2] is now dead.
> ERROR 21:13:17 CassandraDaemon.java java.lang.OutOfMemoryError: Java heap space
> (full OutOfMemory stack trace is included at bottom)
> 
> node2 full OOM stack trace:
> 
> ERROR [Thread-417] 2013-03-20 21:12:56,114 CassandraDaemon.java (line 133) Exception in thread Thread[Thread-417,5,main]
> java.lang.OutOfMemoryError: Java heap space
>        at org.apache.cassandra.utils.obs.OpenBitSet.<init>(OpenBitSet.java:76)
>        at org.apache.cassandra.utils.FilterFactory.createFilter(FilterFactory.java:143)
>        at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:114)
>        at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:101)
>        at org.apache.cassandra.db.ColumnIndex.<init>(ColumnIndex.java:40)
>        at org.apache.cassandra.db.ColumnIndex.<init>(ColumnIndex.java:31)
>        at org.apache.cassandra.db.ColumnIndex$Builder.<init>(ColumnIndex.java:74)
>        at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:243)
>        at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:179)
>        at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
>        at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:226)
>        at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:166)
>        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
> 
> node4 full OOM stack trace:
> 
> ERROR [Thread-326] 2013-03-20 21:13:22,829 CassandraDaemon.java (line 133) Exception in thread Thread[Thread-326,5,main]
> java.lang.OutOfMemoryError: Java heap space
>        at org.apache.cassandra.utils.obs.OpenBitSet.<init>(OpenBitSet.java:76)
>        at org.apache.cassandra.utils.FilterFactory.createFilter(FilterFactory.java:143)
>        at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:114)
>        at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:101)
>        at org.apache.cassandra.db.ColumnIndex.<init>(ColumnIndex.java:40)
>        at org.apache.cassandra.db.ColumnIndex.<init>(ColumnIndex.java:31)
>        at org.apache.cassandra.db.ColumnIndex$Builder.<init>(ColumnIndex.java:74)
>        at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:243)
>        at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:179)
>        at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
>        at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:226)
>        at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:166)
>        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
> 
> 
> Dane
> <attachment.eml>
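
Dane asks which parts of the log volume are signal and which are noise. One way to cut it down is a small filter; the following is a minimal sketch, not a complete list of Cassandra log messages. The patterns are assumptions drawn only from the abbreviated lines quoted above, and `signal_lines`, `heap_fraction` and `GC_LINE` are hypothetical names. It keeps ERROR/WARN lines, OutOfMemoryError, ConcurrentMarkSweep GC reports, dropped-message counts and gossip "is now dead" lines, and computes used/max heap from GC lines. For node2's report (1913603168 used of 1937768448) that ratio is about 0.98753, which agrees with the "Heap is 0.9875293252788064 full" warning in the same log.

```python
import re

# Hypothetical "signal" patterns, based only on the abbreviated logs in this thread.
SIGNAL_PATTERNS = [
    re.compile(r"^(ERROR|WARN)\b"),             # errors and warnings
    re.compile(r"OutOfMemoryError"),            # the crash itself
    re.compile(r"GC for ConcurrentMarkSweep"),  # long old-generation collections
    re.compile(r"messages dropped"),            # e.g. dropped MUTATION messages
    re.compile(r"is now dead"),                 # gossip marking peers down
]

# e.g. "GC for ConcurrentMarkSweep: 2365 ms for 2 collections, 1913603168 used; max is 1937768448"
GC_LINE = re.compile(r"GC for \w+: \d+ ms for \d+ collections, (\d+) used; max is (\d+)")


def signal_lines(lines):
    """Yield stripped lines matching any SIGNAL_PATTERNS entry; everything else
    (StatusLogger pool tables, flush chatter) is skipped."""
    for line in lines:
        stripped = line.strip()
        if any(p.search(stripped) for p in SIGNAL_PATTERNS):
            yield stripped


def heap_fraction(line):
    """Return used/max heap for a GC report line, or None for any other line."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    used, max_heap = int(m.group(1)), int(m.group(2))
    return used / max_heap


if __name__ == "__main__":
    sample = [
        "ERROR 21:11:22 AbstractStreamSession.java Stream failed because [node4] died",
        "GC for ConcurrentMarkSweep: 2365 ms for 2 collections, 1913603168 used; max is 1937768448",
        "ReadStage                         7         7         0",
        "INFO  21:12:52 MessagingService.java 4415 MUTATION messages dropped in last 5000ms",
    ]
    # Prints the ERROR, GC and dropped-messages lines; the GC line is tagged [heap 98.8% full].
    for line in signal_lines(sample):
        frac = heap_fraction(line)
        print(line if frac is None else f"{line}  [heap {frac:.1%} full]")
```

To use it on a real node, pass the lines of its system.log to `signal_lines`. The filter deliberately drops the StatusLogger thread-pool tables, but a large Pending count there (MutationStage at 4707 on node2, alongside 4415 dropped MUTATION messages) is worth checking by hand.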

