incubator-cassandra-user mailing list archives

From Wei Zhu <wz1...@yahoo.com>
Subject Re: Stream fails during repair, two nodes out-of-memory
Date Thu, 21 Mar 2013 03:47:35 GMT
It's clear that you are running out of memory. How large is your data set?
A heap of 1867M is rather small. According to earlier discussion on this list,
an m1.xlarge is advisable.
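
For context, the 1867M figure is consistent with the default heap rule in the cassandra-env.sh of that era, which is assumed here to be max(min(RAM/2, 1024 MB), min(RAM/4, 8192 MB)). A minimal sketch under that assumption follows; the RAM figures are illustrative inputs, not values measured on the cluster, and the function name is hypothetical:

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Assumed default rule: max(min(RAM/2, 1024), min(RAM/4, 8192)), in MB."""
    half_ram = system_memory_mb // 2
    quarter_ram = half_ram // 2
    return max(min(half_ram, 1024), min(quarter_ram, 8192))


# Illustrative inputs: roughly what an m1.large (7.5 GB nominal) and an
# m1.xlarge (15 GB nominal) might report as usable memory.
for name, ram_mb in [("m1.large", 7468), ("m1.xlarge", 15000)]:
    print(name, default_max_heap_mb(ram_mb), "MB")
```

Under this rule, doubling the instance memory roughly doubles the default heap; an explicit MAX_HEAP_SIZE setting would override it either way.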

Attached please find the related thread.

-Wei

----- Original Message -----
From: "Dane Miller" <dane@optimalsocial.com>
To: user@cassandra.apache.org
Sent: Wednesday, March 20, 2013 7:13:44 PM
Subject: Stream fails during repair, two nodes out-of-memory

After having just solved one repair problem, I immediately hit
another.  Again, much appreciation for suggestions...

I'm having problems repairing a CF, and the failure consistently brings
down 2 of the 6 nodes in the cluster.  I'm running "repair -pr" on a
single CF on node2, the repair starts streaming, and after about 60
seconds both node2 and node4 crash with java.lang.OutOfMemoryError.
The keyspace has rf=3 and is being actively written to by our
application.

The abbreviated logs below show the pattern, after which I kill -9
and restart cassandra on the two nodes.  What extra info should I
include?  I'm kind of overwhelmed by the volume of logs being
generated and not sure what is signal vs noise.  I'm especially seeing
big repeating sections of StatusLogger and FlushWriter/Memtable.

Details:
6 node cluster
Cassandra 1.2.2 - single token per node
RandomPartitioner, EC2Snitch
Replication: SimpleStrategy, rf=3
Ubuntu 10.10 x86_64
EC2 m1.large
Cassandra max heap: 1867M


node2 (abbreviated logs)

ERROR 21:11:22 AbstractStreamSession.java Stream failed because [node4] died
GC for ConcurrentMarkSweep: 2365 ms for 2 collections, 1913603168
used; max is 1937768448
Pool Name                    Active   Pending   Blocked
ReadStage                         7         7         0
RequestResponseStage              0         0         0
ReadRepairStage                   0         0         0
MutationStage                    32      4707         0
ReplicateOnWriteStage             0         0         0
GossipStage                       0         0         0
AntiEntropyStage                  0         0         0
MigrationStage                    0         0         0
MemtablePostFlusher               1         1         0
FlushWriter                       1         1         0
MiscStage                         0         0         0
commitlog_archiver                0         0         0
InternalResponseStage             0         0         0
AntiEntropySessions               1         1         0
HintedHandoff                     0         0         0
CompactionManager                 1        21
MessagingService                n/a    291,35
WARN  21:12:52 GCInspector.java Heap is 0.9875293252788064 full
INFO  21:12:52 Gossiper.java InetAddress [node5] is now dead.
INFO  21:12:52 Gossiper.java InetAddress [node1] is now dead.
INFO  21:12:52 Gossiper.java InetAddress [node6] is now dead.
INFO  21:12:52 ColumnFamilyStore.java Enqueuing flush of Memtable-[MyCF]@...
INFO  21:12:52 MessagingService.java 4415 MUTATION messages dropped in
last 5000ms
INFO  21:12:52 Gossiper.java InetAddress [node5] is now UP
INFO  21:12:52 Gossiper.java InetAddress [node1] is now UP
INFO  21:12:52 Gossiper.java InetAddress [node6] is now UP
INFO  21:12:52 HintedHandOffManager.java Started hinted handoff for
host: [node5]
INFO  21:12:52 HintedHandOffManager.java Started hinted handoff for
host: [node1]
ERROR 21:12:56 CassandraDaemon.java java.lang.OutOfMemoryError: Java heap space
(full OutOfMemory stack trace is included at bottom)
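
One way to pull signal out of logs like these is to extract the used and max heap figures from each GC line and watch the ratio. The sketch below assumes the line format shown in the excerpt above; the regular expression and the function name are illustrative, not part of Cassandra:

```python
import re

# Matches lines such as:
#   GC for ConcurrentMarkSweep: 2365 ms for 2 collections, 1913603168 used; max is 1937768448
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)


def gc_heap_usage(log_text: str):
    """Return (collector, pause_ms, used/max) for each GC line found."""
    # Collapse all whitespace so an entry split across lines still matches.
    flat = " ".join(log_text.split())
    return [
        (m["collector"], int(m["ms"]), int(m["used"]) / int(m["max"]))
        for m in GC_LINE.finditer(flat)
    ]


sample = """GC for ConcurrentMarkSweep: 2365 ms for 2 collections, 1913603168
used; max is 1937768448"""

for collector, pause_ms, fraction in gc_heap_usage(sample):
    print(f"{collector}: {pause_ms} ms pause, heap {fraction:.1%} full")
```

For the node2 line this gives a ratio of about 0.9875, the same figure the GCInspector warning reports: the heap is still almost full immediately after a full collection.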

node4 (abbreviated logs)

INFO 21:10:05 StreamOutSession.java Streaming to [node2]
INFO 21:10:14 CompactionTask.java Compacted 4 sstables to [MyCF-ib-17665]
INFO 21:10:24 StreamReplyVerbHandler.java Successfully sent
[MyCF]-ib-17647-Data.db to [node2]
INFO 21:10:24 GCInspector.java GC for ConcurrentMarkSweep
GC for ConcurrentMarkSweep: 764 ms for 3 collections, 1408393640 used;
max is 1937768448
GC for ConcurrentMarkSweep: 2198 ms for 2 collections, 1882942392
used; max is 1937768448
Pool Name                    Active   Pending   Blocked
ReadStage                         5         5         0
RequestResponseStage              0        20         0
ReadRepairStage                   0         0         0
MutationStage                     0         0         0
ReplicateOnWriteStage             0         0         0
GossipStage                       0         8         0
AntiEntropyStage                  0         0         0
MigrationStage                    0         0         0
MemtablePostFlusher               0         0         0
FlushWriter                       0         0         0
MiscStage                         0         0         0
commitlog_archiver                0         0         0
InternalResponseStage             0         0         0
AntiEntropySessions               0         0         0
HintedHandoff                     1         1         0
CompactionManager                 0         6
MessagingService                n/a     10,15
INFO 21:11:35 Gossiper.java InetAddress [node5] is now dead.
INFO 21:11:35 Gossiper.java InetAddress [node2] is now dead.
ERROR 21:13:17 CassandraDaemon.java java.lang.OutOfMemoryError: Java heap space
(full OutOfMemory stack trace is included at bottom)
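
The repeating StatusLogger tables can be reduced to the pools that actually have a backlog. This is a rough sketch that assumes rows shaped like the ones above (pool name, active, pending, optional blocked); the function name and threshold are hypothetical:

```python
def backlogged_pools(status_text: str, min_pending: int = 1):
    """Return (pool, active, pending) rows with pending >= min_pending,
    sorted with the largest backlog first."""
    rows = []
    for line in status_text.splitlines():
        parts = line.split()
        # Data rows look like: <PoolName> <active> <pending> [<blocked>].
        # The header row and the MessagingService row ("n/a") are skipped
        # because their second field is not a plain integer.
        if len(parts) >= 3 and parts[1].isdigit() and parts[2].isdigit():
            active, pending = int(parts[1]), int(parts[2])
            if pending >= min_pending:
                rows.append((parts[0], active, pending))
    return sorted(rows, key=lambda row: row[2], reverse=True)


sample = """Pool Name                    Active   Pending   Blocked
ReadStage                         5         5         0
RequestResponseStage              0        20         0
MutationStage                     0         0         0
GossipStage                       0         8         0
CompactionManager                 0         6
MessagingService                n/a     10,15
"""

for pool, active, pending in backlogged_pools(sample, min_pending=5):
    print(f"{pool}: active={active} pending={pending}")
```

Run against the node2 table instead, the same filter would put MutationStage (4707 pending) at the top, which lines up with the dropped MUTATION messages in that log.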




node2 full OOM stack trace:

ERROR [Thread-417] 2013-03-20 21:12:56,114 CassandraDaemon.java (line
133) Exception in thread Thread[Thread-417,5,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.utils.obs.OpenBitSet.<init>(OpenBitSet.java:76)
        at org.apache.cassandra.utils.FilterFactory.createFilter(FilterFactory.java:143)
        at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:114)
        at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:101)
        at org.apache.cassandra.db.ColumnIndex.<init>(ColumnIndex.java:40)
        at org.apache.cassandra.db.ColumnIndex.<init>(ColumnIndex.java:31)
        at org.apache.cassandra.db.ColumnIndex$Builder.<init>(ColumnIndex.java:74)
        at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:243)
        at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:179)
        at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
        at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:226)
        at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:166)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

node4 full OOM stack trace:

ERROR [Thread-326] 2013-03-20 21:13:22,829 CassandraDaemon.java (line
133) Exception in thread Thread[Thread-326,5,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.utils.obs.OpenBitSet.<init>(OpenBitSet.java:76)
        at org.apache.cassandra.utils.FilterFactory.createFilter(FilterFactory.java:143)
        at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:114)
        at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:101)
        at org.apache.cassandra.db.ColumnIndex.<init>(ColumnIndex.java:40)
        at org.apache.cassandra.db.ColumnIndex.<init>(ColumnIndex.java:31)
        at org.apache.cassandra.db.ColumnIndex$Builder.<init>(ColumnIndex.java:74)
        at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:243)
        at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:179)
        at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
        at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:226)
        at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:166)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
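
Both traces fail at the same point: an OpenBitSet allocation requested by FilterFactory while SSTableWriter.appendFromStream builds a ColumnIndex. That shows which allocation failed, not necessarily what filled the heap. To give a feel for how such a bit set scales with the number of elements, here is the textbook Bloom filter sizing formula. It is not Cassandra's own calculation, and the element counts and false-positive rate are made-up inputs:

```python
import math


def bloom_filter_bytes(num_elements: int, false_positive_rate: float) -> int:
    """Textbook optimal Bloom filter size: m = -n * ln(p) / (ln 2)^2 bits."""
    bits = -num_elements * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


# Hypothetical element counts; size grows linearly with the count.
for n in (10_000, 1_000_000, 100_000_000):
    size_mb = bloom_filter_bytes(n, 0.01) / (1024 * 1024)
    print(f"{n:>11,} elements -> {size_mb:8.2f} MB")
```

With a heap that is already about 98% full, even a modest allocation of this kind can be the one that triggers the OutOfMemoryError.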


Dane
