incubator-cassandra-user mailing list archives

From Eric Bowman <ebow...@boboco.ie>
Subject Re: Irresponsive nodes
Date Fri, 09 Oct 2009 11:58:08 GMT
A few things to try:

1. Enable verbose GC logging to see whether your JVM is dying under GC load.
2. pkill -3 java sends SIGQUIT to every running java process, which makes
each JVM dump stack traces for all of its threads; there could be some
clues there.
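
A rough sketch of both suggestions as shell, in case it helps. The
GC flags are standard HotSpot options. The JVM_OPTS variable name (as
I believe cassandra.in.sh uses it), the GC_LOG variable, the log path
and the dump_threads helper are assumptions made for illustration, so
adjust them to your install.

```shell
# Sketch only: JVM_OPTS, GC_LOG and the log path are assumptions.
# Append HotSpot GC logging flags to whatever options are already set.
GC_LOG="${GC_LOG:-/tmp/cassandra-gc.log}"   # hypothetical location
JVM_OPTS="${JVM_OPTS:-} -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:${GC_LOG}"

# Hypothetical helper: send SIGQUIT (signal 3) to one JVM by pid.
# HotSpot responds by printing a thread dump to its standard output
# and then keeps running.
dump_threads() {
    kill -3 "$1"
}
```

After restarting with those options, long or back-to-back full
collections in the GC log would point at heap pressure. Calling
dump_threads with the pid you used for kill [pid] targets just that
one JVM instead of every java process on the box.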


Dan Larsen wrote:
> Hi again :-)
>
> O.k... New problem...
> I have an Amazon EC2 node with 4 "CPUs" and 7.5 GB of RAM.
> Running CommitLog on 1 disk and data on another.
> Cassandra 0.4.0 - (yes I have checked... correct version :-P)
> 6GB set in the cassandra.in.sh.
>
> I started throwing data at it, without problems.
> All of a sudden, the node becomes unresponsive.
>
> I only have 6.6GB of data in the DBs.
>
> I experienced the same thing, while running much smaller nodes.
>
> I tried restarting cassandra (kill [pid]).
>
> When it starts up, it goes crazy for a while, trying to fill up the
> RAM or something.
> Then it stops filling RAM, but keeps a load of ~100% CPU.
> It doesn't respond to anything except nodeprobe info, which does
> respond, but VERY slowly.
>
>
> The log doesn't give me anything - not that I can understand anyways...
>
> [.....]
> INFO [main] 2009-10-09 11:23:37,320 CassandraDaemon.java (line 142)
> Cassandra starting up...
> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239
> ColumnFamilyStore.java (line 369) LocationInfo has reached its
> threshold; switching in a fresh Memtable
> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239
> ColumnFamilyStore.java (line 1178) Enqueuing flush of
> Memtable(LocationInfo)@2116316013
> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:41,039 Memtable.java
> (line 186) Flushing Memtable(LocationInfo)@2116316013
> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,191 CommitLog.java (line
> 466) discard completed log segments for
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1255087417263.log',
> position=257), column family 0. CFIDs are system:
> TableMetadata(LocationInfo: 0, HintsColumnFamily: 1, }), Fetcher:
> TableMetadata(PageSentences: 2, Pages: 3, PageWords: 4, WordPages: 6,
> SentencePages: 5, }), }
> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,243 CommitLog.java (line
> 509) Marking replay position 257 on commit log
> /var/lib/cassandra/commitlog/CommitLog-1255087417263.log
> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:45,243 Memtable.java
> (line 220) Completed flushing
> /mnt/cassandra/data/system/LocationInfo-19-Data.db
> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
> SSTableReader.java (line 58) index size for bloom filter calc for file
> : /mnt/cassandra/data/Fetcher/WordPages-347-Data.db : 256
> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
> SSTableReader.java (line 58) index size for bloom filter calc for file
> : /mnt/cassandra/data/Fetcher/WordPages-416-Data.db : 512
> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
> SSTableReader.java (line 58) index size for bloom filter calc for file
> : /mnt/cassandra/data/Fetcher/WordPages-486-Data.db : 768
> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
> SSTableReader.java (line 58) index size for bloom filter calc for file
> : /mnt/cassandra/data/Fetcher/WordPages-555-Data.db : 1024
> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
> ColumnFamilyStore.java (line 1048) Expected bloom filter size : 1024
> DEBUG [Timer-0] 2009-10-09 11:28:39,859 LoadDisseminator.java (line
> 40) Disseminating load info ...
> DEBUG [Timer-0] 2009-10-09 11:33:40,783 LoadDisseminator.java (line
> 40) Disseminating load info ...
> DEBUG [Timer-0] 2009-10-09 11:38:40,956 LoadDisseminator.java (line
> 40) Disseminating load info ...
> DEBUG [Timer-0] 2009-10-09 11:43:40,064 LoadDisseminator.java (line
> 40) Disseminating load info ...
>
>
> If I try to insert anything, I get stuff like this:
>
> ERROR [pool-1-thread-5324] 2009-10-09 10:12:36,574 StorageProxy.java
> (line 179) error writing key md5
> java.util.concurrent.TimeoutException: Operation timed out - received
> only 0 responses from .
>     at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:164)
>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:468)
>     at org.apache.cassandra.service.CassandraServer.insert(CassandraServer.java:421)
>     at org.apache.cassandra.service.Cassandra$Processor$insert.process(Cassandra.java:824)
>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
>
>
> Any ideas?
>
> Best regards
> Dan


-- 
Eric Bowman
Boboco Ltd
ebowman@boboco.ie
http://www.boboco.ie/ebowman/pubkey.pgp
+35318394189/+353872801532

