incubator-cassandra-user mailing list archives

From Dan Larsen <...@techba.se>
Subject Re: Irresponsive nodes
Date Fri, 09 Oct 2009 12:11:13 GMT
Thanks for the tips, Eric.

I was just about to try them when I noticed that the node had become
responsive again.
It took exactly one hour before it was done!

But when I restart it now, it is ready almost immediately... Weird stuff!

I will try out your tips the next time this happens!

It sounds like it is fairly well-defined when the JVM dies under GC
load?
Any pointers there?
I was also thinking that it might be possible to add nodes based on
current knowledge?
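
One way to make "dying under GC load" measurable is to switch on verbose GC logging (on a Sun JVM, flags such as -verbose:gc -XX:+PrintGCTimeStamps -Xloggc:<file>, added to the JVM options) and then summarize the log. The Python sketch below is only an illustration. It assumes the plain one-line format those flags produce, e.g. "12.345: [GC 524288K->131072K(1048576K), 0.0456789 secs]", and the function name summarize_gc and the reported fields are made up for this example. A JVM that spends a large share of wall-clock time in pauses, or whose heap is still nearly full right after a Full GC, is a likely candidate.

```python
import re

# Assumed log format (plain -verbose:gc output with timestamps), for example:
#   12.345: [GC 524288K->131072K(1048576K), 0.0456789 secs]
#   98.765: [Full GC 1040000K->1030000K(1048576K), 7.1234567 secs]
GC_LINE = re.compile(
    r"^(?P<stamp>\d+\.\d+): \[(?P<kind>Full GC|GC) "
    r"(?P<before>\d+)K->(?P<after>\d+)K\((?P<total>\d+)K\), "
    r"(?P<pause>\d+\.\d+) secs\]"
)


def summarize_gc(lines):
    """Summarize GC pauses; returns None if no line matches the assumed format."""
    events = []
    for line in lines:
        match = GC_LINE.match(line.strip())
        if match:
            events.append(match)
    if not events:
        return None

    total_pause = sum(float(e.group("pause")) for e in events)
    last = events[-1]
    # Timestamps mark the start of a collection, so add the final pause.
    elapsed = float(last.group("stamp")) + float(last.group("pause"))

    full = [e for e in events if e.group("kind") == "Full GC"]
    heap_after_full = None
    if full:
        # Share of the heap still occupied right after the most recent full GC.
        heap_after_full = int(full[-1].group("after")) / int(full[-1].group("total"))

    return {
        "collections": len(events),
        "full_collections": len(full),
        "pause_fraction": total_pause / elapsed if elapsed > 0 else 0.0,
        "heap_after_last_full_gc": heap_after_full,
    }
```

Passing an open log file (or any iterable of lines) to summarize_gc returns a small dictionary. Where to draw the line for pause_fraction is a judgment call rather than a fixed number.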

#Dan

On 09/10/2009, at 13.58, Eric Bowman wrote:

> A few things to try:
>
> 1. Enable verbose GC logging to see if your JVM is dying under GC load.
> 2. pkill -3 java will dump some nice stack traces from all running
> threads; there could be some clues there.
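
On the second tip: pkill -3 java sends SIGQUIT, which makes the JVM print a thread dump to its standard output without stopping the process. Dumps can be long, so a quick tally of thread states and of the top frame of each RUNNABLE thread can help when the CPU is pegged. The Python sketch below assumes the HotSpot dump layout, where each thread's section starts with a quoted name, carries a "java.lang.Thread.State: ..." line, and is separated from the next section by a blank line. The function name runnable_hotspots is hypothetical.

```python
import re
from collections import Counter

THREAD_NAME = re.compile(r'^"(?P<name>[^"]+)"')
THREAD_STATE = re.compile(r"java\.lang\.Thread\.State: (?P<state>\w+)")


def runnable_hotspots(dump_text):
    """Return (thread-state counts, counts of top frames for RUNNABLE threads)."""
    states = Counter()
    top_frames = Counter()
    # Assumption: each thread's section is separated from the next by a blank line.
    for block in re.split(r"\n\s*\n", dump_text):
        lines = [line.strip() for line in block.strip().splitlines()]
        if not lines or not THREAD_NAME.match(lines[0]):
            continue  # not a thread section (e.g. the dump header)
        state_match = THREAD_STATE.search(block)
        # Some JVM-internal threads print no state line at all.
        state = state_match.group("state") if state_match else "UNKNOWN"
        states[state] += 1
        if state == "RUNNABLE":
            frames = [line for line in lines if line.startswith("at ")]
            if frames:
                top_frames[frames[0]] += 1
    return states, top_frames
```

If most RUNNABLE threads share the same top frame across two or three dumps taken a few seconds apart, that frame is a reasonable place to start looking.
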
>
>
> Dan Larsen wrote:
>> Hi again :-)
>>
>> O.k... New problem...
>> I have an Amazon EC2 node with 4 "CPUs" and 7.5 GB of RAM.
>> Running CommitLog on 1 disk and data on another.
>> Cassandra 0.4.0 - (yes I have checked... correct version :-P)
>> 6GB set in the cassandra.in.sh.
>>
>> I started throwing data at it, without problems.
>> All of a sudden, the node becomes unresponsive.
>>
>> I only have 6.6GB of data in the DBs.
>>
>> I experienced the same thing, while running much smaller nodes.
>>
>> I tried restarting cassandra (kill [pid]).
>>
>> When it starts up, it goes crazy for a while, trying to fill up the
>> RAM or something.
>> Then it stops filling RAM, but keeps a load of ~100% CPU.
>> It doesn't respond to anything except nodeprobe info, which does
>> respond, but VERY slowly.
>>
>>
>> The log doesn't tell me anything, at least nothing I can understand...
>>
>> [.....]
>> INFO [main] 2009-10-09 11:23:37,320 CassandraDaemon.java (line 142) Cassandra starting up...
>> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239 ColumnFamilyStore.java (line 369) LocationInfo has reached its threshold; switching in a fresh Memtable
>> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239 ColumnFamilyStore.java (line 1178) Enqueuing flush of Memtable(LocationInfo)@2116316013
>> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:41,039 Memtable.java (line 186) Flushing Memtable(LocationInfo)@2116316013
>> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,191 CommitLog.java (line 466) discard completed log segments for CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1255087417263.log', position=257), column family 0. CFIDs are system: TableMetadata(LocationInfo: 0, HintsColumnFamily: 1, }), Fetcher: TableMetadata(PageSentences: 2, Pages: 3, PageWords: 4, WordPages: 6, SentencePages: 5, }), }
>> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,243 CommitLog.java (line 509) Marking replay position 257 on commit log /var/lib/cassandra/commitlog/CommitLog-1255087417263.log
>> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:45,243 Memtable.java (line 220) Completed flushing /mnt/cassandra/data/system/LocationInfo-19-Data.db
>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 SSTableReader.java (line 58) index size for bloom filter calc for file : /mnt/cassandra/data/Fetcher/WordPages-347-Data.db : 256
>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 SSTableReader.java (line 58) index size for bloom filter calc for file : /mnt/cassandra/data/Fetcher/WordPages-416-Data.db : 512
>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 SSTableReader.java (line 58) index size for bloom filter calc for file : /mnt/cassandra/data/Fetcher/WordPages-486-Data.db : 768
>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 SSTableReader.java (line 58) index size for bloom filter calc for file : /mnt/cassandra/data/Fetcher/WordPages-555-Data.db : 1024
>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 ColumnFamilyStore.java (line 1048) Expected bloom filter size : 1024
>> DEBUG [Timer-0] 2009-10-09 11:28:39,859 LoadDisseminator.java (line 40) Disseminating load info ...
>> DEBUG [Timer-0] 2009-10-09 11:33:40,783 LoadDisseminator.java (line 40) Disseminating load info ...
>> DEBUG [Timer-0] 2009-10-09 11:38:40,956 LoadDisseminator.java (line 40) Disseminating load info ...
>> DEBUG [Timer-0] 2009-10-09 11:43:40,064 LoadDisseminator.java (line 40) Disseminating load info ...
>>
>>
>> If I try to insert anything, I get stuff like this:
>>
>> ERROR [pool-1-thread-5324] 2009-10-09 10:12:36,574 StorageProxy.java (line 179) error writing key md5
>> java.util.concurrent.TimeoutException: Operation timed out - received only 0 responses from .
>>     at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
>>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:164)
>>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:468)
>>     at org.apache.cassandra.service.CassandraServer.insert(CassandraServer.java:421)
>>     at org.apache.cassandra.service.Cassandra$Processor$insert.process(Cassandra.java:824)
>>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
>>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>     at java.lang.Thread.run(Thread.java:619)
>>
>>
>> Any ideas?
>>
>> Best regards
>> Dan
>
>
> -- 
> Eric Bowman
> Boboco Ltd
> ebowman@boboco.ie
> http://www.boboco.ie/ebowman/pubkey.pgp
> +35318394189/+353872801532
>
>

