incubator-cassandra-user mailing list archives

From Dan Larsen <...@techba.se>
Subject Re: Irresponsive nodes
Date Fri, 09 Oct 2009 17:54:37 GMT
That is exactly what I have discovered by now ;-)
Looking forward to the next release then!

Thanks Jonathan!

Best
Dan


On 09/10/2009, at 16.21, Jonathan Ellis wrote:

> if you swamp it with inserts faster than it can write them, it will
> start spending more and more time trying to GC.  that's what's
> happening...  trunk is smarter about this and will stop accepting
> writes before it gets to that point, but for 0.4 you just need to be a
> little careful.
>
> -Jonathan
>
> On Fri, Oct 9, 2009 at 7:11 AM, Dan Larsen <dan@techba.se> wrote:
>> Thanks for the tips Eric.
>>
>> I was just about to try it when I noticed it had become responsive
>> again.
>> It took exactly 1 hour before it was done!
>>
>> But when I restart now, it's ready almost immediately... Weird stuff!!
>>
>> I will try out your tips next time this happens!
>>
>> It sounds like it's pretty well defined when the JVM dies under GC
>> load?
>> Any pointers there?
>> I was just thinking that it might be possible to add nodes based on
>> current knowledge?
>>
>> #Dan
>>
>> On 09/10/2009, at 13.58, Eric Bowman wrote:
>>
>>> A few things to try:
>>>
>>> 1. Enable verbose GC logging to see if your JVM is dying under GC
>>>    load.
>>> 2. pkill -3 java will dump some nice stack traces from all running
>>>    threads, could be some clues there.
>>>
>>>
>>> Dan Larsen wrote:
>>>>
>>>> Hi again :-)
>>>>
>>>> O.k... New problem...
>>>> I have an Amazon EC2 node with 4 "CPUs" and 7.5 GB of RAM.
>>>> Running CommitLog on 1 disk and data on another.
>>>> Cassandra 0.4.0 - (yes I have checked... correct version :-P)
>>>> 6GB set in the cassandra.in.sh.
>>>>
>>>> I started throwing data at it, without problems.
>>>> All of a sudden, the node becomes unresponsive.
>>>>
>>>> I only have 6.6GB of data in the DBs.
>>>>
>>>> I experienced the same thing, while running much smaller nodes.
>>>>
>>>> I tried restarting cassandra (kill [pid]).
>>>>
>>>> When it starts up, it goes crazy for a while, trying to fill up the
>>>> RAM or something.
>>>> Then it stops filling RAM, but keeps a load of ~100% CPU.
>>>> It doesn't respond to anything except nodeprobe info, which
>>>> responds, but VERY slowly.
>>>>
>>>>
>>>> The log doesn't give me anything - not that I can understand
>>>> anyways...
>>>>
>>>> [.....]
>>>> INFO [main] 2009-10-09 11:23:37,320 CassandraDaemon.java (line 142) Cassandra starting up...
>>>> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239 ColumnFamilyStore.java (line 369) LocationInfo has reached its threshold; switching in a fresh Memtable
>>>> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239 ColumnFamilyStore.java (line 1178) Enqueuing flush of Memtable(LocationInfo)@2116316013
>>>> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:41,039 Memtable.java (line 186) Flushing Memtable(LocationInfo)@2116316013
>>>> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,191 CommitLog.java (line 466) discard completed log segments for CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1255087417263.log', position=257), column family 0. CFIDs are system: TableMetadata(LocationInfo: 0, HintsColumnFamily: 1, }), Fetcher: TableMetadata(PageSentences: 2, Pages: 3, PageWords: 4, WordPages: 6, SentencePages: 5, }), }
>>>> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,243 CommitLog.java (line 509) Marking replay position 257 on commit log /var/lib/cassandra/commitlog/CommitLog-1255087417263.log
>>>> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:45,243 Memtable.java (line 220) Completed flushing /mnt/cassandra/data/system/LocationInfo-19-Data.db
>>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 SSTableReader.java (line 58) index size for bloom filter calc for file : /mnt/cassandra/data/Fetcher/WordPages-347-Data.db : 256
>>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 SSTableReader.java (line 58) index size for bloom filter calc for file : /mnt/cassandra/data/Fetcher/WordPages-416-Data.db : 512
>>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 SSTableReader.java (line 58) index size for bloom filter calc for file : /mnt/cassandra/data/Fetcher/WordPages-486-Data.db : 768
>>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 SSTableReader.java (line 58) index size for bloom filter calc for file : /mnt/cassandra/data/Fetcher/WordPages-555-Data.db : 1024
>>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 ColumnFamilyStore.java (line 1048) Expected bloom filter size : 1024
>>>> DEBUG [Timer-0] 2009-10-09 11:28:39,859 LoadDisseminator.java (line 40) Disseminating load info ...
>>>> DEBUG [Timer-0] 2009-10-09 11:33:40,783 LoadDisseminator.java (line 40) Disseminating load info ...
>>>> DEBUG [Timer-0] 2009-10-09 11:38:40,956 LoadDisseminator.java (line 40) Disseminating load info ...
>>>> DEBUG [Timer-0] 2009-10-09 11:43:40,064 LoadDisseminator.java (line 40) Disseminating load info ...
>>>>
>>>>
>>>> If I try to insert anything, I get stuff like this:
>>>>
>>>> ERROR [pool-1-thread-5324] 2009-10-09 10:12:36,574 StorageProxy.java (line 179) error writing key md5
>>>> java.util.concurrent.TimeoutException: Operation timed out - received only 0 responses from .
>>>>     at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
>>>>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:164)
>>>>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:468)
>>>>     at org.apache.cassandra.service.CassandraServer.insert(CassandraServer.java:421)
>>>>     at org.apache.cassandra.service.Cassandra$Processor$insert.process(Cassandra.java:824)
>>>>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
>>>>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>
>>>>
>>>> Any ideas?
>>>>
>>>> Best regards
>>>> Dan
>>>
>>>
>>> --
>>> Eric Bowman
>>> Boboco Ltd
>>> ebowman@boboco.ie
>>> http://www.boboco.ie/ebowman/pubkey.pgp
>>> +35318394189/+353872801532
>>>
>>>
>>
>>
>
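
Jonathan's advice for 0.4 ("you just need to be a little careful") comes down to not pushing inserts faster than the node can write them. One way a client might do that is to back off when a write times out, instead of retrying immediately. The following is only a minimal sketch of that idea, not anything from Cassandra itself: write_fn is a hypothetical stand-in for whatever client call performs the insert, it is assumed to raise Python's TimeoutError when the node does not answer in time, and the delay values are arbitrary.

```python
import time


def write_with_backoff(write_fn, key, value, max_attempts=5,
                       base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call write_fn(key, value), backing off exponentially on timeouts.

    write_fn is a hypothetical placeholder for the real client insert call;
    it is assumed to raise TimeoutError when the node is overloaded.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn(key, value)
        except TimeoutError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to do
            sleep(delay)  # pause so the node has time to flush and catch up
            delay = min(delay * 2, max_delay)  # double the wait, up to a cap
```

The sleep parameter is injectable only so the function can be exercised without real waiting; in normal use the default time.sleep applies.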


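Eric's first tip (verbose GC logging) produces output that can be summarized with a short script to check whether the JVM is spending its time in collections that free very little. This sketch assumes the JVM was started with -verbose:gc and emits lines in the classic HotSpot form "[Full GC <before>K-><after>K(<total>K), <seconds> secs]"; other logging flags produce different formats that this pattern will not match. The 5% "low yield" threshold is an arbitrary assumption, not a Cassandra or JVM setting.

```python
import re

# Assumed line shape (classic HotSpot -verbose:gc output), for example:
#   [Full GC 267628K->83769K(776768K), 1.8479984 secs]
GC_LINE = re.compile(
    r"\[(?P<kind>Full GC|GC)\s+(?P<before>\d+)K->(?P<after>\d+)K"
    r"\((?P<total>\d+)K\),\s+(?P<secs>[\d.]+) secs\]"
)


def summarize_gc(lines, low_yield=0.05):
    """Summarize GC pauses found in an iterable of log lines.

    Returns total pause time, the number of full collections, and how many
    full collections freed less than low_yield of the heap in use.
    """
    total_pause = 0.0
    full_gcs = 0
    low_yield_full_gcs = 0
    for line in lines:
        match = GC_LINE.search(line)
        if not match:
            continue  # skip anything that is not a recognised GC line
        total_pause += float(match.group("secs"))
        if match.group("kind") == "Full GC":
            full_gcs += 1
            before = int(match.group("before"))
            after = int(match.group("after"))
            if before > 0 and (before - after) / before < low_yield:
                low_yield_full_gcs += 1
    return {
        "total_pause_secs": total_pause,
        "full_gcs": full_gcs,
        "low_yield_full_gcs": low_yield_full_gcs,
    }
```

Many long full collections that each reclaim almost nothing would be consistent with the "spending more and more time trying to GC" behaviour Jonathan describes.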