cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Irresponsive nodes
Date Fri, 09 Oct 2009 14:21:47 GMT
if you swamp it with inserts faster than it can write them, it will
start spending more and more time trying to GC.  that's what's
happening...  trunk is smarter about this and will stop accepting
writes before it gets to that point, but for 0.4 you just need to be a
little careful.

-Jonathan

On Fri, Oct 9, 2009 at 7:11 AM, Dan Larsen <dan@techba.se> wrote:
> Thanks for the tips Eric.
>
> I was just about to try it, when I noticed, it had become responsive again.
> It took exactly 1 hour, before it was done!...
>
> But when I restart now, it's ready almost immediatly... Weird stuff!!
>
> I will try out your tips, next time this happens!
>
> It sounds like, it's pretty well-defined, when the JVM dies under GC load..?
> Any pointers there?
> I was just thinking, that it might be possible, to add nodes based on
> current knowledge?
>
> #Dan
>
> On 09/10/2009, at 13.58, Eric Bowman wrote:
>
>> A few things to try:
>>
>> 1. Enable verbose GC logging to see if your JVM is dying under GC load.
>> 2. pkill -3 java will dump some nice stack traces from all running
>> threads, could be some clues there.
>>
>>
>> Dan Larsen wrote:
>>>
>>> Hi again :-)
>>>
>>> O.k... New problem...
>>> I have an Amazon EC2 node with 4 "CPUs" and 7.5 GB of RAM.
>>> Running CommitLog on 1 disk and data on another.
>>> Cassandra 0.4.0 - (yes I have checked... correct version :-P)
>>> 6GB set in the cassandra.in.sh.
>>>
>>> I started throwing data at it, without problems.
>>> All of a sudden, the node becomes irresponsive.
>>>
>>> I only have 6.6GB of data in the DBs.
>>>
>>> I experienced the same thing, while running much smaller nodes.
>>>
>>> I tried restarting cassandra (kill [pid]).
>>>
>>> When it starts up, it goes crazy for a while, trying to fill up the
>>> RAM or something.
>>> Then it stops filling RAM, but keeps a load of ~100% CPU.
>>> It doesn't respond to anything, but a nodeprobe info, which responds,
>>> but VERY slowly.
>>>
>>>
>>> The log doesn't give me anything - not that I can understand anyways...
>>>
>>> [.....]
>>> INFO [main] 2009-10-09 11:23:37,320 CassandraDaemon.java (line 142)
>>> Cassandra starting up...
>>> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239
>>> ColumnFamilyStore.java (line 369) LocationInfo has reached its
>>> threshold; switching in a fresh Memtable
>>> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239
>>> ColumnFamilyStore.java (line 1178) Enqueuing flush of
>>> Memtable(LocationInfo)@2116316013
>>> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:41,039 Memtable.java
>>> (line 186) Flushing Memtable(LocationInfo)@2116316013
>>> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,191 CommitLog.java (line
>>> 466) discard completed log segments for
>>>
>>> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1255087417263.log',
>>> position=257), column family 0. CFIDs are system:
>>> TableMetadata(LocationInfo: 0, HintsColumnFamily: 1, }), Fetcher:
>>> TableMetadata(PageSentences: 2, Pages: 3, PageWords: 4, WordPages: 6,
>>> SentencePages: 5, }), }
>>> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,243 CommitLog.java (line
>>> 509) Marking replay position 257 on commit log
>>> /var/lib/cassandra/commitlog/CommitLog-1255087417263.log
>>> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:45,243 Memtable.java
>>> (line 220) Completed flushing
>>> /mnt/cassandra/data/system/LocationInfo-19-Data.db
>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>>> SSTableReader.java (line 58) index size for bloom filter calc for file
>>> : /mnt/cassandra/data/Fetcher/WordPages-347-Data.db : 256
>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>>> SSTableReader.java (line 58) index size for bloom filter calc for file
>>> : /mnt/cassandra/data/Fetcher/WordPages-416-Data.db : 512
>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>>> SSTableReader.java (line 58) index size for bloom filter calc for file
>>> : /mnt/cassandra/data/Fetcher/WordPages-486-Data.db : 768
>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>>> SSTableReader.java (line 58) index size for bloom filter calc for file
>>> : /mnt/cassandra/data/Fetcher/WordPages-555-Data.db : 1024
>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>>> ColumnFamilyStore.java (line 1048) Expected bloom filter size : 1024
>>> DEBUG [Timer-0] 2009-10-09 11:28:39,859 LoadDisseminator.java (line
>>> 40) Disseminating load info ...
>>> DEBUG [Timer-0] 2009-10-09 11:33:40,783 LoadDisseminator.java (line
>>> 40) Disseminating load info ...
>>> DEBUG [Timer-0] 2009-10-09 11:38:40,956 LoadDisseminator.java (line
>>> 40) Disseminating load info ...
>>> DEBUG [Timer-0] 2009-10-09 11:43:40,064 LoadDisseminator.java (line
>>> 40) Disseminating load info ...
>>>
>>>
>>> If I try to insert anything, I get stuff like this:
>>>
>>> ERROR [pool-1-thread-5324] 2009-10-09 10:12:36,574 StorageProxy.java
>>> (line 179) error writing key md5
>>> java.util.concurrent.TimeoutException: Operation timed out - received
>>> only 0 responses from .
>>> at
>>>
>>> org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
>>>
>>> at
>>>
>>> org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:164)
>>>
>>> at
>>>
>>> org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:468)
>>>
>>> at
>>>
>>> org.apache.cassandra.service.CassandraServer.insert(CassandraServer.java:421)
>>>
>>> at
>>>
>>> org.apache.cassandra.service.Cassandra$Processor$insert.process(Cassandra.java:824)
>>>
>>> at
>>>
>>> org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
>>>
>>> at
>>>
>>> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>
>>> at
>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>
>>> at
>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>
>>> at java.lang.Thread.run(Thread.java:619)
>>>
>>>
>>> Any ideas?
>>>
>>> Best regards
>>> Dan
>>
>>
>> --
>> Eric Bowman
>> Boboco Ltd
>> ebowman@boboco.ie
>> http://www.boboco.ie/ebowman/pubkey.pgp
>> +35318394189/+353872801532
>>
>>
>
>

Mime
View raw message