From: Dan Larsen
Date: Fri, 9 Oct 2009 19:54:37 +0200
To: cassandra-user@incubator.apache.org
Subject: Re: Irresponsive nodes

That is exactly what I have discovered by now ;-)
Looking forward to the next release, then!

Thanks, Jonathan!

Best
Dan

On 09/10/2009, at 16.21, Jonathan Ellis wrote:

> If you swamp it with inserts faster than it can write them, it will
> start spending more and more time trying to GC. That's what's
> happening... Trunk is smarter about this and will stop accepting
> writes before it gets to that point, but for 0.4 you just need to be a
> little careful.
>
> -Jonathan
>
> On Fri, Oct 9, 2009 at 7:11 AM, Dan Larsen wrote:
>> Thanks for the tips, Eric.
>>
>> I was just about to try them when I noticed it had become
>> responsive again.
>> It took exactly 1 hour before it was done!...
>>
>> But when I restart now, it's ready almost immediately... Weird stuff!!
>>
>> I will try out your tips next time this happens!
>>
>> It sounds like it's pretty well-defined when the JVM dies under
>> GC load..?
>> Any pointers there?
>> I was just thinking that it might be possible to add nodes based on
>> current knowledge?
>>
>> #Dan
>>
>> On 09/10/2009, at 13.58, Eric Bowman wrote:
>>
>>> A few things to try:
>>>
>>> 1. Enable verbose GC logging to see if your JVM is dying under GC
>>> load.
>>> 2. pkill -3 java will dump some nice stack traces from all running
>>> threads; there could be some clues there.
>>>
>>>
>>> Dan Larsen wrote:
>>>>
>>>> Hi again :-)
>>>>
>>>> OK... New problem...
>>>> I have an Amazon EC2 node with 4 "CPUs" and 7.5 GB of RAM.
>>>> Running the CommitLog on one disk and data on another.
>>>> Cassandra 0.4.0 (yes, I have checked... correct version :-P)
>>>> 6GB set in cassandra.in.sh.
>>>>
>>>> I started throwing data at it without problems.
>>>> All of a sudden, the node becomes unresponsive.
>>>>
>>>> I only have 6.6GB of data in the DBs.
>>>>
>>>> I experienced the same thing while running much smaller nodes.
>>>>
>>>> I tried restarting Cassandra (kill [pid]).
>>>>
>>>> When it starts up, it goes crazy for a while, trying to fill up the
>>>> RAM or something.
>>>> Then it stops filling RAM, but keeps a load of ~100% CPU.
>>>> It doesn't respond to anything except nodeprobe info, which
>>>> responds, but VERY slowly.
>>>>
>>>>
>>>> The log doesn't give me anything - not that I can understand,
>>>> anyway...
>>>>
>>>> [.....]
>>>> INFO [main] 2009-10-09 11:23:37,320 CassandraDaemon.java (line 142) Cassandra starting up...
>>>> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239 ColumnFamilyStore.java (line 369) LocationInfo has reached its threshold; switching in a fresh Memtable
>>>> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239 ColumnFamilyStore.java (line 1178) Enqueuing flush of Memtable(LocationInfo)@2116316013
>>>> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:41,039 Memtable.java (line 186) Flushing Memtable(LocationInfo)@2116316013
>>>> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,191 CommitLog.java (line 466) discard completed log segments for CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1255087417263.log', position=257), column family 0. CFIDs are system: TableMetadata(LocationInfo: 0, HintsColumnFamily: 1, }), Fetcher: TableMetadata(PageSentences: 2, Pages: 3, PageWords: 4, WordPages: 6, SentencePages: 5, }), }
>>>> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,243 CommitLog.java (line 509) Marking replay position 257 on commit log /var/lib/cassandra/commitlog/CommitLog-1255087417263.log
>>>> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:45,243 Memtable.java (line 220) Completed flushing /mnt/cassandra/data/system/LocationInfo-19-Data.db
>>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 SSTableReader.java (line 58) index size for bloom filter calc for file : /mnt/cassandra/data/Fetcher/WordPages-347-Data.db : 256
>>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 SSTableReader.java (line 58) index size for bloom filter calc for file : /mnt/cassandra/data/Fetcher/WordPages-416-Data.db : 512
>>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 SSTableReader.java (line 58) index size for bloom filter calc for file : /mnt/cassandra/data/Fetcher/WordPages-486-Data.db : 768
>>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 SSTableReader.java (line 58) index size for bloom filter calc for file : /mnt/cassandra/data/Fetcher/WordPages-555-Data.db : 1024
>>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228 ColumnFamilyStore.java (line 1048) Expected bloom filter size : 1024
>>>> DEBUG [Timer-0] 2009-10-09 11:28:39,859 LoadDisseminator.java (line 40) Disseminating load info ...
>>>> DEBUG [Timer-0] 2009-10-09 11:33:40,783 LoadDisseminator.java (line 40) Disseminating load info ...
>>>> DEBUG [Timer-0] 2009-10-09 11:38:40,956 LoadDisseminator.java (line 40) Disseminating load info ...
>>>> DEBUG [Timer-0] 2009-10-09 11:43:40,064 LoadDisseminator.java (line 40) Disseminating load info ...
>>>>
>>>>
>>>> If I try to insert anything, I get stuff like this:
>>>>
>>>> ERROR [pool-1-thread-5324] 2009-10-09 10:12:36,574 StorageProxy.java (line 179) error writing key md5
>>>> java.util.concurrent.TimeoutException: Operation timed out - received only 0 responses from .
>>>>         at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
>>>>         at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:164)
>>>>         at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:468)
>>>>         at org.apache.cassandra.service.CassandraServer.insert(CassandraServer.java:421)
>>>>         at org.apache.cassandra.service.Cassandra$Processor$insert.process(Cassandra.java:824)
>>>>         at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
>>>>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>         at java.lang.Thread.run(Thread.java:619)
>>>>
>>>>
>>>> Any ideas?
>>>>
>>>> Best regards
>>>> Dan
>>>
>>>
>>> --
>>> Eric Bowman
>>> Boboco Ltd
>>> ebowman@boboco.ie
>>> http://www.boboco.ie/ebowman/pubkey.pgp
>>> +35318394189/+353872801532
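Jonathan's explanation and Eric's first tip both come down to how much of the JVM's time goes to garbage collection. As one rough way to put a number on "dying under GC load", here is a minimal Java sketch. The class name GcOverhead and any alarm threshold you might attach to it are hypothetical, not part of Cassandra. It reads the standard java.lang.management GarbageCollectorMXBean counters and reports the fraction of elapsed time spent in GC between two samples. It only measures the JVM it runs in. The same platform MBeans can in principle be read over JMX from a running node, but that wiring is not shown here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: fraction of elapsed time this JVM spent in GC between samples. */
public class GcOverhead {
    private long lastGcMillis = totalGcMillis();
    private long lastWallMillis = System.nanoTime() / 1000000L;

    /** Accumulated collection time over all collectors, in milliseconds. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 means this collector does not report time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Returns GC time divided by elapsed time since the previous call, clamped to [0, 1]. */
    public synchronized double sample() {
        long gcNow = totalGcMillis();
        long wallNow = System.nanoTime() / 1000000L; // monotonic clock, converted to ms
        long gcDelta = gcNow - lastGcMillis;
        long wallDelta = wallNow - lastWallMillis;
        lastGcMillis = gcNow;
        lastWallMillis = wallNow;
        if (wallDelta <= 0) {
            return 0.0;
        }
        // Times from different collectors are summed and need not be disjoint
        // (e.g. concurrent phases), so clamp the ratio.
        return Math.max(0.0, Math.min(1.0, (double) gcDelta / wallDelta));
    }

    public static void main(String[] args) throws InterruptedException {
        GcOverhead meter = new GcOverhead();
        for (int i = 0; i < 3; i++) {
            // Allocate short-lived garbage that may give the collector something to do.
            for (int j = 0; j < 100000; j++) {
                byte[] junk = new byte[1024];
                junk[0] = 1;
            }
            Thread.sleep(200);
            System.out.printf("GC overhead over last interval: %.1f%%%n", 100 * meter.sample());
        }
    }
}
```

A ratio that keeps climbing toward 1.0 while write throughput drops would match the symptom Jonathan describes; treat it as a heuristic for deciding when to back off, not as a Cassandra feature.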
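Until a release has the write throttling Jonathan mentions for trunk, one way to "be a little careful" on 0.4 is to bound how many inserts a client has in flight. Below is a minimal Java sketch that assumes a hypothetical Writer interface standing in for the real Thrift insert call. A java.util.concurrent.Semaphore blocks the producer once maxInFlight inserts are pending, so the client slows down instead of piling more work onto a node that is already struggling. The limit values are illustrative assumptions and would have to be tuned by experiment.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Sketch: caps how many inserts a client has in flight at once. */
public class ThrottledWriter {
    /** Hypothetical stand-in for a real Thrift insert call. */
    public interface Writer {
        void insert(String key, byte[] value) throws Exception;
    }

    private final Writer writer;
    private final Semaphore permits;
    private final ExecutorService pool;

    public ThrottledWriter(Writer writer, int maxInFlight, int threads) {
        this.writer = writer;
        this.permits = new Semaphore(maxInFlight);
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Blocks the caller while maxInFlight inserts are already pending. */
    public void submit(final String key, final byte[] value) throws InterruptedException {
        permits.acquire();
        try {
            pool.execute(new Runnable() {
                public void run() {
                    try {
                        writer.insert(key, value);
                    } catch (Exception e) {
                        // A real client would log and retry with back-off here.
                        System.err.println("insert failed for key " + key + ": " + e);
                    } finally {
                        permits.release();
                    }
                }
            });
        } catch (RuntimeException e) {
            permits.release(); // e.g. RejectedExecutionException after close()
            throw e;
        }
    }

    /** Stops accepting work and waits (up to a minute) for pending inserts. */
    public boolean close() throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        // Dummy writer that simulates a slow node by sleeping.
        ThrottledWriter client = new ThrottledWriter(new Writer() {
            public void insert(String key, byte[] value) throws Exception {
                Thread.sleep(10);
            }
        }, 8, 32);
        for (int i = 0; i < 100; i++) {
            client.submit("key" + i, new byte[16]);
        }
        System.out.println("all inserts finished: " + client.close());
    }
}
```

A fixed concurrency cap does not guarantee that the node keeps up. Pairing it with a retry and back-off on TimeoutException, the error in Dan's stack trace, is a natural extension.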