Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id D6BB9EA6D for ; Wed, 30 Jan 2013 23:10:09 +0000 (UTC) Received: (qmail 8552 invoked by uid 500); 30 Jan 2013 23:10:07 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 8525 invoked by uid 500); 30 Jan 2013 23:10:07 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 8517 invoked by uid 99); 30 Jan 2013 23:10:07 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 30 Jan 2013 23:10:07 +0000 X-ASF-Spam-Status: No, hits=0.6 required=5.0 tests=RCVD_IN_DNSWL_LOW,SPF_PASS,URI_HEX X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of zznate.m@gmail.com designates 209.85.215.43 as permitted sender) Received: from [209.85.215.43] (HELO mail-la0-f43.google.com) (209.85.215.43) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 30 Jan 2013 23:10:02 +0000 Received: by mail-la0-f43.google.com with SMTP id ek20so1531276lab.30 for ; Wed, 30 Jan 2013 15:09:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:content-type:content-transfer-encoding; bh=kxt4yjZms2mgeRLg4KqhcAQXYtKmWpocNaX+7QmQBbk=; b=oni0EP7TkpQ+jHwQ9gKdkflm+d0Ho/RxglSe7bX9P3fdYhaurn7Qopn2AOEdmFlPZZ y6Ki5knY4Gii/ujN+gJzfIvXL+5kXOagMUjwPMB4/QFgT2+poNERivR7PsNIoZw7dKQm JsMue38SQiYO/JMe5SU/5HFp/EgsPD+4yu7Z/rM9+EsEkp9z3jq+jfQxJaKriVEZ05oj u61XPmx9yCYtWRJ0xa0AEGcFrREAaOlRlGaYx+fCJCl63caSL+tBt/pcQdLezvHv5XuA Nvbbkn0nndqEh5H3Ziq5CUFQmuqUbbgnvN5ETN+5Y4A9hyG2UEj7pB5z2SmMSzLJbov4 p6Qg== MIME-Version: 1.0 X-Received: by 10.152.109.176 with SMTP id ht16mr6146299lab.2.1359587380481; Wed, 30 Jan 2013 15:09:40 -0800 (PST) Received: by 10.114.21.104 with HTTP; Wed, 30 Jan 2013 15:09:40 -0800 (PST) In-Reply-To: References: Date: Wed, 30 Jan 2013 15:09:40 -0800 Message-ID: Subject: Re: too many warnings of Heap is full From: Nate McCall To: user@cassandra.apache.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org Your latencies and distribution look fine. How big/what types of queries are you issuing? Are you issuing a lot of large multigets? Also, do either of these column families have secondary indexes? On Wed, Jan 30, 2013 at 2:59 PM, Guillermo Barbero wrote: > Iep, > > I missed the attachment... > > Also, these are the cfstats of 152: > > Column Family: CF_SBMessages > SSTable count: 3 > Space used (live): 967238560 > Space used (total): 967238560 > Number of Keys (estimate): 3263232 > Memtable Columns Count: 2112 > Memtable Data Size: 577135 > Memtable Switch Count: 5 > Read Count: 20702 > Read Latency: 0.181 ms. > Write Count: 19888 > Write Latency: 0.059 ms. > Pending Tasks: 0 > Bloom Filter False Postives: 14 > Bloom Filter False Ratio: 0.09375 > Bloom Filter Space Used: 6171168 > Compacted row minimum size: 536 > Compacted row maximum size: 24601 > Compacted row mean size: 850 > > Column Family: CF_users > SSTable count: 3 > Space used (live): 152205376 > Space used (total): 152205376 > Number of Keys (estimate): 343040 > Memtable Columns Count: 154398 > Memtable Data Size: 21159410 > Memtable Switch Count: 9 > Read Count: 11816 > Read Latency: 1.348 ms. > Write Count: 128751 > Write Latency: 0.090 ms. > Pending Tasks: 0 > Bloom Filter False Postives: 0 > Bloom Filter False Ratio: 0.00000 > Bloom Filter Space Used: 688464 > Compacted row minimum size: 61 > Compacted row maximum size: 3311 > Compacted row mean size: 1235 > > And the cfstats of 153: > > Column Family: CF_SBMessages > SSTable count: 5 > Space used (live): 965495648 > Space used (total): 965495648 > Number of Keys (estimate): 3257216 > Memtable Columns Count: 56541 > Memtable Data Size: 15699960 > Memtable Switch Count: 2 > Read Count: 22475 > Read Latency: 0.142 ms. > Write Count: 21719 > Write Latency: 0.073 ms. > Pending Tasks: 0 > Bloom Filter False Postives: 43 > Bloom Filter False Ratio: 0.19545 > Bloom Filter Space Used: 6161032 > Compacted row minimum size: 536 > Compacted row maximum size: 11864 > Compacted row mean size: 872 > > Column Family: CF_users > SSTable count: 2 > Space used (live): 148762893 > Space used (total): 148762893 > Number of Keys (estimate): 333312 > Memtable Columns Count: 129725 > Memtable Data Size: 17144125 > Memtable Switch Count: 4 > Read Count: 7440 > Read Latency: 1.329 ms. > Write Count: 127465 > Write Latency: 0.093 ms. > Pending Tasks: 0 > Bloom Filter False Postives: 0 > Bloom Filter False Ratio: 0.00000 > Bloom Filter Space Used: 694112 > Compacted row minimum size: 61 > Compacted row maximum size: 3311 > Compacted row mean size: 1298 > > The messages are in a keyspace with a replication factor of 2 and the > users are in a keyspace with a replication factor of 3. > > Ah, and we use the RandomPartitioner. > > Thanks again > > > 2013/1/30 Guillermo Barbero >> >> Guys, >> >> First thanks a lot for your answers.. now I have a little more info, let= s see if this helps: >> >> As I said before we are using Cassandra 1.1.7 on a 6 nodes(Ips: 150-155)= (Amazon EC2 XL instances 17GB RAM) and the problem is that in the last wee= ks we are seing that the performace of the cluster falls.. PHP commands tha= t usually lasted a few milliseconds last up to 15 seconds for just 2/3 minu= tes every now and then (no specific pattern) (see commands timing in attach= ed picture). >> >> Today we have found this: >> >> Prior to a fall in performance on all Cassandra nodes log I can see this= : >> >> Node 150: >> >> INFO [GossipTasks:1] 2013-01-30 21:35:23,514 Gossiper.java (line 831) In= etAddress /10.0.0.152 is now dead. >> INFO [GossipStage:1] 2013-01-30 21:35:40,666 Gossiper.java (line 817) I= netAddress /10.0.0.152 is now UP >> INFO [HintedHandoff:1] 2013-01-30 21:35:40,667 HintedHandOffManager.jav= a (line 296) Started hinted handoff for token: 5671372782015640742898477932= 5531226112 with IP: /10.0.0.152 >> INFO [HintedHandoff:1] 2013-01-30 21:35:41,266 ColumnFamilyStore.java (= line 659) Enqueuing flush of Memtable-HintsColumnFamily@1264710747(84317/47= 6272 serialized/live bytes, 298 ops) >> INFO [FlushWriter:2879] 2013-01-30 21:35:41,267 Memtable.java (line 264= ) Writing Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/liv= e bytes, 298 ops) >> INFO [FlushWriter:2879] 2013-01-30 21:35:41,282 Memtable.java (line 305= ) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-= HintsColumnFamily-hf-276-Data.db (84559 bytes) for commitlog position Repla= yPosition(segmentId=3D1355151995930, position=3D32659991) >> INFO [CompactionExecutor:9098] 2013-01-30 21:35:41,283 CompactionTask.j= ava (line 109) Compacting [SSTableReader(path=3D'/raid0/cassandra/data/syst= em/HintsColumnFamily/system-HintsColumnFamily-hf-275-Data.db'), SSTableRead= er(path=3D'/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColum= nFamily-hf-276-Data.db')] >> >> Node 151: >> >> INFO [GossipTasks:1] 2013-01-30 21:35:25,689 Gossiper.java (line 831) I= netAddress /10.0.0.152 is now dead. >> INFO [GossipStage:1] 2013-01-30 21:35:40,677 Gossiper.java (line 817) I= netAddress /10.0.0.152 is now UP >> INFO [HintedHandoff:1] 2013-01-30 21:35:40,677 HintedHandOffManager.jav= a (line 296) Started hinted handoff for token: 5671372782015640742898477932= 5531226112 with IP: /10.0.0.152 >> INFO [HintedHandoff:1] 2013-01-30 21:35:41,548 ColumnFamilyStore.java (= line 659) Enqueuing flush of Memtable-HintsColumnFamily@1453632632(93461/19= 1418 serialized/live bytes, 316 ops) >> INFO [FlushWriter:2636] 2013-01-30 21:35:41,549 Memtable.java (line 264= ) Writing Memtable-HintsColumnFamily@1453632632(93461/191418 serialized/liv= e bytes, 316 ops) >> INFO [FlushWriter:2636] 2013-01-30 21:35:41,563 Memtable.java (line 305= ) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-= HintsColumnFamily-hf-266-Data.db (93711 bytes) for commitlog position Repla= yPosition(segmentId=3D1354832000013, position=3D20033362) >> >> Node 153: >> >> INFO [GossipTasks:1] 2013-01-30 21:35:24,822 Gossiper.java (line 831) I= netAddress /10.0.0.152 is now dead. >> INFO [ScheduledTasks:1] 2013-01-30 21:35:24,896 GCInspector.java (line = 122) GC for ParNew: 294 ms for 1 collections, 1539602448 used; max is 38860= 22656 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:26,897 GCInspector.java (line = 122) GC for ParNew: 202 ms for 1 collections, 1829975768 used; max is 38860= 22656 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:27,898 GCInspector.java (line = 122) GC for ParNew: 234 ms for 1 collections, 2037767104 used; max is 38860= 22656 >> INFO [MemoryMeter:1] 2013-01-30 21:35:31,151 Memtable.java (line 213) C= FS(Keyspace=3D'system', ColumnFamily=3D'HintsColumnFamily') liveRatio is 1.= 4195889775077013 (just-counted was 1.4195889775077013). calculation took 1= 09ms for 64 columns >> INFO [MemoryMeter:1] 2013-01-30 21:35:37,806 Memtable.java (line 213) C= FS(Keyspace=3D'system', ColumnFamily=3D'HintsColumnFamily') liveRatio is 1.= 5276925191259334 (just-counted was 1.5276925191259334). calculation took 1= 1ms for 129 columns >> INFO [GossipStage:1] 2013-01-30 21:35:40,735 Gossiper.java (line 817) I= netAddress /10.0.0.152 is now UP >> INFO [HintedHandoff:1] 2013-01-30 21:35:40,742 HintedHandOffManager.jav= a (line 296) Started hinted handoff for token: 5671372782015640742898477932= 5531226112 with IP: /10.0.0.152 >> INFO [MemoryMeter:1] 2013-01-30 21:35:43,184 Memtable.java (line 213) C= FS(Keyspace=3D'system', ColumnFamily=3D'HintsColumnFamily') liveRatio is 1.= 561312033901826 (just-counted was 1.561312033901826). calculation took 234= ms for 144 columns >> INFO [HintedHandoff:1] 2013-01-30 21:35:43,549 ColumnFamilyStore.java (= line 659) Enqueuing flush of Memtable-HintsColumnFamily@1120216043(96278/18= 7900 serialized/live bytes, 288 ops) >> >> Node 154: >> >> INFO [HintedHandoff:1] 2013-01-30 21:35:00,729 HintedHandOffManager.java= (line 392) Finished hinted handoff of 0 rows to endpoint /10.0.0.153 >> INFO [GossipTasks:1] 2013-01-30 21:35:26,432 Gossiper.java (line 831) I= netAddress /10.0.0.152 is now dead. >> INFO [GossipStage:1] 2013-01-30 21:35:40,669 Gossiper.java (line 817) I= netAddress /10.0.0.152 is now UP >> INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.jav= a (line 296) Started hinted handoff for token: 5671372782015640742898477932= 5531226112 with IP: /10.0.0.152 >> INFO [HintedHandoff:1] 2013-01-30 21:35:41,208 ColumnFamilyStore.java (= line 659) Enqueuing flush of Memtable-HintsColumnFamily@908822645(92404/185= 145 serialized/live bytes, 286 ops) >> INFO [FlushWriter:2496] 2013-01-30 21:35:41,209 Memtable.java (line 264= ) Writing Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live= bytes, 286 ops) >> >> >> Node 155: >> >> INFO [GossipTasks:1] 2013-01-30 21:35:27,117 Gossiper.java (line 831) In= etAddress /10.0.0.152 is now dead. >> INFO [GossipStage:1] 2013-01-30 21:35:40,668 Gossiper.java (line 817) I= netAddress /10.0.0.152 is now UP >> INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.jav= a (line 296) Started hinted handoff for token: 5671372782015640742898477932= 5531226112 with IP: /10.0.0.152 >> INFO [HintedHandoff:1] 2013-01-30 21:35:41,279 ColumnFamilyStore.java (= line 659) Enqueuing flush of Memtable-HintsColumnFamily@1083199446(94578/19= 0654 serialized/live bytes, 316 ops) >> >> Meanwhile on node 152: >> >> INFO [FlushWriter:12] 2013-01-30 21:33:42,802 Memtable.java (line 305) C= ompleted flushing /raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF= _R2Notifications-hf-4175-Data.db (639626 bytes) for commitlog position Repl= ayPosition(segmentId=3D1359573115280, position=3D11032499) >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,701 GCInspector.java (line = 122) GC for ParNew: 35087 ms for 3 collections, 442940272 used; max is 3886= 022656 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,934 StatusLogger.java (line= 57) Pool Name Active Pending Blocked >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,941 StatusLogger.java (line= 72) ReadStage 32 343 0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line= 72) RequestResponseStage 0 0 0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line= 72) ReadRepairStage 0 0 0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line= 72) MutationStage 0 0 0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line= 72) ReplicateOnWriteStage 0 0 0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line= 72) GossipStage 0 0 0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line= 72) AntiEntropyStage 0 0 0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line= 72) MigrationStage 0 0 0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line= 72) StreamStage 0 0 0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line= 72) MemtablePostFlusher 0 0 0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line= 72) FlushWriter 0 0 0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line= 72) MiscStage 0 0 0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line= 72) commitlog_archiver 0 0 0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line= 72) InternalResponseStage 0 0 0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line= 72) HintedHandoff 0 0 0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line= 77) CompactionManager 0 0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line= 89) MessagingService n/a 0,24 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line= 99) Cache Type Size Capacity = KeysToSave Pro= vider >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line= 100) KeyCache 133569 2184533 = all >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line= 106) RowCache 0 0 = all org.apache.cassandra.cache.SerializingCachePr= ovider >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line= 113) ColumnFamily Memtable ops,data >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line= 116) system.NodeIdInfo 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line= 116) system.IndexInfo 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line= 116) system.LocationInfo 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line= 116) system.Versions 3,103 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line= 116) system.schema_keyspaces 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,950 StatusLogger.java (line= 116) system.Migrations 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,059 StatusLogger.java (line= 116) system.schema_columnfamilies 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line= 116) system.schema_columns 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line= 116) system.HintsColumnFamily 1,2369 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line= 116) system.Schema 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line= 116) SB_NNCD.CF_mailPipes 9241,30380203 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line= 116) SB_NNCD.CF_profilePics 427,1173642 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line= 116) SB_NNCD.CF_SBVars 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line= 116) SB_NNCD.CF_SBCounters 5,232 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,092 StatusLogger.java (line= 116) SB_NNCD.CF_userPhoneAgenda 7919,2332898 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line= 116) SB_NNCD.CF_BSLUsers 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line= 116) SB_NNCD.CF_mailTimeline 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line= 116) SB_NNCD.CF_SBMessages 54390,15054201 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line= 116) SB_NNCD.CF_NewsFeedTimeline 1866,7681861 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line= 116) SB_NNCD.CF_BSLSpots 1,616 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line= 116) SB_NNCD.CF_SBUniqueIds 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line= 116) SB_NNCD.CF_SBChecks 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line= 116) SB_NNCD.CF_NFTimelines 423,533156 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line= 116) SB_NNCD.CF_SBAppUsersProfilePic 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line= 116) SB_NNCD.CF_SBLikes 16,565 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line= 116) SB_NNCD.CF_SBCodes 154,93735 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line= 116) SB_NNCD.CF_attachmentsPerUser 80,107551 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line= 116) SB_NNCD.CF_comments 58,77581 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line= 116) SB_NNCD.CF_attachments 641,247796 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line= 116) SB_NNCD.CF_SBStats 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line= 116) SB_NNCD.CF_userReversedPhoneAgenda 9047,7132129 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line= 116) SB_ACD.CF_GSMembers 823,1206779 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,113 StatusLogger.java (line= 116) SB_ACD.CF_NSLoad 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,114 StatusLogger.java (line= 116) SB_ACD.CF_sessions 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,115 StatusLogger.java (line= 116) SB_ACD.CF_userGS 193,577229 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line= 116) SB_ACD.CF_userApps 8,170 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line= 116) SB_ACD.CF_GSData 126,3503 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,117 StatusLogger.java (line= 116) SB_ACD.CF_R2Notifications 1906,58481 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,118 StatusLogger.java (line= 116) SB_ACD.CF_uniqueKeys 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,155 StatusLogger.java (line= 116) SB_ACD.CF_userContactWannabes 953,438671 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,157 StatusLogger.java (line= 116) SB_ACD.CF_SBApps 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,158 StatusLogger.java (line= 116) SB_ACD.CF_SBAppUsers 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line= 116) SB_ACD.CF_R1Notifications 1354,2070513 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line= 116) SB_ACD.CF_SBVotes 463,230386 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,161 StatusLogger.java (line= 116) SB_ACD.CF_userContacts 2968,4849519 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line= 116) SB_ACD.CF_sessionsSuspended 0,0 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line= 116) SB_ACD.CF_userBlockedList 5,207 >> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,172 StatusLogger.java (line= 116) SB_ACD.CF_users 65386,10852776 >> WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 GCInspector.java (line 1= 45) Heap is 0.7514124302624755 full. You may need to reduce memtable and/o= r cache sizes. Cassandra will now flush up to the two largest memtables to= free up memory. Adjust flush_largest_memtables_at threshold in cassandra.= yaml if you don't want Cassandra to do this automatically >> WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 StorageService.java (li= ne 2880) Flushing CFS(Keyspace=3D'SB_NNCD', ColumnFamily=3D'CF_mailPipes') = to relieve memory pressure >> INFO [ScheduledTasks:1] 2013-01-30 21:37:05,896 ColumnFamilyStore.java = (line 659) Enqueuing flush of Memtable-CF_mailPipes@192792072(6868296/30790= 355 serialized/live bytes, 9380 ops) >> INFO [FlushWriter:13] 2013-01-30 21:37:05,909 Memtable.java (line 264) = Writing Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live by= tes, 9380 ops) >> INFO [FlushWriter:13] 2013-01-30 21:37:06,385 Memtable.java (line 305) = Completed flushing /raid0/cassandra/data/SB_NNCD/CF_mailPipes/SB_NNCD-CF_ma= ilPipes-hf-2165-Data.db (2049401 bytes) for commitlog position ReplayPositi= on(segmentId=3D1359573115280, position=3D13578602) >> WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202) s= etting live ratio to maximum of 64.0 instead of 751.6512549537648 >> INFO [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 213) C= FS(Keyspace=3D'SB_ACD', ColumnFamily=3D'CF_R2Notifications') liveRatio is 6= 4.0 (just-counted was 64.0). calculation took 61ms for 8 columns >> INFO [MemoryMeter:1] 2013-01-30 21:40:47,274 Memtable.java (line 213) C= FS(Keyspace=3D'SB_ACD', ColumnFamily=3D'CF_userGS') liveRatio is 5.30965017= 7091864 (just-counted was 5.309650177091864). calculation took 11ms for 17= 0 columns >> INFO [HintedHandoff:1] 2013-01-30 21:42:03,709 HintedHandOffManager.jav= a (line 296) Started hinted handoff for token: 8507059173023461586584365185= 7942052864 with IP: /10.0.0.153 >> INFO [HintedHandoff:1] 2013-01-30 21:42:03,712 ColumnFamilyStore.java (= line 659) Enqueuing flush of Memtable-HintsColumnFamily@1676047624(950/2369= serialized/live bytes, 2 ops) >> INFO [FlushWriter:14] 2013-01-30 21:42:03,713 Memtable.java (line 264) = Writing Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live byte= s, 2 ops) >> INFO [FlushWriter:14] 2013-01-30 21:42:03,730 Memtable.java (line 305) = Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-Hi= ntsColumnFamily-hf-122-Data.db (1016 bytes) for commitlog position ReplayPo= sition(segmentId=3D1359573115280, position=3D17323736) >> INFO [CompactionExecutor:48] 2013-01-30 21:42:03,731 CompactionTask.jav= a (line 109) Compacting [SSTableReader(path=3D'/raid0/cassandra/data/system= /HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db'), SSTableReader= (path=3D'/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnF= amily-hf-121-Data.db')] >> >> The cluster is perfectly balanced: >> >> Note: Ownership information does not include topology, please specify a = keyspace. >> Address DC Rack Status State Load O= wns Token >> = 141784319550391032739561396922763706368 >> 10.0.0.150 datacenter1 rack1 Up Normal 13.25 GB 1= 6.67% 0 >> 10.0.0.151 datacenter1 rack1 Up Normal 10.31 GB 1= 6.67% 28356863910078203714492389662765613056 >> 10.0.0.152 datacenter1 rack1 Up Normal 8.58 GB 1= 6.67% 56713727820156407428984779325531226112 >> 10.0.0.153 datacenter1 rack1 Up Normal 8.1 GB 1= 6.67% 85070591730234615865843651857942052864 >> 10.0.0.154 datacenter1 rack1 Up Normal 8.28 GB 1= 6.67% 113427455640312814857969558651062452224 >> 10.0.0.155 datacenter1 rack1 Up Normal 9.38 GB 1= 6.67% 141784319550391032739561396922763706368 >> >> >> I guess something happens on 152 that makes other nodes believe it is de= ad, then does something to fix itself and comes back after (messing up perf= ormance in the meanwhile) .Does this makes any sense? Any clue of what migh= t be going on? I can post more logs if you guys need >> >> Thanks again!! >> >> Guillermo >> >> >> >> 2013/1/30 Nate McCall >>> >>> What's the output of nodetool cfstats for those 2 column families on >>> cassNode2 and cassNode3? And what is the replication factor for this >>> cluster? >>> >>> Per the previous reply, nodetool ring should show each of your nodes >>> with ~16.7% of the data if well balanced. >>> >>> Also, the auto-detection for memory sizes in the startup script is a >>> little off w/r/t m1.xlarge because of the 'slightly less than 16gb' of >>> ram. It usually ends up allocating 4g/400m (max and young) whereas >>> 8g/800m will give you some more breathing room. >>> >>> On Wed, Jan 30, 2013 at 12:07 PM, Bryan Talbot = wrote: >>> > My guess is that those one or two nodes with the gc pressure also hav= e more >>> > rows in your big CF. More rows could be due to imbalanced distributi= on if >>> > your'e not using a random partitioner or from those nodes not yet rem= oving >>> > deleted rows which other nodes may have done. >>> > >>> > JVM heap space is used for a few things which scale with key count >>> > including: >>> > - bloom filter (for C* < 1.2) >>> > - index samples >>> > >>> > Other space is used but can be more easily controlled by tuning for >>> > - memtable >>> > - compaction >>> > - key cache >>> > - row cache >>> > >>> > >>> > So, if those nodes have more rows (check using "nodetool ring" or "no= detool >>> > cfstats") than the others you can try to: >>> > - reduce the number of rows by adding nodes, run manual / tune compac= tions >>> > to remove rows with expired tombstones, etc. >>> > - increase bloom filter fp chance >>> > - increase jvm heap size (don't go too big) >>> > - disable key or row cache >>> > - increase index sample interval >>> > >>> > Not all of those things are generally good especially to the extreme = so >>> > don't go setting a 20 GB jvm heap without understanding the consequen= ces for >>> > example. >>> > >>> > -Bryan >>> > >>> > >>> > On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero >>> > wrote: >>> >> >>> >> Hi, >>> >> >>> >> I'm viewing a weird behaviour in my cassandra cluster. Most of the >>> >> warning messages are due to Heap is % full. According to this link >>> >> >>> >> (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Ca= ssndra-1-0-6-GC-query-tt7323457.html) >>> >> there are two ways to "reduce pressure": >>> >> 1. Decrease the cache sizes >>> >> 2. Increase the index interval size >>> >> >>> >> Most of the flushes are in two column families (users and messages),= I >>> >> guess that's because the most mutations are there. >>> >> >>> >> I still have not applied those changes to the production environment= . >>> >> Do you recommend any other meassure? Should I set specific tunning f= or >>> >> these two CFs? Should I check another metric? >>> >> >>> >> Additionally, the distribution of warning messages is not uniform >>> >> along the cluster. Why could cassandra be doing this? What should I = do >>> >> to find out how to fix this? >>> >> >>> >> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2= ) >>> >> the java version is the following: >>> >> java version "1.6.0_37" >>> >> Java(TM) SE Runtime Environment (build 1.6.0_37-b06) >>> >> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode) >>> >> >>> >> The cassandra system.log is resumed here (numer of messages, cassand= ra >>> >> node, class that reports the message, first word of the message) >>> >> 2013-01-26 >>> >> 5 cassNode0: GCInspector.java Heap >>> >> 5 cassNode0: StorageService.java Flushing >>> >> 232 cassNode2: GCInspector.java Heap >>> >> 232 cassNode2: StorageService.java Flushing >>> >> 104 cassNode3: GCInspector.java Heap >>> >> 104 cassNode3: StorageService.java Flushing >>> >> 3 cassNode4: GCInspector.java Heap >>> >> 3 cassNode4: StorageService.java Flushing >>> >> 3 cassNode5: GCInspector.java Heap >>> >> 3 cassNode5: StorageService.java Flushing >>> >> >>> >> 2013-01-27 >>> >> 2 cassNode0: GCInspector.java Heap >>> >> 2 cassNode0: StorageService.java Flushing >>> >> 3 cassNode1: GCInspector.java Heap >>> >> 3 cassNode1: StorageService.java Flushing >>> >> 189 cassNode2: GCInspector.java Heap >>> >> 189 cassNode2: StorageService.java Flushing >>> >> 104 cassNode3: GCInspector.java Heap >>> >> 104 cassNode3: StorageService.java Flushing >>> >> 1 cassNode4: GCInspector.java Heap >>> >> 1 cassNode4: StorageService.java Flushing >>> >> 1 cassNode5: GCInspector.java Heap >>> >> 1 cassNode5: StorageService.java Flushing >>> >> >>> >> 2013-01-28 >>> >> 2 cassNode0: GCInspector.java Heap >>> >> 2 cassNode0: StorageService.java Flushing >>> >> 1 cassNode1: GCInspector.java Heap >>> >> 1 cassNode1: StorageService.java Flushing >>> >> 1 cassNode2: AutoSavingCache.java Reducing >>> >> 343 cassNode2: GCInspector.java Heap >>> >> 342 cassNode2: StorageService.java Flushing >>> >> 181 cassNode3: GCInspector.java Heap >>> >> 181 cassNode3: StorageService.java Flushing >>> >> 4 cassNode4: GCInspector.java Heap >>> >> 4 cassNode4: StorageService.java Flushing >>> >> 3 cassNode5: GCInspector.java Heap >>> >> 3 cassNode5: StorageService.java Flushing >>> >> >>> >> 2013-01-29 >>> >> 2 cassNode0: GCInspector.java Heap >>> >> 2 cassNode0: StorageService.java Flushing >>> >> 3 cassNode1: GCInspector.java Heap >>> >> 3 cassNode1: StorageService.java Flushing >>> >> 156 cassNode2: GCInspector.java Heap >>> >> 156 cassNode2: StorageService.java Flushing >>> >> 71 cassNode3: GCInspector.java Heap >>> >> 71 cassNode3: StorageService.java Flushing >>> >> 2 cassNode4: GCInspector.java Heap >>> >> 2 cassNode4: StorageService.java Flushing >>> >> 2 cassNode5: GCInspector.java Heap >>> >> 1 cassNode5: Memtable.java setting >>> >> 2 cassNode5: StorageService.java Flushing >>> >> >>> >> -- >>> >> >>> >> Guillermo Barbero - Backend Team >>> >> >>> >> Spotbros Technologies >>> > >>> > >>> > >> >> >> >> >> -- >> >> Guillermo Barbero - Backend Team >> >> Spotbros Technologies >> >> >> > > > > -- > > Guillermo Barbero - Backend Team > > Spotbros Technologies