incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Derek Williams <de...@fyrie.net>
Subject Re: too many warnings of Heap is full
Date Wed, 30 Jan 2013 23:27:53 GMT
My first guess is the 3 ParNew collections taking 35 seconds is why your
clients had long response times. That is a very abnormal stat. Combined
with the low memory usage at that time, it seems highly unlikely there was
actually 35 seconds worth of GCing going on there. Unless there is anything
interesting in the lines above that snippet.

I'd first double check that your new generation hasn't
been accidentally set to something insanely large or small, just to rule
that out (but judging from the 4G of heap that is set you are most likely
using defaults, which is good). After that I'd try replacing that node as
it could be a hardware problem.

Another reason I have seen large pauses on a linux VPS that might be worth
looking into is low entropy (possibly due to someone else abusing your
shared hardware).


On Wed, Jan 30, 2013 at 3:44 PM, Guillermo Barbero <
guillermo.barbero@spotbros.com> wrote:

> Guys,
>
> First thanks a lot for your answers.. now I have a little more info, lets
> see if this helps:
>
> As I said before we are using Cassandra 1.1.7 on a 6 nodes(Ips: 150-155)
> (Amazon EC2 XL instances 17GB RAM) and the problem is that in the last
> weeks we are seing that the performace of the cluster falls.. PHP commands
> that usually lasted a few milliseconds last up to 15 seconds for just 2/3
> minutes every now and then (no specific pattern) (see commands timing in
> attached picture).
>
> Today we have found this:
>
> Prior to a fall in performance on all Cassandra nodes log I can see this:
>
> *Node 150:*
>
> INFO [GossipTasks:1] 2013-01-30 21:35:23,514 Gossiper.java (line 831)
> InetAddress /10.0.0.152 is now dead.
>  INFO [GossipStage:1] 2013-01-30 21:35:40,666 Gossiper.java (line 817)
> InetAddress /10.0.0.152 is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,667 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,266 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1264710747(84317/476272
> serialized/live bytes, 298 ops)
>  INFO [FlushWriter:2879] 2013-01-30 21:35:41,267 Memtable.java (line 264)
> Writing Memtable-HintsColumnFamily@1264710747(84317/476272
> serialized/live bytes, 298 ops)
>  INFO [FlushWriter:2879] 2013-01-30 21:35:41,282 Memtable.java (line 305)
> Completed flushing
> /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db
> (84559 bytes) for commitlog position
> ReplayPosition(segmentId=1355151995930, position=32659991)
>  INFO [CompactionExecutor:9098] 2013-01-30 21:35:41,283
> CompactionTask.java (line 109) Compacting
> [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-275-Data.db'),
> SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db')]
>
> *Node 151:*
> *
> *
>  INFO [GossipTasks:1] 2013-01-30 21:35:25,689 Gossiper.java (line 831)
> InetAddress /10.0.0.152 is now dead.
>  INFO [GossipStage:1] 2013-01-30 21:35:40,677 Gossiper.java (line 817)
> InetAddress /10.0.0.152 is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,677 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,548 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1453632632(93461/191418
> serialized/live bytes, 316 ops)
>  INFO [FlushWriter:2636] 2013-01-30 21:35:41,549 Memtable.java (line 264)
> Writing Memtable-HintsColumnFamily@1453632632(93461/191418
> serialized/live bytes, 316 ops)
>  INFO [FlushWriter:2636] 2013-01-30 21:35:41,563 Memtable.java (line 305)
> Completed flushing
> /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-266-Data.db
> (93711 bytes) for commitlog position
> ReplayPosition(segmentId=1354832000013, position=20033362)
>
> *Node 153:*
>
> * *INFO [GossipTasks:1] 2013-01-30 21:35:24,822 Gossiper.java (line 831)
> InetAddress /10.0.0.152 is now dead.
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:24,896 GCInspector.java (line
> 122) GC for ParNew: 294 ms for 1 collections, 1539602448 used; max is
> 3886022656
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:26,897 GCInspector.java (line
> 122) GC for ParNew: 202 ms for 1 collections, 1829975768 used; max is
> 3886022656
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:27,898 GCInspector.java (line
> 122) GC for ParNew: 234 ms for 1 collections, 2037767104 used; max is
> 3886022656
>  INFO [MemoryMeter:1] 2013-01-30 21:35:31,151 Memtable.java (line 213)
> CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
> 1.4195889775077013 (just-counted was 1.4195889775077013).  calculation took
> 109ms for 64 columns
>  INFO [MemoryMeter:1] 2013-01-30 21:35:37,806 Memtable.java (line 213)
> CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
> 1.5276925191259334 (just-counted was 1.5276925191259334).  calculation took
> 11ms for 129 columns
>  INFO [GossipStage:1] 2013-01-30 21:35:40,735 Gossiper.java (line 817)
> InetAddress /10.0.0.152 is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,742 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [MemoryMeter:1] 2013-01-30 21:35:43,184 Memtable.java (line 213)
> CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
> 1.561312033901826 (just-counted was 1.561312033901826).  calculation took
> 234ms for 144 columns
>  INFO [HintedHandoff:1] 2013-01-30 21:35:43,549 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1120216043(96278/187900
> serialized/live bytes, 288 ops)
>
> *Node 154:*
> *
> *
> INFO [HintedHandoff:1] 2013-01-30 21:35:00,729 HintedHandOffManager.java
> (line 392) Finished hinted handoff of 0 rows to endpoint /10.0.0.153
>  INFO [GossipTasks:1] 2013-01-30 21:35:26,432 Gossiper.java (line 831)
> InetAddress /10.0.0.152 is now dead.
>  INFO [GossipStage:1] 2013-01-30 21:35:40,669 Gossiper.java (line 817)
> InetAddress /10.0.0.152 is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,208 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@908822645(92404/185145
> serialized/live bytes, 286 ops)
>  INFO [FlushWriter:2496] 2013-01-30 21:35:41,209 Memtable.java (line 264)
> Writing Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live
> bytes, 286 ops)
>
>
> Node 155:
>
> INFO [GossipTasks:1] 2013-01-30 21:35:27,117 Gossiper.java (line 831)
> InetAddress /10.0.0.152 is now dead.
>  INFO [GossipStage:1] 2013-01-30 21:35:40,668 Gossiper.java (line 817)
> InetAddress /10.0.0.152 is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,279 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1083199446(94578/190654
> serialized/live bytes, 316 ops)
>
> *Meanwhile on node 152:*
>
> INFO [FlushWriter:12] 2013-01-30 21:33:42,802 Memtable.java (line 305)
> Completed flushing
> /raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF_R2Notifications-hf-4175-Data.db
> (639626 bytes) for commitlog position
> ReplayPosition(segmentId=1359573115280, position=11032499)
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,701 GCInspector.java (line
> 122) GC for ParNew: 35087 ms for 3 collections, 442940272 used; max is
> 3886022656
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,934 StatusLogger.java (line
> 57) Pool Name                    Active   Pending   Blocked
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,941 StatusLogger.java (line
> 72) ReadStage                        32       343         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
> 72) RequestResponseStage              0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
> 72) ReadRepairStage                   0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
> 72) MutationStage                     0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
> 72) ReplicateOnWriteStage             0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
> 72) GossipStage                       0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
> 72) AntiEntropyStage                  0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line
> 72) MigrationStage                    0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line
> 72) StreamStage                       0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
> 72) MemtablePostFlusher               0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
> 72) FlushWriter                       0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
> 72) MiscStage                         0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
> 72) commitlog_archiver                0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
> 72) InternalResponseStage             0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
> 72) HintedHandoff                     0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
> 77) CompactionManager                 0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
> 89) MessagingService                n/a      0,24
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
> 99) Cache Type                     Size                 Capacity
>     KeysToSave
> Provider
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
> 100) KeyCache                     133569                  2184533
>            all
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
> 106) RowCache                          0                        0
>            all
>  org.apache.cassandra.cache.SerializingCacheProvider
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
> 113) ColumnFamily                Memtable ops,data
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
> 116) system.NodeIdInfo                         0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
> 116) system.IndexInfo                          0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
> 116) system.LocationInfo                       0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
> 116) system.Versions                         3,103
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
> 116) system.schema_keyspaces                   0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,950 StatusLogger.java (line
> 116) system.Migrations                         0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,059 StatusLogger.java (line
> 116) system.schema_columnfamilies                 0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
> 116) system.schema_columns                     0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
> 116) system.HintsColumnFamily               1,2369
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
> 116) system.Schema                             0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
> 116) SB_NNCD.CF_mailPipes            9241,30380203
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
> 116) SB_NNCD.CF_profilePics            427,1173642
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
> 116) SB_NNCD.CF_SBVars                         0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
> 116) SB_NNCD.CF_SBCounters                   5,232
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,092 StatusLogger.java (line
> 116) SB_NNCD.CF_userPhoneAgenda        7919,2332898
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
> 116) SB_NNCD.CF_BSLUsers                       0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
> 116) SB_NNCD.CF_mailTimeline                   0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
> 116) SB_NNCD.CF_SBMessages          54390,15054201
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
> 116) SB_NNCD.CF_NewsFeedTimeline        1866,7681861
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
> 116) SB_NNCD.CF_BSLSpots                     1,616
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
> 116) SB_NNCD.CF_SBUniqueIds                    0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
> 116) SB_NNCD.CF_SBChecks                       0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
> 116) SB_NNCD.CF_NFTimelines             423,533156
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
> 116) SB_NNCD.CF_SBAppUsersProfilePic                 0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
> 116) SB_NNCD.CF_SBLikes                     16,565
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
> 116) SB_NNCD.CF_SBCodes                  154,93735
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
> 116) SB_NNCD.CF_attachmentsPerUser           80,107551
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
> 116) SB_NNCD.CF_comments                  58,77581
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
> 116) SB_NNCD.CF_attachments             641,247796
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
> 116) SB_NNCD.CF_SBStats                        0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
> 116) SB_NNCD.CF_userReversedPhoneAgenda        9047,7132129
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
> 116) SB_ACD.CF_GSMembers               823,1206779
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,113 StatusLogger.java (line
> 116) SB_ACD.CF_NSLoad                          0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,114 StatusLogger.java (line
> 116) SB_ACD.CF_sessions                        0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,115 StatusLogger.java (line
> 116) SB_ACD.CF_userGS                   193,577229
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line
> 116) SB_ACD.CF_userApps                      8,170
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line
> 116) SB_ACD.CF_GSData                     126,3503
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,117 StatusLogger.java (line
> 116) SB_ACD.CF_R2Notifications          1906,58481
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,118 StatusLogger.java (line
> 116) SB_ACD.CF_uniqueKeys                      0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,155 StatusLogger.java (line
> 116) SB_ACD.CF_userContactWannabes          953,438671
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,157 StatusLogger.java (line
> 116) SB_ACD.CF_SBApps                          0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,158 StatusLogger.java (line
> 116) SB_ACD.CF_SBAppUsers                      0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line
> 116) SB_ACD.CF_R1Notifications        1354,2070513
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line
> 116) SB_ACD.CF_SBVotes                  463,230386
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,161 StatusLogger.java (line
> 116) SB_ACD.CF_userContacts           2968,4849519
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line
> 116) SB_ACD.CF_sessionsSuspended                 0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line
> 116) SB_ACD.CF_userBlockedList               5,207
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,172 StatusLogger.java (line
> 116) SB_ACD.CF_users                65386,10852776
> WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 GCInspector.java (line
> 145) Heap is 0.7514124302624755 full.  You may need to reduce memtable
> and/or cache sizes.  Cassandra will now flush up to the two largest
> memtables to free up memory.  Adjust flush_largest_memtables_at threshold
> in cassandra.yaml if you don't want Cassandra to do this automatically
>  WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 StorageService.java (line
> 2880) Flushing CFS(Keyspace='SB_NNCD', ColumnFamily='CF_mailPipes') to
> relieve memory pressure
>  INFO [ScheduledTasks:1] 2013-01-30 21:37:05,896 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-CF_mailPipes@192792072(6868296/30790355
> serialized/live bytes, 9380 ops)
>  INFO [FlushWriter:13] 2013-01-30 21:37:05,909 Memtable.java (line 264)
> Writing Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live
> bytes, 9380 ops)
>  INFO [FlushWriter:13] 2013-01-30 21:37:06,385 Memtable.java (line 305)
> Completed flushing
> /raid0/cassandra/data/SB_NNCD/CF_mailPipes/SB_NNCD-CF_mailPipes-hf-2165-Data.db
> (2049401 bytes) for commitlog position
> ReplayPosition(segmentId=1359573115280, position=13578602)
>  WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202)
> setting live ratio to maximum of 64.0 instead of 751.6512549537648
>  INFO [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 213)
> CFS(Keyspace='SB_ACD', ColumnFamily='CF_R2Notifications') liveRatio is 64.0
> (just-counted was 64.0).  calculation took 61ms for 8 columns
>  INFO [MemoryMeter:1] 2013-01-30 21:40:47,274 Memtable.java (line 213)
> CFS(Keyspace='SB_ACD', ColumnFamily='CF_userGS') liveRatio is
> 5.309650177091864 (just-counted was 5.309650177091864).  calculation took
> 11ms for 170 columns
>  INFO [HintedHandoff:1] 2013-01-30 21:42:03,709 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 85070591730234615865843651857942052864 with IP: /10.0.0.153
>  INFO [HintedHandoff:1] 2013-01-30 21:42:03,712 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1676047624(950/2369
> serialized/live bytes, 2 ops)
>  INFO [FlushWriter:14] 2013-01-30 21:42:03,713 Memtable.java (line 264)
> Writing Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live
> bytes, 2 ops)
>  INFO [FlushWriter:14] 2013-01-30 21:42:03,730 Memtable.java (line 305)
> Completed flushing
> /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db
> (1016 bytes) for commitlog position ReplayPosition(segmentId=1359573115280,
> position=17323736)
>  INFO [CompactionExecutor:48] 2013-01-30 21:42:03,731 CompactionTask.java
> (line 109) Compacting
> [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db'),
> SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-121-Data.db')]
>
> The cluster is perfectly balanced:
>
> Note: Ownership information does not include topology, please specify a
> keyspace.
> Address         DC          Rack        Status State   Load
>  Owns                Token
>
>                  141784319550391032739561396922763706368
> 10.0.0.150      datacenter1 rack1       Up     Normal  13.25 GB
>  16.67%              0
> 10.0.0.151      datacenter1 rack1       Up     Normal  10.31 GB
>  16.67%              28356863910078203714492389662765613056
> 10.0.0.152      datacenter1 rack1       Up     Normal  8.58 GB
> 16.67%              56713727820156407428984779325531226112
> 10.0.0.153      datacenter1 rack1       Up     Normal  8.1 GB
>  16.67%              85070591730234615865843651857942052864
> 10.0.0.154      datacenter1 rack1       Up     Normal  8.28 GB
> 16.67%              113427455640312814857969558651062452224
> 10.0.0.155      datacenter1 rack1       Up     Normal  9.38 GB
> 16.67%              141784319550391032739561396922763706368
>
>
> I guess something happens on 152 that makes other nodes believe it is
> dead, then does something to fix itself and comes back after (messing up
> performance in the meanwhile) .Does this makes any sense? Any clue of what
> might be going on? I can post more logs if you guys need
>
> Thanks again!!
>
> Guillermo
>
>
>
> 2013/1/30 Nate McCall <zznate.m@gmail.com>
>
>> What's the output of nodetool cfstats for those 2 column families on
>> cassNode2 and cassNode3? And what is the replication factor for this
>> cluster?
>>
>> Per the previous reply, nodetool ring should show each of your nodes
>> with ~16.7% of the data if well balanced.
>>
>> Also, the auto-detection for memory sizes in the startup script is a
>> little off w/r/t m1.xlarge because of the 'slightly less than 16gb' of
>> ram. It usually ends up allocating 4g/400m (max and young) whereas
>> 8g/800m will give you some more breathing room.
>>
>> On Wed, Jan 30, 2013 at 12:07 PM, Bryan Talbot <btalbot@aeriagames.com>
>> wrote:
>> > My guess is that those one or two nodes with the gc pressure also have
>> more
>> > rows in your big CF.  More rows could be due to imbalanced distribution
>> if
>> > your'e not using a random partitioner or from those nodes not yet
>> removing
>> > deleted rows which other nodes may have done.
>> >
>> > JVM heap space is used for a few things which scale with key count
>> > including:
>> > - bloom filter (for C* < 1.2)
>> > - index samples
>> >
>> > Other space is used but can be more easily controlled by tuning for
>> > - memtable
>> > - compaction
>> > - key cache
>> > - row cache
>> >
>> >
>> > So, if those nodes have more rows (check using "nodetool ring" or
>> "nodetool
>> > cfstats") than the others you can try to:
>> > - reduce the number of rows by adding nodes, run manual / tune
>> compactions
>> > to remove rows with expired tombstones, etc.
>> > - increase bloom filter fp chance
>> > - increase jvm heap size (don't go too big)
>> > - disable key or row cache
>> > - increase index sample interval
>> >
>> > Not all of those things are generally good especially to the extreme so
>> > don't go setting a 20 GB jvm heap without understanding the
>> consequences for
>> > example.
>> >
>> > -Bryan
>> >
>> >
>> > On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero
>> > <guillermo.barbero@spotbros.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >>   I'm viewing a weird behaviour in my cassandra cluster. Most of the
>> >> warning messages are due to Heap is % full. According to this link
>> >>
>> >> (
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html
>> )
>> >> there are two ways to "reduce pressure":
>> >> 1. Decrease the cache sizes
>> >> 2. Increase the index interval size
>> >>
>> >> Most of the flushes are in two column families (users and messages), I
>> >> guess that's because the most mutations are there.
>> >>
>> >> I still have not applied those changes to the production environment.
>> >> Do you recommend any other meassure? Should I set specific tunning for
>> >> these two CFs? Should I check another metric?
>> >>
>> >> Additionally, the distribution of warning messages is not uniform
>> >> along the cluster. Why could cassandra be doing this? What should I do
>> >> to find out how to fix this?
>> >>
>> >> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
>> >> the java version is the following:
>> >> java version "1.6.0_37"
>> >> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
>> >> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>> >>
>> >> The cassandra system.log is resumed here (numer of messages, cassandra
>> >> node, class that reports the message, first word of the message)
>> >> 2013-01-26
>> >>       5 cassNode0: GCInspector.java Heap
>> >>       5 cassNode0: StorageService.java Flushing
>> >>     232 cassNode2: GCInspector.java Heap
>> >>     232 cassNode2: StorageService.java Flushing
>> >>     104 cassNode3: GCInspector.java Heap
>> >>     104 cassNode3: StorageService.java Flushing
>> >>       3 cassNode4: GCInspector.java Heap
>> >>       3 cassNode4: StorageService.java Flushing
>> >>       3 cassNode5: GCInspector.java Heap
>> >>       3 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-27
>> >>       2 cassNode0: GCInspector.java Heap
>> >>       2 cassNode0: StorageService.java Flushing
>> >>       3 cassNode1: GCInspector.java Heap
>> >>       3 cassNode1: StorageService.java Flushing
>> >>     189 cassNode2: GCInspector.java Heap
>> >>     189 cassNode2: StorageService.java Flushing
>> >>     104 cassNode3: GCInspector.java Heap
>> >>     104 cassNode3: StorageService.java Flushing
>> >>       1 cassNode4: GCInspector.java Heap
>> >>       1 cassNode4: StorageService.java Flushing
>> >>       1 cassNode5: GCInspector.java Heap
>> >>       1 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-28
>> >>       2 cassNode0: GCInspector.java Heap
>> >>       2 cassNode0: StorageService.java Flushing
>> >>       1 cassNode1: GCInspector.java Heap
>> >>       1 cassNode1: StorageService.java Flushing
>> >>       1 cassNode2: AutoSavingCache.java Reducing
>> >>     343 cassNode2: GCInspector.java Heap
>> >>     342 cassNode2: StorageService.java Flushing
>> >>     181 cassNode3: GCInspector.java Heap
>> >>     181 cassNode3: StorageService.java Flushing
>> >>       4 cassNode4: GCInspector.java Heap
>> >>       4 cassNode4: StorageService.java Flushing
>> >>       3 cassNode5: GCInspector.java Heap
>> >>       3 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-29
>> >>       2 cassNode0: GCInspector.java Heap
>> >>       2 cassNode0: StorageService.java Flushing
>> >>       3 cassNode1: GCInspector.java Heap
>> >>       3 cassNode1: StorageService.java Flushing
>> >>     156 cassNode2: GCInspector.java Heap
>> >>     156 cassNode2: StorageService.java Flushing
>> >>      71 cassNode3: GCInspector.java Heap
>> >>      71 cassNode3: StorageService.java Flushing
>> >>       2 cassNode4: GCInspector.java Heap
>> >>       2 cassNode4: StorageService.java Flushing
>> >>       2 cassNode5: GCInspector.java Heap
>> >>       1 cassNode5: Memtable.java setting
>> >>       2 cassNode5: StorageService.java Flushing
>> >>
>> >> --
>> >>
>> >> Guillermo Barbero - Backend Team
>> >>
>> >> Spotbros Technologies
>> >
>> >
>> >
>>
>
>
>
> --
> <http://spotbros.com>
> Guillermo Barbero - Backend Team
>
> Spotbros Technologies
> <http://www.facebook.com/spotbros>  <http://twitter.com/spotbros>
>
>


-- 
Derek Williams

Mime
View raw message