incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Guillermo Barbero <guillermo.barb...@spotbros.com>
Subject Re: too many warnings of Heap is full
Date Wed, 30 Jan 2013 22:59:55 GMT
Iep,

  I missed the attachment...

  Also, these are the cfstats of 152:

Column Family: CF_SBMessages
SSTable count: 3
Space used (live): 967238560
Space used (total): 967238560
Number of Keys (estimate): 3263232
Memtable Columns Count: 2112
Memtable Data Size: 577135
Memtable Switch Count: 5
Read Count: 20702
Read Latency: 0.181 ms.
Write Count: 19888
Write Latency: 0.059 ms.
Pending Tasks: 0
Bloom Filter False Postives: 14
Bloom Filter False Ratio: 0.09375
Bloom Filter Space Used: 6171168
Compacted row minimum size: 536
Compacted row maximum size: 24601
Compacted row mean size: 850

Column Family: CF_users
SSTable count: 3
Space used (live): 152205376
Space used (total): 152205376
Number of Keys (estimate): 343040
Memtable Columns Count: 154398
Memtable Data Size: 21159410
Memtable Switch Count: 9
Read Count: 11816
Read Latency: 1.348 ms.
Write Count: 128751
Write Latency: 0.090 ms.
Pending Tasks: 0
Bloom Filter False Postives: 0
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 688464
Compacted row minimum size: 61
Compacted row maximum size: 3311
Compacted row mean size: 1235

And the cfstats of 153:

Column Family: CF_SBMessages
SSTable count: 5
Space used (live): 965495648
Space used (total): 965495648
Number of Keys (estimate): 3257216
Memtable Columns Count: 56541
Memtable Data Size: 15699960
Memtable Switch Count: 2
Read Count: 22475
Read Latency: 0.142 ms.
Write Count: 21719
Write Latency: 0.073 ms.
Pending Tasks: 0
Bloom Filter False Postives: 43
Bloom Filter False Ratio: 0.19545
Bloom Filter Space Used: 6161032
Compacted row minimum size: 536
Compacted row maximum size: 11864
Compacted row mean size: 872

Column Family: CF_users
SSTable count: 2
Space used (live): 148762893
Space used (total): 148762893
Number of Keys (estimate): 333312
Memtable Columns Count: 129725
Memtable Data Size: 17144125
Memtable Switch Count: 4
Read Count: 7440
Read Latency: 1.329 ms.
Write Count: 127465
Write Latency: 0.093 ms.
Pending Tasks: 0
Bloom Filter False Postives: 0
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 694112
Compacted row minimum size: 61
Compacted row maximum size: 3311
Compacted row mean size: 1298

The messages are in a keyspace with a replication factor of 2 and the
users are in a keyspace with a replication factor of 3.

Ah, and we use the RandomPartitioner.

Thanks again


2013/1/30 Guillermo Barbero <guillermo.barbero@spotbros.com>
>
> Guys,
>
> First thanks a lot for your answers.. now I have a little more info, lets see if this
helps:
>
> As I said before we are using Cassandra 1.1.7 on a 6 nodes(Ips: 150-155) (Amazon EC2
XL instances 17GB RAM) and the problem is that in the last weeks we are seing that the performace
of the cluster falls.. PHP commands that usually lasted a few milliseconds last up to 15 seconds
for just 2/3 minutes every now and then (no specific pattern) (see commands timing in attached
picture).
>
> Today we have found this:
>
> Prior to a fall in performance on all Cassandra nodes log I can see this:
>
> Node 150:
>
> INFO [GossipTasks:1] 2013-01-30 21:35:23,514 Gossiper.java (line 831) InetAddress /10.0.0.152
is now dead.
>  INFO [GossipStage:1] 2013-01-30 21:35:40,666 Gossiper.java (line 817) InetAddress /10.0.0.152
is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,667 HintedHandOffManager.java (line 296)
Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,266 ColumnFamilyStore.java (line 659) Enqueuing
flush of Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live bytes, 298 ops)
>  INFO [FlushWriter:2879] 2013-01-30 21:35:41,267 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1264710747(84317/476272
serialized/live bytes, 298 ops)
>  INFO [FlushWriter:2879] 2013-01-30 21:35:41,282 Memtable.java (line 305) Completed flushing
/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db (84559
bytes) for commitlog position ReplayPosition(segmentId=1355151995930, position=32659991)
>  INFO [CompactionExecutor:9098] 2013-01-30 21:35:41,283 CompactionTask.java (line 109)
Compacting [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-275-Data.db'),
SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db')]
>
> Node 151:
>
>  INFO [GossipTasks:1] 2013-01-30 21:35:25,689 Gossiper.java (line 831) InetAddress /10.0.0.152
is now dead.
>  INFO [GossipStage:1] 2013-01-30 21:35:40,677 Gossiper.java (line 817) InetAddress /10.0.0.152
is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,677 HintedHandOffManager.java (line 296)
Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,548 ColumnFamilyStore.java (line 659) Enqueuing
flush of Memtable-HintsColumnFamily@1453632632(93461/191418 serialized/live bytes, 316 ops)
>  INFO [FlushWriter:2636] 2013-01-30 21:35:41,549 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1453632632(93461/191418
serialized/live bytes, 316 ops)
>  INFO [FlushWriter:2636] 2013-01-30 21:35:41,563 Memtable.java (line 305) Completed flushing
/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-266-Data.db (93711
bytes) for commitlog position ReplayPosition(segmentId=1354832000013, position=20033362)
>
> Node 153:
>
>  INFO [GossipTasks:1] 2013-01-30 21:35:24,822 Gossiper.java (line 831) InetAddress /10.0.0.152
is now dead.
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:24,896 GCInspector.java (line 122) GC for ParNew:
294 ms for 1 collections, 1539602448 used; max is 3886022656
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:26,897 GCInspector.java (line 122) GC for ParNew:
202 ms for 1 collections, 1829975768 used; max is 3886022656
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:27,898 GCInspector.java (line 122) GC for ParNew:
234 ms for 1 collections, 2037767104 used; max is 3886022656
>  INFO [MemoryMeter:1] 2013-01-30 21:35:31,151 Memtable.java (line 213) CFS(Keyspace='system',
ColumnFamily='HintsColumnFamily') liveRatio is 1.4195889775077013 (just-counted was 1.4195889775077013).
 calculation took 109ms for 64 columns
>  INFO [MemoryMeter:1] 2013-01-30 21:35:37,806 Memtable.java (line 213) CFS(Keyspace='system',
ColumnFamily='HintsColumnFamily') liveRatio is 1.5276925191259334 (just-counted was 1.5276925191259334).
 calculation took 11ms for 129 columns
>  INFO [GossipStage:1] 2013-01-30 21:35:40,735 Gossiper.java (line 817) InetAddress /10.0.0.152
is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,742 HintedHandOffManager.java (line 296)
Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [MemoryMeter:1] 2013-01-30 21:35:43,184 Memtable.java (line 213) CFS(Keyspace='system',
ColumnFamily='HintsColumnFamily') liveRatio is 1.561312033901826 (just-counted was 1.561312033901826).
 calculation took 234ms for 144 columns
>  INFO [HintedHandoff:1] 2013-01-30 21:35:43,549 ColumnFamilyStore.java (line 659) Enqueuing
flush of Memtable-HintsColumnFamily@1120216043(96278/187900 serialized/live bytes, 288 ops)
>
> Node 154:
>
> INFO [HintedHandoff:1] 2013-01-30 21:35:00,729 HintedHandOffManager.java (line 392) Finished
hinted handoff of 0 rows to endpoint /10.0.0.153
>  INFO [GossipTasks:1] 2013-01-30 21:35:26,432 Gossiper.java (line 831) InetAddress /10.0.0.152
is now dead.
>  INFO [GossipStage:1] 2013-01-30 21:35:40,669 Gossiper.java (line 817) InetAddress /10.0.0.152
is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java (line 296)
Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,208 ColumnFamilyStore.java (line 659) Enqueuing
flush of Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live bytes, 286 ops)
>  INFO [FlushWriter:2496] 2013-01-30 21:35:41,209 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@908822645(92404/185145
serialized/live bytes, 286 ops)
>
>
> Node 155:
>
> INFO [GossipTasks:1] 2013-01-30 21:35:27,117 Gossiper.java (line 831) InetAddress /10.0.0.152
is now dead.
>  INFO [GossipStage:1] 2013-01-30 21:35:40,668 Gossiper.java (line 817) InetAddress /10.0.0.152
is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java (line 296)
Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,279 ColumnFamilyStore.java (line 659) Enqueuing
flush of Memtable-HintsColumnFamily@1083199446(94578/190654 serialized/live bytes, 316 ops)
>
> Meanwhile on node 152:
>
> INFO [FlushWriter:12] 2013-01-30 21:33:42,802 Memtable.java (line 305) Completed flushing
/raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF_R2Notifications-hf-4175-Data.db
(639626 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=11032499)
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,701 GCInspector.java (line 122) GC for ParNew:
35087 ms for 3 collections, 442940272 used; max is 3886022656
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,934 StatusLogger.java (line 57) Pool Name
                   Active   Pending   Blocked
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,941 StatusLogger.java (line 72) ReadStage
                       32       343         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) RequestResponseStage
             0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) ReadRepairStage
                  0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) MutationStage
                    0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) ReplicateOnWriteStage
            0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) GossipStage
                      0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) AntiEntropyStage
                 0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line 72) MigrationStage
                   0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line 72) StreamStage
                      0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) MemtablePostFlusher
              0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) FlushWriter
                      0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) MiscStage
                        0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) commitlog_archiver
               0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) InternalResponseStage
            0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) HintedHandoff
                    0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 77) CompactionManager
                0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 89) MessagingService
               n/a      0,24
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 99) Cache Type
                    Size                 Capacity               KeysToSave               
                                         Provider
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 100) KeyCache
                    133569                  2184533                      all
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 106) RowCache
                         0                        0                      all             
org.apache.cassandra.cache.SerializingCacheProvider
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 113) ColumnFamily
               Memtable ops,data
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 116) system.NodeIdInfo
                        0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.IndexInfo
                         0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.LocationInfo
                      0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.Versions
                        3,103
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.schema_keyspaces
                  0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,950 StatusLogger.java (line 116) system.Migrations
                        0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,059 StatusLogger.java (line 116) system.schema_columnfamilies
                0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.schema_columns
                    0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.HintsColumnFamily
              1,2369
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.Schema
                            0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_mailPipes
           9241,30380203
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_profilePics
           427,1173642
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_SBVars
                        0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_SBCounters
                  5,232
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,092 StatusLogger.java (line 116) SB_NNCD.CF_userPhoneAgenda
       7919,2332898
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_BSLUsers
                      0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_mailTimeline
                  0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_SBMessages
         54390,15054201
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_NewsFeedTimeline
       1866,7681861
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_BSLSpots
                    1,616
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBUniqueIds
                   0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBChecks
                      0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_NFTimelines
            423,533156
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBAppUsersProfilePic
                0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_SBLikes
                    16,565
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_SBCodes
                 154,93735
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_attachmentsPerUser
          80,107551
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_comments
                 58,77581
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_attachments
            641,247796
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_SBStats
                       0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_userReversedPhoneAgenda
       9047,7132129
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_ACD.CF_GSMembers
              823,1206779
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,113 StatusLogger.java (line 116) SB_ACD.CF_NSLoad
                         0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,114 StatusLogger.java (line 116) SB_ACD.CF_sessions
                       0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,115 StatusLogger.java (line 116) SB_ACD.CF_userGS
                  193,577229
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line 116) SB_ACD.CF_userApps
                     8,170
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line 116) SB_ACD.CF_GSData
                    126,3503
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,117 StatusLogger.java (line 116) SB_ACD.CF_R2Notifications
         1906,58481
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,118 StatusLogger.java (line 116) SB_ACD.CF_uniqueKeys
                     0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,155 StatusLogger.java (line 116) SB_ACD.CF_userContactWannabes
         953,438671
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,157 StatusLogger.java (line 116) SB_ACD.CF_SBApps
                         0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,158 StatusLogger.java (line 116) SB_ACD.CF_SBAppUsers
                     0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line 116) SB_ACD.CF_R1Notifications
       1354,2070513
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line 116) SB_ACD.CF_SBVotes
                 463,230386
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,161 StatusLogger.java (line 116) SB_ACD.CF_userContacts
          2968,4849519
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line 116) SB_ACD.CF_sessionsSuspended
                0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line 116) SB_ACD.CF_userBlockedList
              5,207
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,172 StatusLogger.java (line 116) SB_ACD.CF_users
               65386,10852776
> WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 GCInspector.java (line 145) Heap is 0.7514124302624755
full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to
the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold
in cassandra.yaml if you don't want Cassandra to do this automatically
>  WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 StorageService.java (line 2880) Flushing
CFS(Keyspace='SB_NNCD', ColumnFamily='CF_mailPipes') to relieve memory pressure
>  INFO [ScheduledTasks:1] 2013-01-30 21:37:05,896 ColumnFamilyStore.java (line 659) Enqueuing
flush of Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live bytes, 9380 ops)
>  INFO [FlushWriter:13] 2013-01-30 21:37:05,909 Memtable.java (line 264) Writing Memtable-CF_mailPipes@192792072(6868296/30790355
serialized/live bytes, 9380 ops)
>  INFO [FlushWriter:13] 2013-01-30 21:37:06,385 Memtable.java (line 305) Completed flushing
/raid0/cassandra/data/SB_NNCD/CF_mailPipes/SB_NNCD-CF_mailPipes-hf-2165-Data.db (2049401 bytes)
for commitlog position ReplayPosition(segmentId=1359573115280, position=13578602)
>  WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202) setting live ratio
to maximum of 64.0 instead of 751.6512549537648
>  INFO [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 213) CFS(Keyspace='SB_ACD',
ColumnFamily='CF_R2Notifications') liveRatio is 64.0 (just-counted was 64.0).  calculation
took 61ms for 8 columns
>  INFO [MemoryMeter:1] 2013-01-30 21:40:47,274 Memtable.java (line 213) CFS(Keyspace='SB_ACD',
ColumnFamily='CF_userGS') liveRatio is 5.309650177091864 (just-counted was 5.309650177091864).
 calculation took 11ms for 170 columns
>  INFO [HintedHandoff:1] 2013-01-30 21:42:03,709 HintedHandOffManager.java (line 296)
Started hinted handoff for token: 85070591730234615865843651857942052864 with IP: /10.0.0.153
>  INFO [HintedHandoff:1] 2013-01-30 21:42:03,712 ColumnFamilyStore.java (line 659) Enqueuing
flush of Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live bytes, 2 ops)
>  INFO [FlushWriter:14] 2013-01-30 21:42:03,713 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1676047624(950/2369
serialized/live bytes, 2 ops)
>  INFO [FlushWriter:14] 2013-01-30 21:42:03,730 Memtable.java (line 305) Completed flushing
/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db (1016
bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=17323736)
>  INFO [CompactionExecutor:48] 2013-01-30 21:42:03,731 CompactionTask.java (line 109)
Compacting [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db'),
SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-121-Data.db')]
>
> The cluster is perfectly balanced:
>
> Note: Ownership information does not include topology, please specify a keyspace.
> Address         DC          Rack        Status State   Load            Owns         
      Token
>                                                                                     
      141784319550391032739561396922763706368
> 10.0.0.150      datacenter1 rack1       Up     Normal  13.25 GB        16.67%       
      0
> 10.0.0.151      datacenter1 rack1       Up     Normal  10.31 GB        16.67%       
      28356863910078203714492389662765613056
> 10.0.0.152      datacenter1 rack1       Up     Normal  8.58 GB         16.67%       
      56713727820156407428984779325531226112
> 10.0.0.153      datacenter1 rack1       Up     Normal  8.1 GB          16.67%       
      85070591730234615865843651857942052864
> 10.0.0.154      datacenter1 rack1       Up     Normal  8.28 GB         16.67%       
      113427455640312814857969558651062452224
> 10.0.0.155      datacenter1 rack1       Up     Normal  9.38 GB         16.67%       
      141784319550391032739561396922763706368
>
>
> I guess something happens on 152 that makes other nodes believe it is dead, then does
something to fix itself and comes back after (messing up performance in the meanwhile) .Does
this makes any sense? Any clue of what might be going on? I can post more logs if you guys
need
>
> Thanks again!!
>
> Guillermo
>
>
>
> 2013/1/30 Nate McCall <zznate.m@gmail.com>
>>
>> What's the output of nodetool cfstats for those 2 column families on
>> cassNode2 and cassNode3? And what is the replication factor for this
>> cluster?
>>
>> Per the previous reply, nodetool ring should show each of your nodes
>> with ~16.7% of the data if well balanced.
>>
>> Also, the auto-detection for memory sizes in the startup script is a
>> little off w/r/t m1.xlarge because of the 'slightly less than 16gb' of
>> ram. It usually ends up allocating 4g/400m (max and young) whereas
>> 8g/800m will give you some more breathing room.
>>
>> On Wed, Jan 30, 2013 at 12:07 PM, Bryan Talbot <btalbot@aeriagames.com> wrote:
>> > My guess is that those one or two nodes with the gc pressure also have more
>> > rows in your big CF.  More rows could be due to imbalanced distribution if
>> > your'e not using a random partitioner or from those nodes not yet removing
>> > deleted rows which other nodes may have done.
>> >
>> > JVM heap space is used for a few things which scale with key count
>> > including:
>> > - bloom filter (for C* < 1.2)
>> > - index samples
>> >
>> > Other space is used but can be more easily controlled by tuning for
>> > - memtable
>> > - compaction
>> > - key cache
>> > - row cache
>> >
>> >
>> > So, if those nodes have more rows (check using "nodetool ring" or "nodetool
>> > cfstats") than the others you can try to:
>> > - reduce the number of rows by adding nodes, run manual / tune compactions
>> > to remove rows with expired tombstones, etc.
>> > - increase bloom filter fp chance
>> > - increase jvm heap size (don't go too big)
>> > - disable key or row cache
>> > - increase index sample interval
>> >
>> > Not all of those things are generally good especially to the extreme so
>> > don't go setting a 20 GB jvm heap without understanding the consequences for
>> > example.
>> >
>> > -Bryan
>> >
>> >
>> > On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero
>> > <guillermo.barbero@spotbros.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >>   I'm viewing a weird behaviour in my cassandra cluster. Most of the
>> >> warning messages are due to Heap is % full. According to this link
>> >>
>> >> (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html)
>> >> there are two ways to "reduce pressure":
>> >> 1. Decrease the cache sizes
>> >> 2. Increase the index interval size
>> >>
>> >> Most of the flushes are in two column families (users and messages), I
>> >> guess that's because the most mutations are there.
>> >>
>> >> I still have not applied those changes to the production environment.
>> >> Do you recommend any other meassure? Should I set specific tunning for
>> >> these two CFs? Should I check another metric?
>> >>
>> >> Additionally, the distribution of warning messages is not uniform
>> >> along the cluster. Why could cassandra be doing this? What should I do
>> >> to find out how to fix this?
>> >>
>> >> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
>> >> the java version is the following:
>> >> java version "1.6.0_37"
>> >> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
>> >> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>> >>
>> >> The cassandra system.log is resumed here (numer of messages, cassandra
>> >> node, class that reports the message, first word of the message)
>> >> 2013-01-26
>> >>       5 cassNode0: GCInspector.java Heap
>> >>       5 cassNode0: StorageService.java Flushing
>> >>     232 cassNode2: GCInspector.java Heap
>> >>     232 cassNode2: StorageService.java Flushing
>> >>     104 cassNode3: GCInspector.java Heap
>> >>     104 cassNode3: StorageService.java Flushing
>> >>       3 cassNode4: GCInspector.java Heap
>> >>       3 cassNode4: StorageService.java Flushing
>> >>       3 cassNode5: GCInspector.java Heap
>> >>       3 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-27
>> >>       2 cassNode0: GCInspector.java Heap
>> >>       2 cassNode0: StorageService.java Flushing
>> >>       3 cassNode1: GCInspector.java Heap
>> >>       3 cassNode1: StorageService.java Flushing
>> >>     189 cassNode2: GCInspector.java Heap
>> >>     189 cassNode2: StorageService.java Flushing
>> >>     104 cassNode3: GCInspector.java Heap
>> >>     104 cassNode3: StorageService.java Flushing
>> >>       1 cassNode4: GCInspector.java Heap
>> >>       1 cassNode4: StorageService.java Flushing
>> >>       1 cassNode5: GCInspector.java Heap
>> >>       1 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-28
>> >>       2 cassNode0: GCInspector.java Heap
>> >>       2 cassNode0: StorageService.java Flushing
>> >>       1 cassNode1: GCInspector.java Heap
>> >>       1 cassNode1: StorageService.java Flushing
>> >>       1 cassNode2: AutoSavingCache.java Reducing
>> >>     343 cassNode2: GCInspector.java Heap
>> >>     342 cassNode2: StorageService.java Flushing
>> >>     181 cassNode3: GCInspector.java Heap
>> >>     181 cassNode3: StorageService.java Flushing
>> >>       4 cassNode4: GCInspector.java Heap
>> >>       4 cassNode4: StorageService.java Flushing
>> >>       3 cassNode5: GCInspector.java Heap
>> >>       3 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-29
>> >>       2 cassNode0: GCInspector.java Heap
>> >>       2 cassNode0: StorageService.java Flushing
>> >>       3 cassNode1: GCInspector.java Heap
>> >>       3 cassNode1: StorageService.java Flushing
>> >>     156 cassNode2: GCInspector.java Heap
>> >>     156 cassNode2: StorageService.java Flushing
>> >>      71 cassNode3: GCInspector.java Heap
>> >>      71 cassNode3: StorageService.java Flushing
>> >>       2 cassNode4: GCInspector.java Heap
>> >>       2 cassNode4: StorageService.java Flushing
>> >>       2 cassNode5: GCInspector.java Heap
>> >>       1 cassNode5: Memtable.java setting
>> >>       2 cassNode5: StorageService.java Flushing
>> >>
>> >> --
>> >>
>> >> Guillermo Barbero - Backend Team
>> >>
>> >> Spotbros Technologies
>> >
>> >
>> >
>
>
>
>
> --
>
> Guillermo Barbero - Backend Team
>
> Spotbros Technologies
>
>
>



--

Guillermo Barbero - Backend Team

Spotbros Technologies

Mime
View raw message