cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nikolay Kоvshov <nkovs...@yandex.ru>
Subject Re: Cassandra 1.0 hangs during GC
Date Mon, 30 Jul 2012 10:18:13 GMT
There is one peculiar thing about this problem

When I copy the cluster to other 2 servers (much weaker servers in means of CPU and RAM) and
run cassandra there, I see no massive CPU load and no huge GC times. Thus I suppose it has
something to do with hardware or software installed on the machines. I plan to check software
and hardware issues step-by-step to find that difference between servers which brings big
problems to one of (otherwise identical) clusters. 

What I plan to compare between 'bad' cluster and 'good' cluster:

- Configs, schemas, data etc: same
- java version : same
- RAM and CPU : 'bad' cluster has more
- Ubuntu version
- Networking
- What else??? 







26.07.2012, 10:25, "Nikolay Kоvshov" <nkovshov@yandex.ru>:
> Sun Java 6 didn't help it at all
>
> Sar shows no special activity on the long GC times
>
> And I have remembered that I do a lot of get_range_slices requests, each time on a range
of 100, if this could be important.
>
> Still my GC times are growing incredible
>
>  INFO [ScheduledTasks:1] 2012-07-25 12:17:04,029 GCInspector.java (line 122) GC for
ParNew: 497253 ms for 1 collections, 114036128 used; max is 8464105472
>  INFO [ScheduledTasks:1] 2012-07-25 13:26:21,282 GCInspector.java (line 122) GC for
ParNew: 149508 ms for 1 collections, 87015440 used; max is 8464105472
>  INFO [ScheduledTasks:1] 2012-07-25 18:39:44,654 GCInspector.java (line 122) GC for
ParNew: 83584 ms for 1 collections, 54368032 used; max is 8464105472
>  INFO [ScheduledTasks:1] 2012-07-25 22:45:42,448 GCInspector.java (line 122) GC for
ParNew: 48454 ms for 1 collections, 44072488 used; max is 8464105472
>
> 24.07.2012, 13:07, "Joost van de Wijgerd" <jwijgerd@gmail.com>:
>
>>  You are better off using Sun Java 6 to run Cassandra. In the past
>>  there were issues reported on 7. Can you try running it on Sun Java 6?
>>
>>  kind regards
>>
>>  Joost
>>
>>  On Tue, Jul 24, 2012 at 10:04 AM, Nikolay Kоvshov <nkovshov@yandex.ru> wrote:
>>>   48 G of Ram on that machine, swap is not used. I will disable swap at all
just in case
>>>   I have 4 cassandra processes (parts of 4 different clusters), each allocated
8 GB and using 4 of them
>>>>  java -version
>>>   java version "1.7.0"
>>>   Java(TM) SE Runtime Environment (build 1.7.0-b147)
>>>   Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
>>>
>>>   23.07.2012, 20:12, "Joost van de Wijgerd" <jwijgerd@gmail.com>:
>>>>   Howmuch memory do you have on the machine. Seems like you have 8G
>>>>   reserved for the Cassandra java process, If this is all the memory on
>>>>   the machine you might be swapping. Also which jvm do you use?
>>>>
>>>>   kind regards
>>>>
>>>>   Joost
>>>>
>>>>   On Mon, Jul 23, 2012 at 10:07 AM, Nikolay Kоvshov <nkovshov@yandex.ru>
wrote:
>>>>>     21th I have mirgated to cassandra 1.1.2 but see no improvement
>>>>>
>>>>>    cat /var/log/cassandra/Earth1.log | grep "GC for"
>>>>>    INFO [ScheduledTasks:1] 2012-05-22 17:42:48,445 GCInspector.java
(line 123) GC for ParNew: 345 ms for 1 collections, 82451888 used; max is 8464105472
>>>>>    INFO [ScheduledTasks:1] 2012-05-23 02:47:13,911 GCInspector.java
(line 123) GC for ParNew: 312 ms for 1 collections, 110617416 used; max is 8464105472
>>>>>    INFO [ScheduledTasks:1] 2012-05-23 11:57:54,317 GCInspector.java
(line 123) GC for ParNew: 298 ms for 1 collections, 98161920 used; max is 8464105472
>>>>>    INFO [ScheduledTasks:1] 2012-07-02 08:52:37,019 GCInspector.java
(line 123) GC for ParNew: 196886 ms for 1 collections, 2310058496 used; max is 8464105472
>>>>>    INFO [ScheduledTasks:1] 2012-07-16 17:41:25,940 GCInspector.java
(line 123) GC for ParNew: 200146 ms for 1 collections, 2345987088 used; max is 8464105472
>>>>>    === Migrated from 1.0.0 to 1.1.2
>>>>>    INFO [ScheduledTasks:1] 2012-07-21 09:05:08,280 GCInspector.java
(line 122) GC for ParNew: 282 ms for 1 collections, 466406864 used; max is 8464105472
>>>>>    INFO [ScheduledTasks:1] 2012-07-21 12:38:43,132 GCInspector.java
(line 122) GC for ParNew: 233 ms for 1 collections, 405269504 used; max is 8464105472
>>>>>    INFO [ScheduledTasks:1] 2012-07-22 02:29:09,596 GCInspector.java
(line 122) GC for ParNew: 253 ms for 1 collections, 389700768 used; max is 8464105472
>>>>>    INFO [ScheduledTasks:1] 2012-07-22 17:45:46,357 GCInspector.java
(line 122) GC for ParNew: 57391 ms for 1 collections, 400083984 used; max is 8464105472
>>>>>
>>>>>    Memory and yaml memory-related settings are default
>>>>>    I do not do deletes
>>>>>    I have 2 CF's and no secondary indexes
>>>>>
>>>>>    LiveRatio's:
>>>>>     INFO [pool-1-thread-1] 2012-06-09 02:36:07,759 Memtable.java
(line 177) CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted was
1.0).  calculation took 85ms for 6236 columns
>>>>>     INFO [MemoryMeter:1] 2012-07-21 09:04:47,614 Memtable.java (line
213) CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 (just-counted was
1.0).  calculation took 8ms for 1 columns
>>>>>     INFO [MemoryMeter:1] 2012-07-21 09:04:51,012 Memtable.java (line
213) CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted was 1.0).
 calculation took 99ms for 1094 columns
>>>>>     INFO [MemoryMeter:1] 2012-07-21 09:04:51,331 Memtable.java (line
213) CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 (just-counted was
1.0).  calculation took 80ms for 242 columns
>>>>>     INFO [MemoryMeter:1] 2012-07-21 09:04:51,856 Memtable.java (line
213) CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted was 1.0).
 calculation took 505ms for 2678 columns
>>>>>     INFO [MemoryMeter:1] 2012-07-21 09:04:52,881 Memtable.java (line
213) CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted was 1.0).
 calculation took 776ms for 5236 columns
>>>>>     INFO [MemoryMeter:1] 2012-07-21 09:04:52,945 Memtable.java (line
213) CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 (just-counted was
1.0).  calculation took 64ms for 389 columns
>>>>>     INFO [MemoryMeter:1] 2012-07-21 09:04:55,162 Memtable.java (line
213) CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted was 1.0).
 calculation took 1378ms for 8948 columns
>>>>>     INFO [MemoryMeter:1] 2012-07-21 09:04:55,304 Memtable.java (line
213) CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 (just-counted was
1.0).  calculation took 140ms for 1082 columns
>>>>>     INFO [MemoryMeter:1] 2012-07-21 09:05:08,439 Memtable.java (line
213) CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 2.5763038186160894 (just-counted
was 2.5763038186160894).  calculation took 8796ms for 102193 columns
>>>>>
>>>>>    18.07.2012, 07:51, "aaron morton" <aaron@thelastpickle.com>:
>>>>>>    Assuming all the memory and yaml settings default that does
not sound right. The first thought would be the memory meter not counting correctly...
>>>>>>    Do you do a lot of deletes ?
>>>>>>    Do you have a lot of CF's and/or secondary indexes ?
>>>>>>    Can you see log lines about the "liveRatio" for your cf's ?
>>>>>>    I would upgrade to 1.0.10 before getting too carried away though.
>>>>>>    Cheers
>>>>>>
>>>>>>    -----------------
>>>>>>    Aaron Morton
>>>>>>    Freelance Developer
>>>>>>    @aaronmorton
>>>>>>    http://www.thelastpickle.com
>>>>>>
>>>>>>    On 17/07/2012, at 8:14 PM, Nikolay Kоvshov wrote:
>>>>>>>    This is a cluster of 2 nodes, each having 8G of operating
memory, replicationfactor=2
>>>>>>>    Write/read pressure is quite low and almost never exceeds
10/second
>>>>>>>
>>>>>>>    From time to time (2-3 times in a month) I see GC activity
in logs and for this time cassandra stops responding to requests which results in a timeout
in upper-layer application. Total time of unavailability can be over 5 minues (like in the
following case)
>>>>>>>
>>>>>>>    What can I do with that? Wiil it become much worse when
my cluster grows up?
>>>>>>>
>>>>>>>    INFO [GossipTasks:1] 2012-07-16 13:10:50,055 Gossiper.java
(line 736) InetAddress /10.220.50.9 is now dead.
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,056 GCInspector.java
(line 123) GC for ParNew: 391383 ms for 1 collections, 2025808488 used; max is 8464105472
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,067 StatusLogger.java
(line 50) Pool Name                    Active   Pending   Blocked
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,071 StatusLogger.java
(line 65) ReadStage                         0         0         0
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,071 StatusLogger.java
(line 65) RequestResponseStage              0         0         0
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,072 StatusLogger.java
(line 65) ReadRepairStage                   0         0         0
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,072 StatusLogger.java
(line 65) MutationStage                     0         0         0
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,073 StatusLogger.java
(line 65) ReplicateOnWriteStage             0         0         0
>>>>>>>     INFO [GossipStage:1] 2012-07-16 13:10:50,074 Gossiper.java
(line 722) InetAddress /10.220.50.9 is now UP
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,075 StatusLogger.java
(line 65) GossipStage                       1        59         0
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,075 StatusLogger.java
(line 65) AntiEntropyStage                  0         0         0
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,076 StatusLogger.java
(line 65) MigrationStage                    0         0         0
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,076 StatusLogger.java
(line 65) StreamStage                       0         0         0
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,077 StatusLogger.java
(line 65) MemtablePostFlusher               0         0         0
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,077 StatusLogger.java
(line 65) FlushWriter                       0         0         0
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,077 StatusLogger.java
(line 65) MiscStage                         0         0         0
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,078 StatusLogger.java
(line 65) InternalResponseStage             0         0         0
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,078 StatusLogger.java
(line 65) HintedHandoff                     0         0         0
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,079 StatusLogger.java
(line 69) CompactionManager               n/a         0
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,079 StatusLogger.java
(line 81) MessagingService                n/a       0,0
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,080 StatusLogger.java
(line 85) ColumnFamily                Memtable ops,data  Row cache size/cap
 Key cache size/cap
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,080 StatusLogger.java
(line 88) Keyspace1.PSS                 36712,343842617                 0/0
      97995/1000000
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,085 StatusLogger.java
(line 88) Keyspace1.Standard1          128679,162567721                 0/0
          0/1000000
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,085 StatusLogger.java
(line 88) system.NodeIdInfo                         0,0                 0/0
                0/1
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,086 StatusLogger.java
(line 88) system.IndexInfo                          0,0                 0/0
                0/1
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,086 StatusLogger.java
(line 88) system.LocationInfo                       0,0                 0/0
                1/2
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,086 StatusLogger.java
(line 88) system.Migrations                         0,0                 0/0
                3/3
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,087 StatusLogger.java
(line 88) system.HintsColumnFamily                1,528                 0/0
                0/1
>>>>>>>     INFO [ScheduledTasks:1] 2012-07-16 13:10:50,087 StatusLogger.java
(line 88) system.Schema                             0,0                 0/0
                2/3
>>>>>>>    -------- Завершение пересылаемого
сообщения --------
>>>>   --
>>>>   Joost van de Wijgerd
>>>>   Visseringstraat 21B
>>>>   1051KH Amsterdam
>>>>   +31624111401
>>>>   joost.van.de.wijgerd@Skype
>>>>   http://www.linkedin.com/in/jwijgerd
>>  --
>>  Joost van de Wijgerd
>>  Visseringstraat 21B
>>  1051KH Amsterdam
>>  +31624111401
>>  joost.van.de.wijgerd@Skype
>>  http://www.linkedin.com/in/jwijgerd

Mime
View raw message