cassandra-commits mailing list archives

From "Robbie Strickland (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause
Date Wed, 07 Oct 2015 10:13:27 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946624#comment-14946624 ]

Robbie Strickland commented on CASSANDRA-10449:
-----------------------------------------------

I increased max heap to 96GB and tried again.  Now nodetool netstats shows that progress
has ground to a halt:

9pm:

{noformat}
ubuntu@eventcass4x024:~$ nodetool netstats | grep -v 100%
Mode: JOINING
Bootstrap 45d8dec0-6c12-11e5-90ef-f7a8e02e59c0
    /52.1.155.147 (using /10.239.209.15)
        Receiving 139 files, 36548040412 bytes total. Already received 139 files, 36548040412
bytes total
    /52.2.9.34 (using /10.239.209.17)
        Receiving 171 files, 60000431853 bytes total. Already received 171 files, 60000431853
bytes total
    /52.0.152.88 (using /10.239.209.44)
        Receiving 147 files, 78458709168 bytes total. Already received 79 files, 55003961646
bytes total
            /var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-295-Data.db
955162267/4105438496 bytes(23%) received from idx:0/52.0.152.88
    /52.2.0.164 (using /10.239.209.16)
        Receiving 141 files, 36700837768 bytes total. Already received 141 files, 36700837768
bytes total
    /54.152.177.161 (using /10.239.209.93)
    /54.172.174.48 (using /10.239.209.49)
        Receiving 176 files, 79676288976 bytes total. Already received 98 files, 55932809644
bytes total
            /var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-329-Data.db
174070078/7326235809 bytes(2%) received from idx:0/54.172.174.48
    /52.2.75.82 (using /10.239.208.88)
    /54.165.111.69 (using /10.239.209.47)
        Receiving 170 files, 85920995638 bytes total. Already received 94 files, 54985226700
bytes total
            /var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-265-Data.db
4875660361/22821083384 bytes(21%) received from idx:0/54.165.111.69
    /52.6.136.30 (using /10.239.209.45)
        Receiving 174 files, 87064163973 bytes total. Already received 91 files, 53930233899
bytes total
            /var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-157-Data.db
17064156850/25823860172 bytes(66%) received from idx:0/52.6.136.30
    /52.7.14.201 (using /10.239.209.46)
        Receiving 164 files, 46351636573 bytes total. Already received 164 files, 46351636573
bytes total
    /52.2.30.66 (using /10.239.209.18)
        Receiving 158 files, 62899520151 bytes total. Already received 158 files, 62899520151
bytes total
    /54.175.138.33 (using /10.239.209.96)
    /54.88.44.178 (using /10.239.209.91)
    /52.2.109.194 (using /10.239.208.89)
    /54.172.81.117 (using /10.239.209.95)
    /54.172.103.46 (using /10.239.209.48)
        Receiving 164 files, 48771232182 bytes total. Already received 164 files, 48771232182
bytes total
    /54.164.172.164 (using /10.239.209.94)
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a        19             56
Responses                       n/a         0       35515795
{noformat}

6am:

{noformat}
ubuntu@eventcass4x024:~$ nodetool netstats | grep -v 100%
Mode: JOINING
Bootstrap 45d8dec0-6c12-11e5-90ef-f7a8e02e59c0
    /52.1.155.147 (using /10.239.209.15)
        Receiving 139 files, 36548040412 bytes total. Already received 139 files, 36548040412
bytes total
    /52.2.9.34 (using /10.239.209.17)
        Receiving 171 files, 60000431853 bytes total. Already received 171 files, 60000431853
bytes total
    /52.0.152.88 (using /10.239.209.44)
        Receiving 147 files, 78458709168 bytes total. Already received 79 files, 55003961646
bytes total
            /var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-295-Data.db
955162267/4105438496 bytes(23%) received from idx:0/52.0.152.88
    /52.2.0.164 (using /10.239.209.16)
        Receiving 141 files, 36700837768 bytes total. Already received 141 files, 36700837768
bytes total
    /54.152.177.161 (using /10.239.209.93)
    /54.172.174.48 (using /10.239.209.49)
        Receiving 176 files, 79676288976 bytes total. Already received 98 files, 55932809644
bytes total
            /var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-329-Data.db
174070078/7326235809 bytes(2%) received from idx:0/54.172.174.48
    /52.2.75.82 (using /10.239.208.88)
    /54.165.111.69 (using /10.239.209.47)
        Receiving 170 files, 85920995638 bytes total. Already received 94 files, 54985226700
bytes total
            /var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-265-Data.db
4875660361/22821083384 bytes(21%) received from idx:0/54.165.111.69
    /52.6.136.30 (using /10.239.209.45)
        Receiving 174 files, 87064163973 bytes total. Already received 91 files, 53930233899
bytes total
            /var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-157-Data.db
17064156850/25823860172 bytes(66%) received from idx:0/52.6.136.30
    /52.7.14.201 (using /10.239.209.46)
        Receiving 164 files, 46351636573 bytes total. Already received 164 files, 46351636573
bytes total
    /52.2.30.66 (using /10.239.209.18)
        Receiving 158 files, 62899520151 bytes total. Already received 158 files, 62899520151
bytes total
    /54.175.138.33 (using /10.239.209.96)
    /54.88.44.178 (using /10.239.209.91)
    /52.2.109.194 (using /10.239.208.89)
    /54.172.81.117 (using /10.239.209.95)
    /54.172.103.46 (using /10.239.209.48)
        Receiving 164 files, 48771232182 bytes total. Already received 164 files, 48771232182
bytes total
    /54.164.172.164 (using /10.239.209.94)
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a        19             56
Responses                       n/a         0       51933813
{noformat}

No additional long GC pauses.
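
For anyone wanting to check this kind of stall without diffing the output by eye, here is a rough sketch that parses two netstats snapshots and lists the peers whose stream is incomplete and whose received byte count did not move. The function names (parse_netstats, stalled_peers) and the regexes are my own, written against the output format pasted above; other Cassandra versions may print this differently. Peers with no "Receiving" summary line are ignored.

```python
import re

# Patterns are assumptions based on the netstats output pasted above.
PEER = re.compile(r"^[ \t]*/(\S+) \(using /\S+\)", re.M)
SUMMARY = re.compile(
    r"Receiving (\d+) files, (\d+) bytes total\.\s+"
    r"Already received (\d+) files, (\d+)\s+bytes total"  # \s+ tolerates mail line wraps
)


def parse_netstats(text):
    """Map peer address -> (files_total, bytes_total, files_received, bytes_received)."""
    progress = {}
    peers = list(PEER.finditer(text))
    for i, peer in enumerate(peers):
        # Only look for a summary between this peer line and the next one.
        end = peers[i + 1].start() if i + 1 < len(peers) else len(text)
        match = SUMMARY.search(text, peer.end(), end)
        if match:
            progress[peer.group(1)] = tuple(int(g) for g in match.groups())
    return progress


def stalled_peers(before, after):
    """Peers that are still incomplete and received no new bytes between snapshots."""
    stalled = []
    for peer, (_, bytes_total, _, bytes_received) in after.items():
        previous = before.get(peer)
        if previous and bytes_received < bytes_total and previous[3] == bytes_received:
            stalled.append(peer)
    return sorted(stalled)


# Excerpt of the snapshot above; the 9pm and 6am values are identical for these peers.
SNAPSHOT = """\
Mode: JOINING
Bootstrap 45d8dec0-6c12-11e5-90ef-f7a8e02e59c0
    /52.1.155.147 (using /10.239.209.15)
        Receiving 139 files, 36548040412 bytes total. Already received 139 files, 36548040412
bytes total
    /52.0.152.88 (using /10.239.209.44)
        Receiving 147 files, 78458709168 bytes total. Already received 79 files, 55003961646
bytes total
    /54.152.177.161 (using /10.239.209.93)
"""

if __name__ == "__main__":
    nine_pm = parse_netstats(SNAPSHOT)
    six_am = parse_netstats(SNAPSHOT)
    print(stalled_peers(nine_pm, six_am))  # prints ['52.0.152.88']
```

In practice each snapshot would come from saving the output of nodetool netstats to a file at two different times and passing both file contents through parse_netstats.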

> OOM on bootstrap due to long GC pause
> -------------------------------------
>
>                 Key: CASSANDRA-10449
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu 14.04, AWS
>            Reporter: Robbie Strickland
>              Labels: gc
>             Fix For: 2.1.x
>
>         Attachments: system.log.10-05
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 500-700GB per
> node.  SSTable counts are <10 per table.  I am attempting to provision additional nodes,
> but bootstrapping OOMs every time after about 10 hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 CassandraDaemon.java:223 - Exception in thread Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to no avail.
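
To put the GCInspector line quoted above in perspective, here is a small sketch that pulls the pause length and the old-gen sizes out of the log line and converts them to more readable figures. The regex and the summarize_gc helper are assumptions based only on the single log line in the issue description, not on any documented log format.

```python
import re

# Pattern is an assumption derived from the one GCInspector line quoted above.
GC_LINE = re.compile(r"GC in (\d+)ms\.\s+G1 Old Gen: (\d+) -> (\d+)")


def summarize_gc(line):
    """Return (pause in minutes, old-gen bytes freed), or None if the line does not match."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    pause_ms, before, after = map(int, match.groups())
    return pause_ms / 60000, before - after


line = (
    "INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - "
    "G1 Old Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;"
)

if __name__ == "__main__":
    minutes, freed = summarize_gc(line)
    # prints: pause: 26.4 min, old gen freed: 141479800 bytes
    print(f"pause: {minutes:.1f} min, old gen freed: {freed} bytes")
```

In other words, the collection ran for roughly 26 minutes and reclaimed only about 141 MB out of roughly 49 GB of old generation.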



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
