cassandra-commits mailing list archives

From "Robbie Strickland (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-10449) OOM on bootstrap after long GC pause
Date Thu, 22 Oct 2015 13:10:27 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946624#comment-14946624 ]

Robbie Strickland edited comment on CASSANDRA-10449 at 10/22/15 1:10 PM:
-------------------------------------------------------------------------

I increased the max heap to 96GB and tried again. Now netstats shows that progress has ground to a halt:

9pm:

{noformat}
ubuntu@eventcass4x024:~$ nodetool netstats | grep -v 100%
Mode: JOINING
Bootstrap 45d8dec0-6c12-11e5-90ef-f7a8e02e59c0
        Receiving 139 files, 36548040412 bytes total. Already received 139 files, 36548040412 bytes total
        Receiving 171 files, 60000431853 bytes total. Already received 171 files, 60000431853 bytes total
        Receiving 147 files, 78458709168 bytes total. Already received 79 files, 55003961646 bytes total
            /var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-295-Data.db 955162267/4105438496 bytes(23%) received from idx:0/x.x.x.x
        Receiving 141 files, 36700837768 bytes total. Already received 141 files, 36700837768 bytes total
        Receiving 176 files, 79676288976 bytes total. Already received 98 files, 55932809644 bytes total
            /var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-329-Data.db 174070078/7326235809 bytes(2%) received from idx:0/x.x.x.x
        Receiving 170 files, 85920995638 bytes total. Already received 94 files, 54985226700 bytes total
            /var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-265-Data.db 4875660361/22821083384 bytes(21%) received from idx:0/x.x.x.x
        Receiving 174 files, 87064163973 bytes total. Already received 91 files, 53930233899 bytes total
            /var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-157-Data.db 17064156850/25823860172 bytes(66%) received from idx:0/x.x.x.x
        Receiving 164 files, 46351636573 bytes total. Already received 164 files, 46351636573 bytes total
        Receiving 158 files, 62899520151 bytes total. Already received 158 files, 62899520151 bytes total
        Receiving 164 files, 48771232182 bytes total. Already received 164 files, 48771232182 bytes total
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a        19             56
Responses                       n/a         0       35515795
{noformat}
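To put the 9pm numbers in perspective, the per-session `Receiving ... Already received ...` lines can be summed. Below is a rough Python sketch. `summarize_sessions` and `SNAPSHOT_9PM` are hypothetical names, not part of nodetool or Cassandra, and the regex assumes the exact summary-line wording shown in the output above.

```python
import re

# Per-session summary line as printed in the netstats output above, e.g.
# "Receiving 147 files, 78458709168 bytes total. Already received 79 files, 55003961646 bytes total"
SESSION_RE = re.compile(
    r"Receiving (\d+) files, (\d+) bytes total\. "
    r"Already received (\d+) files, (\d+) bytes total"
)


def summarize_sessions(netstats_text):
    """Sum file and byte counts over every 'Receiving ...' session line.

    Hypothetical helper; not part of Cassandra or nodetool.
    """
    summary = {"files_total": 0, "files_received": 0,
               "bytes_total": 0, "bytes_received": 0}
    for match in SESSION_RE.finditer(netstats_text):
        files_total, bytes_total, files_recv, bytes_recv = map(int, match.groups())
        summary["files_total"] += files_total
        summary["bytes_total"] += bytes_total
        summary["files_received"] += files_recv
        summary["bytes_received"] += bytes_recv
    return summary


# Session summary lines copied from the 9pm snapshot (hypothetical variable name).
SNAPSHOT_9PM = """
Receiving 139 files, 36548040412 bytes total. Already received 139 files, 36548040412 bytes total
Receiving 171 files, 60000431853 bytes total. Already received 171 files, 60000431853 bytes total
Receiving 147 files, 78458709168 bytes total. Already received 79 files, 55003961646 bytes total
Receiving 141 files, 36700837768 bytes total. Already received 141 files, 36700837768 bytes total
Receiving 176 files, 79676288976 bytes total. Already received 98 files, 55932809644 bytes total
Receiving 170 files, 85920995638 bytes total. Already received 94 files, 54985226700 bytes total
Receiving 174 files, 87064163973 bytes total. Already received 91 files, 53930233899 bytes total
Receiving 164 files, 46351636573 bytes total. Already received 164 files, 46351636573 bytes total
Receiving 158 files, 62899520151 bytes total. Already received 158 files, 62899520151 bytes total
Receiving 164 files, 48771232182 bytes total. Already received 164 files, 48771232182 bytes total
"""

if __name__ == "__main__":
    s = summarize_sessions(SNAPSHOT_9PM)
    remaining = s["bytes_total"] - s["bytes_received"]
    pct = 100.0 * s["bytes_received"] / s["bytes_total"]
    print(f"files: {s['files_received']}/{s['files_total']}")
    print(f"bytes: {s['bytes_received']}/{s['bytes_total']} ({pct:.1f}%), "
          f"{remaining / 1e9:.1f} GB still pending")
```

For the 9pm snapshot this works out to 1299 of 1604 files and roughly 511 of 622 GB (about 82%). That leaves around 111 GB outstanding across the four sessions that were still receiving.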

6am:

{noformat}
ubuntu@eventcass4x024:~$ nodetool netstats | grep -v 100%
Mode: JOINING
Bootstrap 45d8dec0-6c12-11e5-90ef-f7a8e02e59c0
        Receiving 139 files, 36548040412 bytes total. Already received 139 files, 36548040412 bytes total
        Receiving 171 files, 60000431853 bytes total. Already received 171 files, 60000431853 bytes total
        Receiving 147 files, 78458709168 bytes total. Already received 79 files, 55003961646 bytes total
            /var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-295-Data.db 955162267/4105438496 bytes(23%) received from idx:0/x.x.x.x
        Receiving 141 files, 36700837768 bytes total. Already received 141 files, 36700837768 bytes total
        Receiving 176 files, 79676288976 bytes total. Already received 98 files, 55932809644 bytes total
            /var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-329-Data.db 174070078/7326235809 bytes(2%) received from idx:0/x.x.x.x
        Receiving 170 files, 85920995638 bytes total. Already received 94 files, 54985226700 bytes total
            /var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-265-Data.db 4875660361/22821083384 bytes(21%) received from idx:0/x.x.x.x
        Receiving 174 files, 87064163973 bytes total. Already received 91 files, 53930233899 bytes total
            /var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-157-Data.db 17064156850/25823860172 bytes(66%) received from idx:0/x.x.x.x
        Receiving 164 files, 46351636573 bytes total. Already received 164 files, 46351636573 bytes total
        Receiving 158 files, 62899520151 bytes total. Already received 158 files, 62899520151 bytes total
        Receiving 164 files, 48771232182 bytes total. Already received 164 files, 48771232182 bytes total
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a        19             56
Responses                       n/a         0       51933813
{noformat}

No additional long GC pauses.
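The stall is easiest to see by diffing the two snapshots. Every session summary and in-flight file line is identical at 9pm and 6am, while the `Responses` completed counter kept climbing. Below is a minimal Python sketch of that comparison, assuming the in-flight file line format shown above (`<path> <received>/<total> bytes(<pct>%) received from idx:<n>/<peer>`). `file_progress`, `find_stalled_files`, `responses_completed` and the trimmed snapshot strings are hypothetical. In those strings, paths are shortened to file names.

```python
import re

# In-flight file line, e.g.
# ".../prod_analytics_events-wuevents-tmp-ka-295-Data.db 955162267/4105438496 bytes(23%) received from idx:0/x.x.x.x"
FILE_RE = re.compile(r"(\S+-Data\.db) (\d+)/(\d+) bytes\(\d+%\) received from")
# Thread-pool row "Responses  n/a  <pending>  <completed>" from the same output.
RESPONSES_RE = re.compile(r"^Responses\s+\S+\s+\d+\s+(\d+)$", re.MULTILINE)


def file_progress(netstats_text):
    """Map each in-flight file path to (bytes_received, bytes_total)."""
    return {m.group(1): (int(m.group(2)), int(m.group(3)))
            for m in FILE_RE.finditer(netstats_text)}


def find_stalled_files(earlier, later):
    """Paths present in both snapshots whose received byte count did not grow."""
    before, after = file_progress(earlier), file_progress(later)
    return sorted(path for path in before
                  if path in after and after[path][0] <= before[path][0])


def responses_completed(netstats_text):
    """Completed count from the Responses row, or None if it is absent."""
    m = RESPONSES_RE.search(netstats_text)
    return int(m.group(1)) if m else None


# Excerpts of the snapshots above (hypothetical names, paths shortened).
SNAPSHOT_9PM = """
prod_analytics_events-wuevents-tmp-ka-295-Data.db 955162267/4105438496 bytes(23%) received from idx:0/x.x.x.x
prod_analytics_events-wuevents-tmp-ka-329-Data.db 174070078/7326235809 bytes(2%) received from idx:0/x.x.x.x
prod_analytics_events-wuevents-tmp-ka-265-Data.db 4875660361/22821083384 bytes(21%) received from idx:0/x.x.x.x
prod_analytics_events-wuevents-tmp-ka-157-Data.db 17064156850/25823860172 bytes(66%) received from idx:0/x.x.x.x
Responses                       n/a         0       35515795
"""
# The 6am output is identical except for the Responses completed counter.
SNAPSHOT_6AM = SNAPSHOT_9PM.replace("35515795", "51933813")

if __name__ == "__main__":
    stalled = find_stalled_files(SNAPSHOT_9PM, SNAPSHOT_6AM)
    print(f"{len(stalled)} of {len(file_progress(SNAPSHOT_9PM))} in-flight files made no progress")
    delta = responses_completed(SNAPSHOT_6AM) - responses_completed(SNAPSHOT_9PM)
    print(f"Responses completed grew by {delta}")
```

Against these excerpts it reports all four in-flight files as stalled. Over the same nine hours, `Responses` grew by 16418018, which suggests the node was still handling messages but not receiving any stream data.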


was (Author: rstrickland):
I increased the max heap to 96GB and tried again. Now netstats shows that progress has ground to a halt:

9pm:

{noformat}
ubuntu@eventcass4x024:~$ nodetool netstats | grep -v 100%
Mode: JOINING
Bootstrap 45d8dec0-6c12-11e5-90ef-f7a8e02e59c0
    /52.1.155.147 (using /10.239.209.15)
        Receiving 139 files, 36548040412 bytes total. Already received 139 files, 36548040412 bytes total
    /52.2.9.34 (using /10.239.209.17)
        Receiving 171 files, 60000431853 bytes total. Already received 171 files, 60000431853 bytes total
    /52.0.152.88 (using /10.239.209.44)
        Receiving 147 files, 78458709168 bytes total. Already received 79 files, 55003961646 bytes total
            /var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-295-Data.db 955162267/4105438496 bytes(23%) received from idx:0/52.0.152.88
    /52.2.0.164 (using /10.239.209.16)
        Receiving 141 files, 36700837768 bytes total. Already received 141 files, 36700837768 bytes total
    /54.152.177.161 (using /10.239.209.93)
    /54.172.174.48 (using /10.239.209.49)
        Receiving 176 files, 79676288976 bytes total. Already received 98 files, 55932809644 bytes total
            /var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-329-Data.db 174070078/7326235809 bytes(2%) received from idx:0/54.172.174.48
    /52.2.75.82 (using /10.239.208.88)
    /54.165.111.69 (using /10.239.209.47)
        Receiving 170 files, 85920995638 bytes total. Already received 94 files, 54985226700 bytes total
            /var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-265-Data.db 4875660361/22821083384 bytes(21%) received from idx:0/54.165.111.69
    /52.6.136.30 (using /10.239.209.45)
        Receiving 174 files, 87064163973 bytes total. Already received 91 files, 53930233899 bytes total
            /var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-157-Data.db 17064156850/25823860172 bytes(66%) received from idx:0/52.6.136.30
    /52.7.14.201 (using /10.239.209.46)
        Receiving 164 files, 46351636573 bytes total. Already received 164 files, 46351636573 bytes total
    /52.2.30.66 (using /10.239.209.18)
        Receiving 158 files, 62899520151 bytes total. Already received 158 files, 62899520151 bytes total
    /54.175.138.33 (using /10.239.209.96)
    /54.88.44.178 (using /10.239.209.91)
    /52.2.109.194 (using /10.239.208.89)
    /54.172.81.117 (using /10.239.209.95)
    /54.172.103.46 (using /10.239.209.48)
        Receiving 164 files, 48771232182 bytes total. Already received 164 files, 48771232182 bytes total
    /54.164.172.164 (using /10.239.209.94)
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a        19             56
Responses                       n/a         0       35515795
{noformat}

6am:

{noformat}
ubuntu@eventcass4x024:~$ nodetool netstats | grep -v 100%
Mode: JOINING
Bootstrap 45d8dec0-6c12-11e5-90ef-f7a8e02e59c0
    /52.1.155.147 (using /10.239.209.15)
        Receiving 139 files, 36548040412 bytes total. Already received 139 files, 36548040412 bytes total
    /52.2.9.34 (using /10.239.209.17)
        Receiving 171 files, 60000431853 bytes total. Already received 171 files, 60000431853 bytes total
    /52.0.152.88 (using /10.239.209.44)
        Receiving 147 files, 78458709168 bytes total. Already received 79 files, 55003961646 bytes total
            /var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-295-Data.db 955162267/4105438496 bytes(23%) received from idx:0/52.0.152.88
    /52.2.0.164 (using /10.239.209.16)
        Receiving 141 files, 36700837768 bytes total. Already received 141 files, 36700837768 bytes total
    /54.152.177.161 (using /10.239.209.93)
    /54.172.174.48 (using /10.239.209.49)
        Receiving 176 files, 79676288976 bytes total. Already received 98 files, 55932809644 bytes total
            /var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-329-Data.db 174070078/7326235809 bytes(2%) received from idx:0/54.172.174.48
    /52.2.75.82 (using /10.239.208.88)
    /54.165.111.69 (using /10.239.209.47)
        Receiving 170 files, 85920995638 bytes total. Already received 94 files, 54985226700 bytes total
            /var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-265-Data.db 4875660361/22821083384 bytes(21%) received from idx:0/54.165.111.69
    /52.6.136.30 (using /10.239.209.45)
        Receiving 174 files, 87064163973 bytes total. Already received 91 files, 53930233899 bytes total
            /var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-157-Data.db 17064156850/25823860172 bytes(66%) received from idx:0/52.6.136.30
    /52.7.14.201 (using /10.239.209.46)
        Receiving 164 files, 46351636573 bytes total. Already received 164 files, 46351636573 bytes total
    /52.2.30.66 (using /10.239.209.18)
        Receiving 158 files, 62899520151 bytes total. Already received 158 files, 62899520151 bytes total
    /54.175.138.33 (using /10.239.209.96)
    /54.88.44.178 (using /10.239.209.91)
    /52.2.109.194 (using /10.239.208.89)
    /54.172.81.117 (using /10.239.209.95)
    /54.172.103.46 (using /10.239.209.48)
        Receiving 164 files, 48771232182 bytes total. Already received 164 files, 48771232182 bytes total
    /54.164.172.164 (using /10.239.209.94)
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a        19             56
Responses                       n/a         0       51933813
{noformat}

No additional long GC pauses.

> OOM on bootstrap after long GC pause
> ------------------------------------
>
>                 Key: CASSANDRA-10449
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu 14.04, AWS
>            Reporter: Robbie Strickland
>              Labels: gc
>             Fix For: 2.1.x
>
>         Attachments: GCpath.txt, heap_dump.png, system.log.10-05, thread_dump.log, threads.txt
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 500-700GB per node. SSTable counts are <10 per table. I am attempting to provision additional nodes, but bootstrapping OOMs every time after about 10 hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 CassandraDaemon.java:223 - Exception in thread Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to no avail.
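The quoted GCInspector line says more than just 'long pause'. 1586126 ms is about 26.4 minutes spent in a G1 old-generation collection. That collection shrank the old gen from 49213756976 to 49072277176 bytes, so it reclaimed only 141479800 bytes (under 0.3%). This pattern is consistent with a heap that is almost entirely live data. The sketch below extracts those figures. `parse_gc_line` is a hypothetical helper, and the regex assumes the exact message format quoted above.

```python
import re

# Matches the quoted G1 GCInspector message, e.g.
# "G1 Old Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;"
GC_RE = re.compile(
    r"(?P<collector>G1 \w+ Generation) GC in (?P<ms>\d+)ms\.\s+"
    r"G1 Old Gen: (?P<before>\d+) -> (?P<after>\d+);"
)


def parse_gc_line(line):
    """Pause length and old-gen reclamation for one GCInspector line, or None.

    Hypothetical helper; not part of Cassandra.
    """
    m = GC_RE.search(line)
    if m is None:
        return None
    before, after = int(m.group("before")), int(m.group("after"))
    return {
        "collector": m.group("collector"),
        "pause_minutes": int(m.group("ms")) / 60_000,
        "old_gen_before": before,
        "old_gen_freed": before - after,
        "freed_fraction": (before - after) / before,
    }


LINE = ("INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - "
        "G1 Old Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;")

if __name__ == "__main__":
    gc = parse_gc_line(LINE)
    print(f"{gc['collector']}: {gc['pause_minutes']:.1f} min pause, "
          f"freed {gc['old_gen_freed'] / 1e6:.0f} MB "
          f"({gc['freed_fraction']:.2%} of {gc['old_gen_before'] / 1e9:.1f} GB old gen)")
```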



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
