cassandra-user mailing list archives

From Jürgen Albersdorfer <jalbersdor...@gmail.com>
Subject Bootstrapping fails with < 128GB RAM ...
Date Wed, 07 Feb 2018 11:38:55 GMT
Hi, I consistently run into a problem when bootstrapping a node with less
than 184GB RAM (156GB JVM heap) on our 10-node C* 3.11.1 cluster.
During bootstrap, cassandra.log shows the JVM heap's Old Gen growing, and
it is no longer freed to any significant degree.
I know that the JVM collects Old Gen only when really needed. I can see
collections, but there is always a remainder that
seems to grow forever without ever being freed.
Once the node has successfully joined the cluster, I can remove the extra
128GB of RAM I gave it for bootstrapping without any ill effect.

It feels as if Cassandra holds on to every single byte streamed over the
network during bootstrapping, which would be a memory leak and a major
problem.

Or am I doing something wrong? Any ideas?

Here are my observations from a failing bootstrap. This node has 72GB
RAM installed, 64GB of which is configured as JVM heap.

cassandra.log (truncated):
INFO  [Service Thread] 2018-02-07 11:12:49,604 GCInspector.java:284 - G1
Young Generation GC in 984ms.  G1 Eden Space: 14763950080 -> 0; G1 Old Gen:
36960206856 -> 39661338640; G1 Survivor Space: 2785017856 -> 1476395008;
INFO  [Service Thread] 2018-02-07 11:13:00,108 GCInspector.java:284 - G1
Young Generation GC in 784ms.  G1 Eden Space: 18387828736 -> 0; G1 Old Gen:
39661338640 -> 41053847560; G1 Survivor Space: 1476395008 -> 1845493760;
INFO  [Service Thread] 2018-02-07 11:13:08,639 GCInspector.java:284 - G1
Young Generation GC in 718ms.  G1 Eden Space: 16743661568 -> 0; G1 Old Gen:
41053847560 -> 42832232472; G1 Survivor Space: 1845493760 -> 1375731712;
INFO  [Service Thread] 2018-02-07 11:13:18,271 GCInspector.java:284 - G1
Young Generation GC in 546ms.  G1 Eden Space: 15535702016 -> 0; G1 Old Gen:
42831004832 -> 44206736544; G1 Survivor Space: 1375731712 -> 1006632960;
INFO  [Service Thread] 2018-02-07 11:13:35,364 GCInspector.java:284 - G1
Young Generation GC in 638ms.  G1 Eden Space: 14025752576 -> 0; G1 Old Gen:
44206737048 -> 45213369488; G1 Survivor Space: 1778384896 -> 1610612736;
INFO  [Service Thread] 2018-02-07 11:13:42,898 GCInspector.java:284 - G1
Young Generation GC in 614ms.  G1 Eden Space: 13388218368 -> 0; G1 Old Gen:
45213369488 -> 46152893584; G1 Survivor Space: 1610612736 -> 1006632960;
INFO  [Service Thread] 2018-02-07 11:13:58,291 GCInspector.java:284 - G1
Young Generation GC in 400ms.  G1 Eden Space: 13119782912 -> 0; G1 Old Gen:
46136116376 -> 47171400848; G1 Survivor Space: 1275068416 -> 771751936;
INFO  [Service Thread] 2018-02-07 11:14:23,071 GCInspector.java:284 - G1
Young Generation GC in 303ms.  G1 Eden Space: 11676942336 -> 0; G1 Old Gen:
47710958232 -> 48239699096; G1 Survivor Space: 1207959552 -> 973078528;
INFO  [Service Thread] 2018-02-07 11:14:46,157 GCInspector.java:284 - G1
Young Generation GC in 305ms.  G1 Eden Space: 11005853696 -> 0; G1 Old Gen:
48903342232 -> 49289001104; G1 Survivor Space: 939524096 -> 973078528;
INFO  [Service Thread] 2018-02-07 11:14:53,045 GCInspector.java:284 - G1
Young Generation GC in 380ms.  G1 Eden Space: 10569646080 -> 0; G1 Old Gen:
49289001104 -> 49586732696; G1 Survivor Space: 973078528 -> 1308622848;
INFO  [Service Thread] 2018-02-07 11:15:04,692 GCInspector.java:284 - G1
Young Generation GC in 360ms.  G1 Eden Space: 9294577664 -> 0; G1 Old Gen:
50671712912 -> 51269944472; G1 Survivor Space: 905969664 -> 805306368;
WARN  [Service Thread] 2018-02-07 11:15:07,317 GCInspector.java:282 - G1
Young Generation GC in 1102ms.  G1 Eden Space: 2617245696 -> 0; G1 Old Gen:
51269944472 -> 47310521496; G1 Survivor Space: 805306368 -> 301989888;
....
INFO  [Service Thread] 2018-02-07 11:16:36,535 GCInspector.java:284 - G1
Young Generation GC in 377ms.  G1 Eden Space: 7683964928 -> 0; G1 Old Gen:
51958433432 -> 52658554008; G1 Survivor Space: 1073741824 -> 1040187392;
INFO  [Service Thread] 2018-02-07 11:16:41,756 GCInspector.java:284 - G1
Young Generation GC in 340ms.  G1 Eden Space: 7046430720 -> 0; G1 Old Gen:
52624999576 -> 53299987616; G1 Survivor Space: 1040187392 -> 805306368;
WARN  [Service Thread] 2018-02-07 11:16:44,087 GCInspector.java:282 - G1
Young Generation GC in 1005ms.  G1 Eden Space: 2617245696 -> 0; G1 Old Gen:
53299987616 -> 49659331752; G1 Survivor Space: 805306368 -> 436207616;
...
INFO  [Service Thread] 2018-02-07 11:25:40,902 GCInspector.java:284 - G1
Young Generation GC in 254ms.  G1 Eden Space: 11475615744 -> 0; G1 Old Gen:
48904357040 -> 48904357544; G1 Survivor Space: 704643072 -> 805306368;
INFO  [Service Thread] 2018-02-07 11:26:11,424 GCInspector.java:284 - G1
Young Generation GC in 202ms.  G1 Eden Space: 11005853696 -> 0; G1 Old Gen:
48904357544 -> 49321014960; G1 Survivor Space: 939524096 -> 536870912;
WARN  [Service Thread] 2018-02-07 11:26:44,484 GCInspector.java:282 - G1
Young Generation GC in 1295ms.  G1 Eden Space: 2617245696 -> 0; G1 Old Gen:
49321014960 -> 46255753384; G1 Survivor Space: 805306368 -> 402653184;
...
INFO  [Service Thread] 2018-02-07 11:30:37,828 GCInspector.java:284 - G1
Young Generation GC in 958ms.  G1 Eden Space: 2785017856 -> 0; G1 Old Gen:
51196393000 -> 50629766184; G1 Survivor Space: 637534208 -> 436207616;
INFO  [Service Thread] 2018-02-07 11:30:45,036 GCInspector.java:284 - G1
Young Generation GC in 270ms.  G1 Eden Space: 10267656192 -> 0; G1 Old Gen:
50629766184 -> 50626254144; G1 Survivor Space: 436207616 -> 738197504;
INFO  [Service Thread] 2018-02-07 11:31:48,128 GCInspector.java:284 - G1
Young Generation GC in 984ms.  G1 Eden Space: 2617245696 -> 0; G1 Old Gen:
51086410272 -> 50443965480; G1 Survivor Space: 805306368 -> 369098752;
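
To put numbers on the Old Gen trend in a log like the one above, a small
script can pull the before/after values out of each GCInspector entry. This
is only a sketch: it assumes the entry format exactly as pasted here (G1
young collections, entries possibly wrapped across lines), and the names
old_gen_samples and old_gen_growth are made up for illustration.

```python
import re

# Pattern for GCInspector entries as pasted above (G1 young collections only).
GC_ENTRY = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ GCInspector\.java:\d+ - "
    r"G1 Young Generation GC in (\d+)ms\. "
    r"G1 Eden Space: \d+ -> \d+; "
    r"G1 Old Gen: (\d+) -> (\d+);"
)


def old_gen_samples(log_text):
    """Return (timestamp, pause_ms, old_before, old_after) per GC entry."""
    # Collapse all whitespace so entries wrapped across lines match again.
    flat = " ".join(log_text.split())
    return [(ts, int(ms), int(before), int(after))
            for ts, ms, before, after in GC_ENTRY.findall(flat)]


def old_gen_growth(samples):
    """Old Gen bytes after the last entry minus bytes before the first."""
    return samples[-1][3] - samples[0][2]


# First two entries of the log above, kept wrapped as in the mail.
SAMPLE = """\
INFO  [Service Thread] 2018-02-07 11:12:49,604 GCInspector.java:284 - G1
Young Generation GC in 984ms.  G1 Eden Space: 14763950080 -> 0; G1 Old Gen:
36960206856 -> 39661338640; G1 Survivor Space: 2785017856 -> 1476395008;
INFO  [Service Thread] 2018-02-07 11:13:00,108 GCInspector.java:284 - G1
Young Generation GC in 784ms.  G1 Eden Space: 18387828736 -> 0; G1 Old Gen:
39661338640 -> 41053847560; G1 Survivor Space: 1476395008 -> 1845493760;
"""

if __name__ == "__main__":
    samples = old_gen_samples(SAMPLE)
    for ts, ms, before, after in samples:
        print(ts, f"pause={ms}ms", f"old_gen_delta={after - before:+d} bytes")
    print("net growth:", old_gen_growth(samples), "bytes")
```

Feeding it the whole cassandra.log instead of SAMPLE would show whether
the collections that do shrink Old Gen keep up with the growth in between.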


jvm.options as follows (comments removed):
## Use the Hotspot garbage-first collector.
-XX:+UseG1GC
-XX:MaxGCPauseMillis=1000
-XX:InitiatingHeapOccupancyPercent=70
-XX:ParallelGCThreads=16
-XX:ConcGCThreads=16

### GC logging options -- uncomment to enable
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
#-XX:PrintFLSStatistics=1
#-Xloggc:/var/log/cassandra/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M

I also tried this with ParNewGC and ConcMarkSweepGC and see the same
behavior there.

From nodetool netstats I see that it wants to stream about 55.9 GB.
After 1.5h of streaming at more than 10MB/s (about 54GB seen with dstat),
nodetool netstats shows that only 3.5GB of the 55.9 GB has been received.

uptime
 11:30:52 up  1:36,  3 users,  load average: 106.01, 87.54, 66.01

nodetool netstats (truncated for readability)
Wed Feb  7 11:19:07 CET 2018
Mode: JOINING
Bootstrap 56d204d0-0be9-11e8-ae30-617216855b4a
    /192.168.1.213 - Receiving 68 files, 6.774.831.556 bytes total. Already received 3 files, 279.238.740 bytes total
    /192.168.1.215 - Receiving 68 files, 5.721.460.494 bytes total. Already received 4 files, 109.051.913 bytes total
    /192.168.1.214 - Receiving 68 files, 7.497.726.056 bytes total. Already received 4 files, 870.592.708 bytes total
    /192.168.1.207 - Receiving 63 files, 4.945.809.501 bytes total. Already received 4 files, 700.599.427 bytes total
    /192.168.1.232 - Receiving 91 files, 7.344.537.682 bytes total. Already received 3 files, 237.482.005 bytes total
    /192.168.1.209 - Receiving 102 files, 15.931.849.729 bytes total. Already received 3 files, 1.108.754.920 bytes total
    /192.168.1.231 - Receiving 92 files, 7.927.882.516 bytes total. Already received 4 files, 269.514.936 bytes total
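
The per-peer numbers above can be summed to cross-check the totals I
quoted. The following is a rough sketch, not a general netstats parser: it
assumes the "Receiving ... Already received ..." wording shown here and
that "." is a thousands separator (as on this locale); parse_netstats and
totals are hypothetical helper names.

```python
import re

# One "Receiving" entry per peer, in the wording shown in the output above.
RECEIVING = re.compile(
    r"(/\S+) - Receiving (\d+) files, ([\d.]+) bytes total\. "
    r"Already received (\d+) files, ([\d.]+) bytes total"
)


def parse_netstats(text):
    """Return (peer, bytes_total, bytes_received) for each peer entry."""
    # Collapse whitespace so entries wrapped across lines still match.
    flat = " ".join(text.split())
    rows = []
    for peer, _files, total, _got_files, got in RECEIVING.findall(flat):
        # Assumption: "." is a thousands separator, so it is simply dropped.
        rows.append((peer, int(total.replace(".", "")),
                     int(got.replace(".", ""))))
    return rows


def totals(rows):
    """Sum expected and already received bytes over all peers."""
    return sum(r[1] for r in rows), sum(r[2] for r in rows)


# First two peers from the output above, wrapped as in the mail.
SAMPLE = """\
    /192.168.1.213 - Receiving 68 files, 6.774.831.556 bytes total. Already
received 3 files, 279.238.740 bytes total
    /192.168.1.215 - Receiving 68 files, 5.721.460.494 bytes total. Already
received 4 files, 109.051.913 bytes total
"""

if __name__ == "__main__":
    expected, received = totals(parse_netstats(SAMPLE))
    print(f"expected {expected} bytes, received {received} bytes "
          f"({100 * received / expected:.1f}%)")
```

Comparing the received sum against the byte count dstat reports on the
wire is what makes the gap between roughly 54GB transferred and 3.5GB
accounted for visible.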


nodetool status:
Datacenter: main
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns    Host ID                               Rack
UN  192.168.1.232  83,31 GiB  256          ?       510a0068-ee2b-4d1f-9965-9e29602d2f8f  rack04
UN  192.168.1.206  51,41 GiB  256          ?       a168b632-52e7-408a-ae7f-6ba6b9c55cea  rack01
UN  192.168.1.207  57,66 GiB  256          ?       7401ab8f-114d-41b4-801d-53a4b042de52  rack01
UN  192.168.1.208  56,47 GiB  256          ?       767980ef-52f2-4c21-8567-324fc1db274c  rack01...
UJ  192.168.1.160  68,95 GiB  256          ?       a3a5a169-512f-4e1f-8c0b-419c828f23e1  rack02
UN  192.168.1.209  94,27 GiB  256          ?       8757cb4a-183e-4828-8212-7715b5563935  rack02
UN  192.168.1.213  78,26 GiB  256          ?       b1e9481c-4ba2-4396-837a-84be35737fe7  rack05
UN  192.168.1.214  80,66 GiB  256          ?       457fc606-7002-49ad-8da5-309b92093acf  rack06
UN  192.168.1.231  87,5 GiB   256          ?       2017a9e8-3638-465e-bc4a-5e59e693fb49  rack03
UN  192.168.1.215  86,97 GiB  256          ?       5dfe4c35-8f8a-4305-824a-4610cec9411b  rack07

Thanks, and kind regards
Juergen
