cassandra-commits mailing list archives

From Jürgen Albersdorfer (JIRA) <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-14239) OutOfMemoryError when bootstrapping with less than 100GB RAM
Date Wed, 11 Apr 2018 09:46:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-14239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433637#comment-16433637 ]

Jürgen Albersdorfer edited comment on CASSANDRA-14239 at 4/11/18 9:45 AM:
--------------------------------------------------------------------------

I changed the following settings in cassandra.yaml:
{code:java}
disk_optimization_strategy: ssd
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
{code}
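
As a rough illustration of what the two memtable limits mean for flushing, the sketch below estimates the pool usage at which a flush would be requested. It assumes the default `memtable_cleanup_threshold` of 1 / (`memtable_flush_writers` + 1) described in the cassandra.yaml comments, treats each pool on its own (a simplification), and assumes `memtable_flush_writers` is 2, which is not taken from the attached configuration. The helper names are hypothetical.

```python
# Settings quoted from the comment; everything else here is illustrative.
CHANGED_SETTINGS = """
disk_optimization_strategy: ssd
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
"""


def parse_flat_settings(text):
    """Parse simple 'key: value' lines (no nesting) into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


def flush_trigger_mb(pool_limit_mb, flush_writers):
    """Estimate the pool usage (MB) at which a flush is requested.

    Uses the default memtable_cleanup_threshold = 1 / (flush_writers + 1).
    """
    cleanup_threshold = 1.0 / (flush_writers + 1)
    return pool_limit_mb * cleanup_threshold


settings = parse_flat_settings(CHANGED_SETTINGS)
ASSUMED_FLUSH_WRITERS = 2  # assumption, not taken from the ticket

for key in ("memtable_heap_space_in_mb", "memtable_offheap_space_in_mb"):
    limit_mb = int(settings[key])
    trigger = flush_trigger_mb(limit_mb, ASSUMED_FLUSH_WRITERS)
    print(f"{key}: limit {limit_mb} MB, estimated flush trigger {trigger:.0f} MB")
```

With these assumptions each 2048 MB pool would request a flush at roughly a third of its limit. If an explicit `memtable_cleanup_threshold` is configured, that value applies instead.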
Streaming was much faster and produced less CPU pressure than before:
{code:java}
-dsk/total- ---system-- ----total-cpu-usage---- --io/total- -net/total-
 read  writ| int   csw |usr sys idl wai hiq siq| read  writ| recv  send
9830B   31M|  48k 7751 | 67   2  31   0   0   1|0.20  85.8 |  30M  380k
   0    28M|  51k 7838 | 65   2  32   0   0   1|   0  80.9 |  33M  511k
  32k   35M|  54k 9024 | 66   2  31   0   0   1|0.60   102 |  37M  540k
   0    28M|  41k 7072 | 62   2  36   0   0   1|   0  78.1 |  26M  265k
1638B   25M|  41k 6606 | 62   1  36   0   0   0|0.10  67.6 |  25M  110k
1638B   26M|  41k 7251 | 57   1  41   0   0   0|0.10  69.9 |  27M  138k
 819B   24M|  40k 6129 | 56   1  42   0   0   1|0.20  61.5 |  25M  127k
   0    25M|  38k 7273 | 56   1  42   0   0   0|   0  66.9 |  26M  162k
1024k   24M|  35k 6501 | 56   1  42   0   0   0|25.2  62.8 |  25M  128k
   0    24M|  37k 7238 | 56   1  42   0   0   0|   0  62.6 |  26M  164k
   0    24M|  35k 6349 | 56   1  42   0   0   0|   0  63.5 |  25M  145k
 410B   26M|  40k 6979 | 56   2  42   0   0   0|0.10  73.1 |  28M  341k
   0    28M|  41k 7042 | 56   1  42   0   0   0|   0  70.8 |  30M  350k
2048B   31M|  44k 7334 | 56   2  42   0   0   0|0.20  85.4 |  32M  347k
   0    31M|  46k 6515 | 56   1  42   0   0   1|   0  86.0 |  33M  383k
   0    30M|  47k 7572 | 56   1  42   0   0   1|   0  82.3 |  33M  466k
7373B   31M|  41k 5742 | 56   1  42   0   0   0|0.20  84.3 |  30M  319k
   0    30M|  43k 7146 | 56   2  42   0   0   1|   0  87.4 |  28M  423k
{code}
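
To put a number on the streaming rate, dstat rows like these can be averaged with a small script. This is a minimal sketch that assumes dstat's suffixes are 1024-based (`B`, `k`, `M`, `G`) and that the first column group is disk read/write and the last is network recv/send, as in the header shown. The function names are hypothetical, and the sample uses only the first three rows.

```python
UNITS = {"B": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(token):
    """Convert a dstat value such as '9830B', '32k' or '31M' to bytes."""
    token = token.strip()
    if token and token[-1] in UNITS:
        return float(token[:-1]) * UNITS[token[-1]]
    return float(token)  # plain numbers such as '0'


def mean_write_and_recv(rows):
    """Return (mean disk write, mean net recv) in bytes per sample."""
    writes, recvs = [], []
    for row in rows:
        groups = row.split("|")
        disk = groups[0].split()   # [read, writ]
        net = groups[-1].split()   # [recv, send]
        writes.append(parse_size(disk[1]))
        recvs.append(parse_size(net[0]))
    return sum(writes) / len(writes), sum(recvs) / len(recvs)


SAMPLE_ROWS = [
    "9830B   31M|  48k 7751 | 67   2  31   0   0   1|0.20  85.8 |  30M  380k",
    "   0    28M|  51k 7838 | 65   2  32   0   0   1|   0  80.9 |  33M  511k",
    "  32k   35M|  54k 9024 | 66   2  31   0   0   1|0.60   102 |  37M  540k",
]

mean_write, mean_recv = mean_write_and_recv(SAMPLE_ROWS)
print(f"mean disk write: {mean_write / 1024 ** 2:.1f} MiB per sample")
print(f"mean net recv:   {mean_recv / 1024 ** 2:.1f} MiB per sample")
```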
After `Received complete` was logged for all nodes, the bootstrap still did not finish, and I can observe

 
 * a stalled `Completed` count for MutationStage,
 * while the `Pending` count for MutationStage skyrockets.
 * The rest looks fine to me  :(

 
{code:java}
nodetool tpstats
Pool Name                         Active   Pending      Completed   Blocked  All time blocked
ReadStage                              0         0              0         0                 0
MiscStage                              0         0              0         0                 0
CompactionExecutor                     2         7             53         0                 0
MutationStage                        128   5722021      593964000         0                 0
MemtableReclaimMemory                  0         0           2194         0                 0
PendingRangeCalculator                 0         0             19         0                 0
GossipStage                            0         0          25736         0                 0
SecondaryIndexManagement               0         0              0         0                 0
HintsDispatcher                        0         0              0         0                 0
RequestResponseStage                   0         0         167108         0                 0
ReadRepairStage                        0         0              0         0                 0
CounterMutationStage                   0         0              0         0                 0
MigrationStage                         0         0             40         0                 0
MemtablePostFlush                      1        11           2344         0                 0
PerDiskMemtableFlushWriter_0           0         0           2194         0                 0
ValidationExecutor                     0         0              0         0                 0
Sampler                                0         0              0         0                 0
MemtableFlushWriter                    2        11           2194         0                 0
InternalResponseStage                  0         0             31         0                 0
ViewMutationStage                      0         0              0         0                 0
AntiEntropyStage                       0         0              0         0                 0
CacheCleanupExecutor                   0         0              0         0                 0

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
HINT                         0
MUTATION                     0
COUNTER_MUTATION             0
BATCH_STORE                  0
BATCH_REMOVE                 0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0

{code}
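
Spotting a backlog like this can be scripted when watching a bootstrap. The following is a minimal sketch that assumes `nodetool tpstats` prints one row per pool (a name followed by five integer columns). The function names and the `min_pending` cutoff of 1000 are hypothetical choices for illustration, and the sample contains only three rows from the output.

```python
def parse_tpstats(text):
    """Parse the thread pool rows of `nodetool tpstats` into a dict."""
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        # A pool row is a name followed by exactly five integer columns;
        # the header and the dropped-message section do not match this.
        if len(parts) == 6 and all(p.isdigit() for p in parts[1:]):
            active, pending, completed, blocked, all_time_blocked = map(int, parts[1:])
            pools[parts[0]] = {
                "active": active,
                "pending": pending,
                "completed": completed,
                "blocked": blocked,
                "all_time_blocked": all_time_blocked,
            }
    return pools


def backlogged(pools, min_pending=1000):
    """Return pool names with at least min_pending queued tasks, largest first."""
    names = [name for name, stats in pools.items() if stats["pending"] >= min_pending]
    return sorted(names, key=lambda name: pools[name]["pending"], reverse=True)


SAMPLE = """\
Pool Name                         Active   Pending      Completed   Blocked  All time blocked
ReadStage                              0         0              0         0                 0
CompactionExecutor                     2         7             53         0                 0
MutationStage                        128   5722021      593964000         0                 0
"""

pools = parse_tpstats(SAMPLE)
for name in backlogged(pools):
    print(f"{name}: {pools[name]['pending']} pending, {pools[name]['active']} active")
```

Running the parser on two snapshots taken some time apart and comparing the `completed` values would show whether a stage is still making progress.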
   

*Why does `MutationStage` now hang while staying busy? At the same time:*
 * The SlabPoolCleaner thread permanently uses a single logical CPU at 100%
 * G1 Old Gen increases linearly over time and goes far beyond 50GB
 * See the attached [^gc.log.201804111141.zip], analyzed at [gceasy.io|http://gceasy.io/diamondgc-report.jsp?oTxnId_value=5c97d52f-1d06-4d28-8ab7-dd9bd58311b7]
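
A linear trend in Old Gen occupancy can be turned into a rough time-to-exhaustion estimate with a least-squares fit. The sketch below is illustrative only: the sample points are made-up placeholder values, not readings from the attached GC log, and the function names and the 50 GB limit passed in are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def minutes_until_limit(limit_gb, minutes, old_gen_gb):
    """Minutes after the last sample until the fitted line reaches limit_gb.

    Returns None when occupancy is not growing.
    """
    slope, intercept = linear_fit(minutes, old_gen_gb)
    if slope <= 0:
        return None
    return (limit_gb - intercept) / slope - minutes[-1]


# Placeholder samples (minutes since start, Old Gen occupancy in GB).
MINUTES = [0, 10, 20, 30]
OLD_GEN_GB = [10, 20, 30, 40]

slope, intercept = linear_fit(MINUTES, OLD_GEN_GB)
remaining = minutes_until_limit(50, MINUTES, OLD_GEN_GB)
print(f"growth: {slope:.2f} GB/min, limit reached in about {remaining:.0f} min")
```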

 



> OutOfMemoryError when bootstrapping with less than 100GB RAM
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-14239
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14239
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Details of the bootstrapping Node
>  * ProLiant BL460c G7
>  * 56GB RAM
>  * 2x 146GB 10K HDD (One dedicated for Commitlog, one for Data, Hints and saved_caches)
>  * CentOS 7.4 on SD-Card
>  * /tmp and /var/log on tmpfs
>  * Oracle JDK 1.8.0_151
>  * Cassandra 3.11.1
> Cluster
>  * 10 existing Nodes (Up and Normal)
>            Reporter: Jürgen Albersdorfer
>            Priority: Major
>         Attachments: Objects-by-class.csv, Objects-with-biggest-retained-size.csv, cassandra-env.sh, cassandra.yaml, gc.log.0.current.zip, gc.log.201804111141.zip, jvm.options, jvm_opts.txt, stack-traces.txt
>
>
> Hi, I face an issue when bootstrapping a node with less than 100GB RAM on our 10-node C* 3.11.1 cluster.
> During bootstrap, when I watch cassandra.log, I observe growth in the JVM heap Old Gen that is no longer significantly freed.
> I know that the JVM collects Old Gen only when really needed. I can see collections, but there is always a remainder which seems to grow forever without ever being freed.
> After the node has successfully joined the cluster, I can remove the extra RAM I gave it for bootstrapping without any further effect.
> It feels like Cassandra never forgets a single byte streamed over the network during bootstrapping, which would be a memory leak and a major problem, too.
> I was able to produce a heap dump via HeapDumpOnOutOfMemoryError from a 56GB node (40 GB assigned JVM heap). YourKit Profiler shows a huge amount of memory allocated for org.apache.cassandra.db.Memtable (22 GB), org.apache.cassandra.db.rows.BufferCell (19 GB) and java.nio.HeapByteBuffer (11 GB).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

