cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "DOAN DuyHai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
Date Sun, 20 Mar 2016 08:57:33 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203172#comment-15203172
] 

DOAN DuyHai commented on CASSANDRA-11383:
-----------------------------------------

[~xedin]

Last update from the testing. I put the cluster in *ideal* conditions as you recommended.

JVM settings:
 - CMS
 - Xmx8g, Xms8G
 
C* settings:
 - concurrent_compactors: 6

Test conditions:
 - cluster *idle* (no write, no read)
 - LCS with *sstable_size_in_mb* = 1024 (1Gb)
 - *no compaction ongoing* (took a whole night to compact for LCS)
 - {{CREATE CUSTOM INDEX resource_period_end_month_int_idx ON sharon.resource_bench (period_end_month_int)
USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = \{'mode': 'SPARSE'\};}}

Observations:
 - I/O idle, CPU not exceeding 20% on average (http://postimg.org/image/f664wm8dp/)
 - {{nodetool compactionstats}} only show 1 index rebuild ongoing per node  
{noformat}
id                                   compaction type       keyspace table          completed
 total       unit  progress
d8b4f4b0-ee6a-11e5-81f5-bd5584064785 Secondary index build sharon   resource_bench 9535985318
18920482745 bytes 50.40%
id                                   compaction type       keyspace table          completed
 total       unit  progress
d8b65440-ee6a-11e5-b44b-4deeb5ac98a3 Secondary index build sharon   resource_bench 9464081317
20988668046 bytes 45.09%
id                                   compaction type       keyspace table          completed
 total       unit  progress
d8b3bc30-ee6a-11e5-a152-db40f4fbe6b8 Secondary index build sharon   resource_bench 9471325678
17061191471 bytes 55.51%
id                                   compaction type       keyspace table          completed
 total       unit  progress
d8b45870-ee6a-11e5-b26b-53ed13e9667e Secondary index build sharon   resource_bench 9120598050
18921737677 bytes 48.20%
id                                   compaction type       keyspace table          completed
 total       unit  progress
d8b45870-ee6a-11e5-b2a3-331c04173c53 Secondary index build sharon   resource_bench 8943568835
20591008789 bytes 43.43%
id                                   compaction type       keyspace table          completed
  total       unit  progress
d8b47f80-ee6a-11e5-9fc8-0597212274c1 Secondary index build sharon   resource_bench 10172038156
21422242706 bytes 47.48%
id                                   compaction type       keyspace table          completed
  total       unit  progress
d8b34700-ee6a-11e5-a642-6dee841e75e5 Secondary index build sharon   resource_bench 10161205385
18730171060 bytes 54.25%
id                                   compaction type       keyspace table          completed
 total       unit  progress
d8b6f080-ee6a-11e5-8da4-bd70732fdab1 Secondary index build sharon   resource_bench 9961529350
21294352899 bytes 46.78%
id                                   compaction type       keyspace table          completed
 total       unit  progress
d8b43160-ee6a-11e5-8ac9-f59d626eedfa Secondary index build sharon   resource_bench 9160286080
22153527929 bytes 41.35%
id                                   compaction type       keyspace table          completed
 total       unit  progress
d8b51bc0-ee6a-11e5-8aa0-b9e611280aba Secondary index build sharon   resource_bench 9397690505
22791700212 bytes 41.23%
id                                   compaction type       keyspace table          completed
  total       unit  progress
d8b542d0-ee6a-11e5-8521-fbd14b018db6 Secondary index build sharon   resource_bench 10029096174
18910334578 bytes 53.04%
id                                   compaction type       keyspace table          completed
  total       unit  progress
d8b40a50-ee6a-11e5-a7b2-4b114ced0935 Secondary index build sharon   resource_bench 10118551269
16938426769 bytes 59.74%
id                                   compaction type       keyspace table          completed
 total       unit  progress
d8b039c0-ee6a-11e5-9a98-ff9a6f2af762 Secondary index build sharon   resource_bench 9003236945
18252472495 bytes 49.33%
{noformat}
 - *there are still A LOT of GC*
{noformat}
INFO  [Service Thread] 2016-03-20 09:46:44,695 GCInspector.java:284 - ParNew GC in 455ms.
 CMS Old Gen: 2964960608 -> 3487392640; Par Eden Space: 1006632960 -> 0;
INFO  [Service Thread] 2016-03-20 09:46:47,250 GCInspector.java:284 - ParNew GC in 460ms.
 CMS Old Gen: 3487392640 -> 3990379824; Par Eden Space: 1006632960 -> 0; Par Survivor
Space: 125829120 -> 125828160
INFO  [Service Thread] 2016-03-20 09:46:49,452 GCInspector.java:284 - ParNew GC in 414ms.
 CMS Old Gen: 3990379824 -> 4445691424; Par Eden Space: 1006632960 -> 0; Par Survivor
Space: 125828160 -> 125827840
INFO  [Service Thread] 2016-03-20 09:46:52,328 GCInspector.java:284 - ParNew GC in 484ms.
 CMS Old Gen: 4445691424 -> 4968532112; Par Eden Space: 1006632960 -> 0; Par Survivor
Space: 125827840 -> 125827440
INFO  [Service Thread] 2016-03-20 09:46:54,940 GCInspector.java:284 - ParNew GC in 495ms.
 CMS Old Gen: 4968532112 -> 5462199544; Par Eden Space: 1006632960 -> 0; Par Survivor
Space: 125827440 -> 125829120
INFO  [Service Thread] 2016-03-20 09:46:57,406 GCInspector.java:284 - ParNew GC in 463ms.
 CMS Old Gen: 5462199544 -> 5924296504; Par Eden Space: 1006632960 -> 0; Par Survivor
Space: 125829120 -> 125828160
INFO  [Service Thread] 2016-03-20 09:47:12,521 GCInspector.java:284 - ParNew GC in 670ms.
 CMS Old Gen: 5924296504 -> 5985242088; Par Survivor Space: 125828160 -> 125827760
WARN  [Service Thread] 2016-03-20 09:47:12,528 GCInspector.java:282 - ConcurrentMarkSweep
GC in 11108ms.  CMS Old Gen: 5985242088 -> 4121205176; Par Eden Space: 1006632960 ->
0; Par Survivor Space: 125827760 -> 0
INFO  [Service Thread] 2016-03-20 09:47:41,050 GCInspector.java:284 - ParNew GC in 209ms.
 CMS Old Gen: 4748643432 -> 4955516288; Par Eden Space: 1006632960 -> 0;
INFO  [Service Thread] 2016-03-20 09:49:29,246 GCInspector.java:284 - ParNew GC in 206ms.
 CMS Old Gen: 3790280880 -> 3996984480; Par Eden Space: 1006632960 -> 0;
INFO  [Service Thread] 2016-03-20 09:49:39,634 GCInspector.java:284 - ParNew GC in 272ms.
 CMS Old Gen: 4361111240 -> 4620112304; Par Eden Space: 1006632960 -> 0; Par Survivor
Space: 45286592 -> 125829120
INFO  [Service Thread] 2016-03-20 09:49:41,290 GCInspector.java:284 - ParNew GC in 289ms.
 CMS Old Gen: 4620112304 -> 4937794000; Par Eden Space: 1006632960 -> 0;
INFO  [Service Thread] 2016-03-20 09:49:43,160 GCInspector.java:284 - ParNew GC in 383ms.
 CMS Old Gen: 4937794000 -> 5426684640; Par Eden Space: 1006632960 -> 0;
INFO  [Service Thread] 2016-03-20 09:49:45,309 GCInspector.java:284 - ParNew GC in 431ms.
 CMS Old Gen: 5426684640 -> 5947461904; Par Eden Space: 1006632960 -> 0;
INFO  [Service Thread] 2016-03-20 09:49:47,370 GCInspector.java:284 - ParNew GC in 402ms.
 CMS Old Gen: 5947461904 -> 6398462800; Par Eden Space: 1006632960 -> 0;
INFO  [Service Thread] 2016-03-20 09:49:49,857 GCInspector.java:284 - ParNew GC in 423ms.
 CMS Old Gen: 5444204224 -> 5937269640; Par Eden Space: 1006632960 -> 0;
INFO  [Service Thread] 2016-03-20 09:49:52,407 GCInspector.java:284 - ParNew GC in 455ms.
 CMS Old Gen: 3157064816 -> 3668162472; Par Eden Space: 1006632960 -> 0;
INFO  [Service Thread] 2016-03-20 09:49:54,572 GCInspector.java:284 - ParNew GC in 400ms.
 CMS Old Gen: 2840515104 -> 3287142872; Par Eden Space: 1006632960 -> 0;
INFO  [Service Thread] 2016-03-20 09:49:57,162 GCInspector.java:284 - ParNew GC in 465ms.
 CMS Old Gen: 3287142872 -> 3792393832; Par Eden Space: 1006632960 -> 0;
INFO  [Service Thread] 2016-03-20 09:49:59,862 GCInspector.java:284 - ParNew GC in 470ms.
 CMS Old Gen: 3792393832 -> 4304257848; Par Eden Space: 1006632960 -> 0; Par Survivor
Space: 125829120 -> 125828240
INFO  [Service Thread] 2016-03-20 09:50:02,071 GCInspector.java:284 - ParNew GC in 418ms.
 CMS Old Gen: 4304257848 -> 4743542344; Par Eden Space: 1006632960 -> 0; Par Survivor
Space: 125828240 -> 125828000
INFO  [Service Thread] 2016-03-20 09:50:04,859 GCInspector.java:284 - ParNew GC in 502ms.
 CMS Old Gen: 4743542344 -> 5263286088; Par Eden Space: 1006632960 -> 0; Par Survivor
Space: 125828000 -> 125827760
INFO  [Service Thread] 2016-03-20 09:50:07,747 GCInspector.java:284 - ParNew GC in 492ms.
 CMS Old Gen: 5263286088 -> 5773111032; Par Eden Space: 1006632960 -> 0; Par Survivor
Space: 125827760 -> 125827520
INFO  [Service Thread] 2016-03-20 09:50:10,679 GCInspector.java:284 - ParNew GC in 383ms.
 CMS Old Gen: 5773111032 -> 6032633936; Par Eden Space: 1006632960 -> 0; Par Survivor
Space: 125827520 -> 125827280
INFO  [Service Thread] 2016-03-20 09:50:14,371 GCInspector.java:284 - ParNew GC in 212ms.
 CMS Old Gen: 5728973400 -> 5854698504; Par Eden Space: 1006632960 -> 0; Par Survivor
Space: 125827280 -> 125826960
INFO  [Service Thread] 2016-03-20 09:50:38,495 GCInspector.java:284 - ParNew GC in 205ms.
 CMS Old Gen: 5019081080 -> 5235117656; Par Eden Space: 1006632960 -> 0;
INFO  [Service Thread] 2016-03-20 09:51:05,133 GCInspector.java:284 - ParNew GC in 206ms.
 CMS Old Gen: 947772192 -> 1118731384; Par Eden Space: 1006632960 -> 0;
{noformat}

The GC activity is quite worrisome because if the cluster was not IDLE and have active reads/writes/compactions,
I would bet that such GC will bring it down completely ...

Do you think memory management for index build is going to be fixed along side with current
OOM issue or do you want me to open a separate JIRA ticket to address the issue ?

 I can upload all the necessary logs (system.log, gc.log). I took care to clean all the logs
right *before* starting the index creation so that we can have a clear view and log messages
for this process

> SASI index build leads to massive OOM
> -------------------------------------
>
>                 Key: CASSANDRA-11383
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11383
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: C* 3.4
>            Reporter: DOAN DuyHai
>         Attachments: CASSANDRA-11383.patch, new_system_log_CMS_8GB_OOM.log, system.log_sasi_build_oom
>
>
> 13 bare metal machines
> - 6 cores CPU (12 HT)
> - 64Gb RAM
> - 4 SSD in RAID0
>  JVM settings:
> - G1 GC
> - Xms32G, Xmx32G
> Data set:
>  - ≈ 100Gb/per node
>  - 1.3 Tb cluster-wide
>  - ≈ 20Gb for all SASI indices
> C* settings:
> - concurrent_compactors: 1
> - compaction_throughput_mb_per_sec: 256
> - memtable_heap_space_in_mb: 2048
> - memtable_offheap_space_in_mb: 2048
> I created 9 SASI indices
>  - 8 indices with text field, NonTokenizingAnalyser,  PREFIX mode, case-insensitive
>  - 1 index with numeric field, SPARSE mode
>  After a while, the nodes just gone OOM.
>  I attach log files. You can see a lot of GC happening while index segments are flush
to disk. At some point the node OOM ...
> /cc [~xedin]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message