cassandra-commits mailing list archives

From "Jack Krupansky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11383) Avoid index segment stitching in RAM which lead to OOM on big SSTable files
Date Tue, 29 Mar 2016 16:04:25 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216233#comment-15216233 ]

Jack Krupansky commented on CASSANDRA-11383:
--------------------------------------------

Thanks, [~jrwest] and [~doanduyhai]. I think I finally have the SASI terminology down now:
SPARSE mode means that the index is sparse (few index entries per original column value)
while the column data is dense (many distinct values). Non-SPARSE (a.k.a. PREFIX) mode,
the default mode, supports any cardinality of data, especially the low-cardinality data that
SPARSE mode does not support.

Maybe that leaves one last question: whether non-SPARSE (PREFIX) mode is considered advisable/recommended
for high-cardinality column data, where SPARSE mode is nominally a better choice. Maybe that
is strictly a matter of whether the prefix/LIKE feature is to be utilized - if so, then PREFIX
mode is required, but if not, SPARSE mode sounds like the better choice. But I don't have
a good enough handle on the internal index structures to know whether that's absolutely the case - that a PREFIX
index over sparse, high-cardinality data would necessarily be larger and/or slower than a SPARSE index over the
same data. I would expect so, but it would be good to have that confirmed.
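For anyone following along, here is a sketch of what the two modes look like in CQL, using the documented SASI index class and options (the table and column names are hypothetical, chosen only to illustrate the distinction discussed above):

```sql
-- Hypothetical table for illustration
CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,       -- low/any cardinality: PREFIX (default) mode
    created_at bigint -- dense, high-cardinality numeric data: SPARSE mode
);

-- PREFIX (default) mode: supports LIKE 'foo%' prefix queries;
-- shown with the NonTokenizingAnalyzer, case-insensitive,
-- matching the setup described in the issue below
CREATE CUSTOM INDEX users_name_idx ON users (name)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
    'mode': 'PREFIX',
    'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
    'case_sensitive': 'false'
};

-- SPARSE mode: intended for dense numeric data where each indexed
-- value maps to only a few rows; does not support LIKE queries
CREATE CUSTOM INDEX users_created_idx ON users (created_at)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 'mode': 'SPARSE' };
```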

> Avoid index segment stitching in RAM which lead to OOM on big SSTable files 
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11383
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11383
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: C* 3.4
>            Reporter: DOAN DuyHai
>            Assignee: Jordan West
>              Labels: sasi
>             Fix For: 3.5
>
>         Attachments: CASSANDRA-11383.patch, SASI_Index_build_LCS_1G_Max_SSTable_Size_logs.tar.gz,
new_system_log_CMS_8GB_OOM.log, system.log_sasi_build_oom
>
>
> 13 bare metal machines
> - 6 cores CPU (12 HT)
> - 64Gb RAM
> - 4 SSD in RAID0
>  JVM settings:
> - G1 GC
> - Xms32G, Xmx32G
> Data set:
>  - ≈ 100Gb/per node
>  - 1.3 Tb cluster-wide
>  - ≈ 20Gb for all SASI indices
> C* settings:
> - concurrent_compactors: 1
> - compaction_throughput_mb_per_sec: 256
> - memtable_heap_space_in_mb: 2048
> - memtable_offheap_space_in_mb: 2048
> I created 9 SASI indices
>  - 8 indices with text field, NonTokenizingAnalyser,  PREFIX mode, case-insensitive
>  - 1 index with numeric field, SPARSE mode
>  After a while, the nodes just went OOM.
>  I attach log files. You can see a lot of GC happening while index segments are flushed
to disk. At some point the nodes OOM ...
> /cc [~xedin]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
