cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CASSANDRA-6534) Slow inserts with collections into a single partition (Pathological GC behavior)
Date Fri, 19 Sep 2014 21:08:34 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict resolved CASSANDRA-6534.
---------------------------------
    Resolution: Duplicate

As previously stated, 2.1 should help here a great deal through CASSANDRA-5417, which
reduces both the garbage generated and the time taken to apply an update, thereby narrowing
the race window. Beyond that, CASSANDRA-7546 addresses exactly this scenario, in which
competing updates to the same partition race each other, wasting work and generating garbage.
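
To give a rough feel for why racing updates waste work, here is a toy model in Python. It
is not Cassandra's code: it simply assumes that every pending writer prepares its update
against the current partition state, that exactly one of them succeeds per round, and that
the others throw their work away and retry. The function name and the one-winner-per-round
rule are assumptions made for illustration.

```python
def simulate_cas_contention(writers: int) -> dict:
    """Toy model of optimistic updates to one partition (not Cassandra's code).

    Assumption: in each round every pending writer prepares an update,
    exactly one succeeds, and the rest discard their work and retry.
    """
    pending = writers
    attempts = 0
    while pending > 0:
        attempts += pending  # every pending writer does the work once more
        pending -= 1         # only one of them wins the round
    # Every attempt beyond one per writer was discarded, i.e. became garbage.
    return {"attempts": attempts, "wasted": attempts - writers}


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, simulate_cas_contention(n))
```

Under these assumptions the discarded work grows roughly with the square of the number of
concurrent writers, while a single writer wastes nothing.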

However, I suggest the simplest solution is to batch your updates to the same partition.
If all of your updates to a given partition are sent in one batch, you will not hit this problem.
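
As a minimal sketch of what that could look like on the client side, the Python helper
below groups pending rows by partition key and builds one CQL batch per partition. The
helper name is hypothetical, the table and column names are taken from the example schema
in the report, and the choice of an UNLOGGED batch is an assumption rather than something
stated on this ticket. Values are interpolated into the statement text only to keep the
sketch short; a real client would use prepared statements with bound parameters.

```python
from collections import defaultdict


def build_partition_batches(rows):
    """Group (row_key, column_key, column_value) triples by row_key and
    return a dict mapping each row_key to one CQL batch string."""
    by_partition = defaultdict(list)
    for row_key, column_key, column_value in rows:
        by_partition[row_key].append((column_key, column_value))

    batches = {}
    for row_key, cells in by_partition.items():
        statements = [
            "  INSERT INTO test.test (row_key, column_key, column_value) "
            f"VALUES ('{row_key}', {column_key}, {column_value});"
            for column_key, column_value in cells
        ]
        # One batch per partition key, so each partition is updated once.
        batches[row_key] = "\n".join(
            ["BEGIN UNLOGGED BATCH", *statements, "APPLY BATCH;"]
        )
    return batches


if __name__ == "__main__":
    pending = [
        ("0000000001", "e0138677-7246-11e3-ac78-016ae7083d37", [0, 1, 2, 3]),
        ("0000000001", "1ac5770a-7247-11e3-80e4-016ae7083d37", [4, 5]),
        ("0000000022", "1ac5770a-7247-11e3-80e4-016ae7083d37", [6]),
    ]
    for batch in build_partition_batches(pending).values():
        print(batch)
```

Each resulting string can be sent as a single statement, so the rows for one partition
arrive together instead of as separate competing inserts.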

> Slow inserts with collections into a single partition (Pathological GC behavior)
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6534
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6534
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: dsc12-1.2.12-1.noarch.rpm
> cassandra12-1.2.12-1.noarch.rpm
> centos 6.4
>            Reporter: Michael Penick
>             Fix For: 2.0.11
>
>         Attachments: GC_behavior.png
>
>
> We noticed extremely slow insertion rates to a single partition key, using a composite
> column with a collection value. We were not able to replicate the issue using the same
> schema with a non-collection value, even when using much larger values. During the
> collection insertion tests we have tons of these messages in the system.log:
> "GCInspector.java (line 119) GC for ConcurrentMarkSweep: 1287 ms for 2 collections, 1233256368 used; max is 8375238656"
> We are inserting tiny amounts of data (32-64 bytes) and seeing the issue after only a
> couple of 10k inserts. The amount of memory being used by C*/JVM is nowhere near
> proportional to the amount of data being inserted. Why is C* consuming so much memory?
> Attached is a picture of the GC under one of the pathological tests. Keep in mind we
> are only inserting 128KB - 256KB of data and we are almost hitting the limit of the heap.
> GC flags:
> -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42
> -Xms8192M
> -Xmx8192M
> -Xmn2048M
> -XX:+HeapDumpOnOutOfMemoryError
> -Xss180k
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseTLAB
> Example schemas:
> Note: The type of collection or primitive type in the collection doesn't seem to matter.
> {code}
> CREATE TABLE test.test (
> row_key text, 
> column_key uuid,
>  column_value list<int>, 
> PRIMARY KEY(row_key, column_key));
> CREATE TABLE test.test (
> row_key text, 
> column_key uuid, 
> column_value map<text, text>, 
> PRIMARY KEY(row_key, column_key));
> {code}
> Example inserts:
> Note: This issue can be replicated with extremely small inserts (as well as larger ones,
> ~1KB)
> {code}
> INSERT INTO test.test 
> (row_key, column_key, column_value)
> VALUES 
> ('0000000001', e0138677-7246-11e3-ac78-016ae7083d37, [0, 1, 2, 3]);
> INSERT INTO test.test 
> (row_key, column_key, column_value) 
> VALUES
> ('0000000022', 1ac5770a-7247-11e3-80e4-016ae7083d37, { 'a': '0123456701234567012345670', 'b': '0123456701234567012345670' });
> {code}
> As a comparison, I was able to run the same tests with the following schema with no issue:
> Note: This test was able to run at a much faster insertion speed, for much longer, and
> with much bigger column sizes (1KB), without any GC issues.
> {code}
> CREATE TABLE test.test (
> row_key text, 
> column_key uuid, 
> column_value text, 
> PRIMARY KEY(row_key, column_key));
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
