Date: Tue, 31 Dec 2013 18:43:53 +0000 (UTC)
From: "Michael Penick (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-6534) Slow inserts with collections into a single partition (Pathological GC behavior)

[ https://issues.apache.org/jira/browse/CASSANDRA-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Penick updated CASSANDRA-6534:
--------------------------------------
    Description:
We noticed extremely slow insertion rates to a single partition key when using a composite column with a collection value. We were not able to replicate the issue using the same schema with a non-collection value, even with much larger values.

There are also tons of these messages in the system.log:
"GCInspector.java (line 119) GC for ConcurrentMarkSweep: 1287 ms for 2 collections, 1233256368 used; max is 8375238656"

We are inserting tiny amounts of data (32-64 bytes) and seeing the issue after only a couple of 10k inserts. The amount of memory being used by C*/JVM is nowhere near proportional to the amount of data being inserted. Why is C* consuming so much memory?

Attached is a picture of the GC under one of the pathological tests. Keep in mind we are only inserting 128KB - 256KB of data and we are almost hitting the limit of the heap.

GC flags:
-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Xms8192M
-Xmx8192M
-Xmn2048M
-XX:+HeapDumpOnOutOfMemoryError
-Xss180k
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseTLAB

Example schemas:
Note: The type of collection or primitive type in the collection doesn't seem to matter.
{code}
CREATE TABLE test.test (
    row_key text,
    column_key uuid,
    column_value list,
    PRIMARY KEY(row_key, column_key));

CREATE TABLE test.test (
    row_key text,
    column_key uuid,
    column_value map,
    PRIMARY KEY(row_key, column_key));
{code}

Example inserts:
Note: This issue can be replicated with extremely small inserts (as well as larger ones, ~1KB).
{code}
INSERT INTO test.test
    (row_key, column_key, column_value)
VALUES
    ('0000000001', e0138677-7246-11e3-ac78-016ae7083d37, [0, 1, 2, 3]);

INSERT INTO test.test
    (row_key, column_key, column_value)
VALUES
    ('0000000022', 1ac5770a-7247-11e3-80e4-016ae7083d37, { 'a': '0123456701234567012345670', 'b': '0123456701234567012345670' });
{code}

As a comparison, I was able to run the same tests with the following schema with no issue:
Note: This test was able to run at a much faster insertion speed and with much bigger column sizes (1KB) without any GC issues.
{code}
CREATE TABLE test.test (
    row_key text,
    column_key uuid,
    column_value text,
    PRIMARY KEY(row_key, column_key) )
{code}

  was:
We noticed extremely slow insertion rates to a single partition key when using a composite column with a collection value. We were not able to replicate the issue using the same schema with a non-collection value, even with much larger values.

There are also tons of these messages in the system.log:
"GCInspector.java (line 119) GC for ConcurrentMarkSweep: 1287 ms for 2 collections, 1233256368 used; max is 8375238656"

We are inserting tiny amounts of data (32-64 bytes) and seeing the issue after only a couple of 10k inserts. The amount of memory being used by C*/JVM is nowhere near proportional to the amount of data being inserted. Why is C* consuming so much memory?

Attached is a picture of the GC under the one of the pathological tests. Keep in mind we are only inserting 128KB - 256KB of data and we are almost hitting the limit of the heap.

GC flags:
-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Xms8192M
-Xmx8192M
-Xmn2048M
-XX:+HeapDumpOnOutOfMemoryError
-Xss180k
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseTLAB

Example schemas:
Note: The type of collection or primitive type in the collection doesn't seem to matter.
{code}
CREATE TABLE test.test (
    row_key text,
    column_key uuid,
    column_value list,
    PRIMARY KEY(row_key, column_key));

CREATE TABLE test.test (
    row_key text,
    column_key uuid,
    column_value map,
    PRIMARY KEY(row_key, column_key));
{code}

Example inserts:
Note: This issue can be replicated with extremely small inserts (as well as larger ones, ~1KB).
{code}
INSERT INTO test.test
    (row_key, column_key, column_value)
VALUES
    ('0000000001', e0138677-7246-11e3-ac78-016ae7083d37, [0, 1, 2, 3]);

INSERT INTO test.test
    (row_key, column_key, column_value)
VALUES
    ('0000000022', 1ac5770a-7247-11e3-80e4-016ae7083d37, { 'a': '0123456701234567012345670', 'b': '0123456701234567012345670' });
{code}

As a comparison, I was able to run the same tests with the following schema with no issue:
Note: This test was able to run at a much faster insertion speed and with much bigger column sizes (1KB) without any GC issues.
{code}
CREATE TABLE test.test (
    row_key text,
    column_key uuid,
    column_value text,
    PRIMARY KEY(row_key, column_key) )
{code}


> Slow inserts with collections into a single partition (Pathological GC behavior)
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6534
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6534
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: dsc12-1.2.12-1.noarch.rpm
> cassandra12-1.2.12-1.noarch.rpm
> centos 6.4
>            Reporter: Michael Penick
>             Fix For: 1.2.12
>
>         Attachments: GC_behavior.png
>
>
> We noticed extremely slow insertion rates to a single partition key when using a composite column with a collection value. We were not able to replicate the issue using the same schema with a non-collection value, even with much larger values.
> There are also tons of these messages in the system.log:
> "GCInspector.java (line 119) GC for ConcurrentMarkSweep: 1287 ms for 2 collections, 1233256368 used; max is 8375238656"
> We are inserting tiny amounts of data (32-64 bytes) and seeing the issue after only a couple of 10k inserts. The amount of memory being used by C*/JVM is nowhere near proportional to the amount of data being inserted. Why is C* consuming so much memory?
> Attached is a picture of the GC under one of the pathological tests. Keep in mind we are only inserting 128KB - 256KB of data and we are almost hitting the limit of the heap.
> GC flags:
> -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42
> -Xms8192M
> -Xmx8192M
> -Xmn2048M
> -XX:+HeapDumpOnOutOfMemoryError
> -Xss180k
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseTLAB
> Example schemas:
> Note: The type of collection or primitive type in the collection doesn't seem to matter.
> {code}
> CREATE TABLE test.test (
>     row_key text,
>     column_key uuid,
>     column_value list,
>     PRIMARY KEY(row_key, column_key));
> CREATE TABLE test.test (
>     row_key text,
>     column_key uuid,
>     column_value map,
>     PRIMARY KEY(row_key, column_key));
> {code}
> Example inserts:
> Note: This issue can be replicated with extremely small inserts (as well as larger ones, ~1KB).
> {code}
> INSERT INTO test.test
>     (row_key, column_key, column_value)
> VALUES
>     ('0000000001', e0138677-7246-11e3-ac78-016ae7083d37, [0, 1, 2, 3]);
> INSERT INTO test.test
>     (row_key, column_key, column_value)
> VALUES
>     ('0000000022', 1ac5770a-7247-11e3-80e4-016ae7083d37, { 'a': '0123456701234567012345670', 'b': '0123456701234567012345670' });
> {code}
> As a comparison, I was able to run the same tests with the following schema with no issue:
> Note: This test was able to run at a much faster insertion speed and with much bigger column sizes (1KB) without any GC issues.
> {code}
> CREATE TABLE test.test (
>     row_key text,
>     column_key uuid,
>     column_value text,
>     PRIMARY KEY(row_key, column_key) )
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
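
The workload described in the report (many small collection inserts that all land in one partition) could be scripted roughly as follows. This is only a sketch, not the reporter's actual test harness: the function name `make_inserts`, its parameters, and the choice to emit plain CQL text (for example, to pipe into cqlsh) are assumptions made for illustration. It keeps `row_key` fixed so every row targets the same partition, and generates a fresh version-1 UUID per row, matching the style of the column keys in the example inserts.

```python
import uuid


def make_inserts(row_key, count, value_literal="[0, 1, 2, 3]"):
    """Yield CQL INSERT statements that all target a single partition.

    row_key       -- partition key shared by every generated row
    count         -- number of statements to generate
    value_literal -- CQL collection literal used as column_value
                     (hypothetical default, taken from the report's list example)
    """
    for _ in range(count):
        # A new time-based (version 1) UUID per row, so each insert
        # creates a distinct clustering key inside the same partition.
        column_key = uuid.uuid1()
        yield (
            "INSERT INTO test.test (row_key, column_key, column_value) "
            f"VALUES ('{row_key}', {column_key}, {value_literal});"
        )


if __name__ == "__main__":
    # Print a handful of statements; raise the count to approach the
    # insert volumes mentioned in the report.
    for statement in make_inserts("0000000001", 5):
        print(statement)
```

Passing a map literal such as `"{ 'a': '0123456701234567012345670' }"` as `value_literal` would exercise the map variant of the schema in the same way.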