Date: Thu, 12 Mar 2015 22:22:38 +0000 (UTC)
From: "Dan Kinder (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-8961) Data rewrite case causes almost non-functional compaction


    [ https://issues.apache.org/jira/browse/CASSANDRA-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359540#comment-14359540 ]

Dan Kinder commented on CASSANDRA-8961:
---------------------------------------

I see. Is there some way to make this DELETE query not use RangeTombstones? Would it work to insert the full set of columns (ex. DELETE pk, data FROM ...)?

Also CASSANDRA-6446 seems related.

> Data rewrite case causes almost non-functional compaction
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-8961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8961
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: CentOS 6.6, Cassandra 2.0.12 (also seen in Cassandra 2.1)
>            Reporter: Dan Kinder
>            Priority: Minor
>
> There seems to be a bug of some kind where compaction grinds to a halt in this use case: from time to time we have a set of rows we need to "migrate", changing their primary key by deleting the row and inserting a new row with the same partition key and a different clustering key. The Python script below demonstrates this; it takes a bit of time to run (I didn't try to optimize it), but when it's done Cassandra will be trying to compact a few hundred megs of data for a long time: on the order of days, or it will never finish.
> This is not verified by this sandboxed experiment, but compression settings do not seem to matter, and this seems to happen with STCS as well, not just LCS. I am still testing whether other patterns cause this terrible compaction performance, like deleting all rows and then inserting, or vice versa.
> Even if it isn't a "bug" per se, is there a way to fix or work around this behavior?
> {code}
> from cassandra.cluster import Cluster
>
> cluster = Cluster(['localhost'])
> # Every statement below is keyspace-qualified, so no default keyspace is needed.
> db = cluster.connect()
>
> db.execute("DROP KEYSPACE IF EXISTS trial")
> db.execute("""CREATE KEYSPACE trial
>               WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor': 1 }""")
> db.execute("""CREATE TABLE trial.tbl (
>                 pk text,
>                 data text,
>                 PRIMARY KEY(pk, data)
>               ) WITH compaction = { 'class' : 'LeveledCompactionStrategy' }
>                 AND compression = {'sstable_compression': ''}""")
>
> # Number of rows to insert and "move"
> n = 200000
>
> # Insert n rows with the same partition key, 1KB of unique data in the clustering key
> for i in range(n):
>     db.execute("INSERT INTO trial.tbl (pk, data) VALUES ('thepk', %s)",
>                [str(i).zfill(1024)])
>
> # Update those n rows, deleting each and replacing it with a very similar row
> for i in range(n):
>     val = str(i).zfill(1024)
>     db.execute("DELETE FROM trial.tbl WHERE pk = 'thepk' AND data = %s", [val])
>     db.execute("INSERT INTO trial.tbl (pk, data) VALUES ('thepk', %s)", ["1" + val])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
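
As a rough sanity check on the "few hundred megs" figure in the description, the workload generated by the quoted script can be sized with plain arithmetic. This is only a sketch under stated assumptions: it counts raw clustering-key bytes, ignores all on-disk overhead, and assumes each delete marker carries about one copy of the clustering value (not measured against a real SSTable). The function name `workload_summary` and the dictionary keys are made up for illustration.

```python
# Back-of-the-envelope sizing for the repro script quoted above.
# Counts raw clustering-key bytes only; SSTable overhead, timestamps and
# the partition key are ignored.
N = 200000       # rows inserted and then "moved", as in the script
KEY_LEN = 1024   # len(str(i).zfill(1024)) for every i < N


def workload_summary(n=N, key_len=KEY_LEN):
    """Return rough byte counts and ordering facts for the workload."""
    old_key = str(n - 1).zfill(key_len)  # largest original clustering value
    new_key = "1" + old_key              # its replacement, one char longer
    return {
        "initial_insert_bytes": n * key_len,
        # Assumption: one copy of the key per delete marker (not measured).
        "delete_marker_key_bytes": n * key_len,
        "replacement_insert_bytes": n * len(new_key),
        "deletes_in_partition": n,
        # Old keys start with "0", new keys with "1", so every new row
        # sorts after every deleted row within the single partition.
        "new_keys_sort_after_old": new_key > old_key,
    }


summary = workload_summary()
total = (summary["initial_insert_bytes"]
         + summary["delete_marker_key_bytes"]
         + summary["replacement_insert_bytes"])
print(f"{total / 1e6:.1f} MB of raw key data, "
      f"{summary['deletes_in_partition']} deletes in one partition")
```

Under these assumptions the raw key data alone lands in the hundreds of megabytes, with all 200000 deletes concentrated in the single partition 'thepk'.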