From: aaron morton
To: user@cassandra.apache.org
Subject: Re: Performance degradation observed through embedded cassandra server - pointers needed
Date: Mon, 3 Oct 2011 11:51:48 +1300

Deleting the data may not be the right approach here if you want to have a clean slate to start the next test. It will leave tombstones around, which may reduce your performance if you do a lot of deletes. It's pedantic, but a delete is different from a truncate or a drop.

Truncate does a few more things that result in something a bit more like a clean slate (https://github.com/apache/cassandra/blob/cassandra-0.8.6/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1969):

* flushes CF changes to disk
* discards commit logs
* snapshots existing SSTables
* marks the existing SSTables as compacted so they are no longer used in reads.

(drop keyspace is not too different)

If the slate you wish to clear, truncate or drop keyspace will be your friends.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 1/10/2011, at 5:56 AM, Roshan Dawrani wrote:

> Hi,
>
> For our Grails + Cassandra application's clean-DB-for-every-test needs, we finally went back from using costly "truncate" calls to a "range-scans-and-delete" approach, and found such a great difference between the performance of the two approaches that we wrote a small blog post here about it: "Grails, Cassandra: Giving each test a clean DB to work with". For someone in a similar situation, it may present an alternative.
>
> Cheers.
>
> On Fri, Sep 23, 2011 at 1:29 PM, Roshan Dawrani wrote:
> Thanks for sharing your inputs, Edward. Some comments inline below:
>
> On Thu, Sep 22, 2011 at 7:31 PM, Edward Capriolo wrote:
>
> 1) You should try to dig in and determine why the truncate is slower. Look for related jira issues on truncation.
>
> I should give it a try. I thought I might get some readymade pointers from people already knowing about 0.7.2 / 0.8.5 differences on whether our approach to truncate every test has gone even worse due to some changes in that area.
>
> Cassandra had some re-entrant code you could fork a JVM each test and use the CassandraServiceDataCleaner. (However multiple startups could end up causing more overhead than the truncation)
>
> I avoid this problem by using a different column family and/or a different keyspace for all my unit tests in a single class. Each class brings up a new embedded cluster and uses the data cleaner to sanitize the data directories. So essentially I never call truncate.
>
> In both these approaches, won't I need to re-build the schema for every test too? Certainly in the 2nd case, if I end up creating a new keyspace or different column families for each test. I am not sure what I will gain there in terms of performance. I was hoping data truncation leaving the schema there would be faster than that.
>
> --
> Roshan
> Blog: http://roshandawrani.wordpress.com/
> Twitter: @roshandawrani
> Skype: roshandawrani
>
> --
> Roshan
> Blog: http://roshandawrani.wordpress.com/
> Twitter: @roshandawrani
> Skype: roshandawrani
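
The difference between deleting rows and truncating, as described in the reply above, can be illustrated with a toy model. This is only a minimal sketch and not Cassandra's implementation: `ToyColumnFamily`, its fields, and all of its methods are hypothetical names invented for illustration. It assumes just what the reply states, namely that deletes leave tombstones behind, while truncate flushes, discards commit logs, snapshots the existing SSTables, and stops using them for reads. It also ignores the fact that a real cluster eventually purges tombstones during compaction once the grace period has passed.

```python
from dataclasses import dataclass, field

# Sentinel standing in for a tombstone (a marker that a row was deleted).
TOMBSTONE = object()


@dataclass
class ToyColumnFamily:
    """Toy stand-in for a column family; NOT Cassandra's real data structures."""
    memtable: dict = field(default_factory=dict)    # in-memory writes
    commit_log: list = field(default_factory=list)  # write-ahead records
    sstables: list = field(default_factory=list)    # live "on-disk" tables (dicts)
    snapshots: list = field(default_factory=list)   # copies kept by truncate

    def insert(self, key, value):
        self.commit_log.append(("insert", key))
        self.memtable[key] = value

    def delete(self, key):
        # A delete is itself a write: it stores a tombstone rather than removing data.
        self.commit_log.append(("delete", key))
        self.memtable[key] = TOMBSTONE

    def flush(self):
        # Move the current memtable into a new toy "SSTable".
        if self.memtable:
            self.sstables.append(self.memtable)
            self.memtable = {}

    def truncate(self):
        # The four steps listed in the reply, in toy form.
        self.flush()                                            # flush CF changes
        self.commit_log.clear()                                 # discard commit logs
        self.snapshots.extend(dict(t) for t in self.sstables)   # snapshot SSTables
        self.sstables = []                                      # no longer used in reads

    def entries_scanned_by_range_read(self):
        # Every stored entry a full range read would pass over, tombstones included.
        return sum(len(t) for t in self.sstables + [self.memtable])

    def live_rows(self):
        merged = {}
        for table in self.sstables + [self.memtable]:
            merged.update(table)  # newer tables override older ones
        return {k: v for k, v in merged.items() if v is not TOMBSTONE}


def populate(cf, n):
    for i in range(n):
        cf.insert(f"row{i}", i)
    cf.flush()


# Clean-up by range scan and delete: no live rows, but tombstones remain.
deleted = ToyColumnFamily()
populate(deleted, 100)
for key in list(deleted.live_rows()):
    deleted.delete(key)

# Clean-up by truncate: nothing is left for reads to consider.
truncated = ToyColumnFamily()
populate(truncated, 100)
truncated.truncate()

print(len(deleted.live_rows()), deleted.entries_scanned_by_range_read())      # 0 200
print(len(truncated.live_rows()), truncated.entries_scanned_by_range_read())  # 0 0
```

Both stores end with zero live rows, but in this model the delete-based store still holds 200 entries (100 original values plus 100 tombstones) that a range read has to pass over, while the truncated one holds none. The price truncate pays instead is the flush and snapshot work on every call, which may be part of why it felt costly when run once per test.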