cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Performance degradation observed through embedded cassandra server - pointers needed
Date Sun, 02 Oct 2011 22:51:48 GMT
Deleting the data may not be the right approach here if you want a clean slate to start
the next test. It will leave tombstones around, which may reduce your performance if you make
a lot of deletes. It's pedantic, but deleting is different from truncating or dropping. 
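To make the tombstone point concrete, here is a toy model in Java of a "range-scan-and-delete" cleanup. It is not Cassandra's storage engine; `TombstoneStore` and its methods are hypothetical. Each delete writes a marker instead of removing the row, so later scans still have to step over every marker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only (hypothetical class, not Cassandra's API): a delete writes a
// tombstone instead of removing the row, and scans still visit tombstones.
class TombstoneStore {
    // A null value marks a tombstone (TreeMap permits null values).
    private final TreeMap<String, String> rows = new TreeMap<>();
    int entriesVisitedByLastScan = 0;

    void put(String key, String value) { rows.put(key, value); }

    // Delete writes a marker; the entry itself stays in the map.
    void delete(String key) { rows.put(key, null); }

    // Range scan over all rows: returns live keys, counts every entry visited.
    List<String> scanLiveKeys() {
        List<String> live = new ArrayList<>();
        entriesVisitedByLastScan = 0;
        for (Map.Entry<String, String> e : rows.entrySet()) {
            entriesVisitedByLastScan++;
            if (e.getValue() != null) live.add(e.getKey());
        }
        return live;
    }

    // "Range-scan-and-delete" cleanup: scan for live keys, then delete each one.
    int deleteAll() {
        List<String> keys = scanLiveKeys();
        for (String k : keys) delete(k);
        return keys.size();
    }
}
```

In this toy the scan cost only ever grows with the number of deletes. In Cassandra, compaction can purge tombstones once `gc_grace_seconds` has passed, so how much they hurt depends on delete volume and compaction timing.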

Truncate is doing a few more things that result in something a bit more like a clean slate
(https://github.com/apache/cassandra/blob/cassandra-0.8.6/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1969)

* flushes CF changes to disk
* discards commit logs
* snapshots existing SSTables
* marks the existing SSTables as compacted so they are no longer used in reads. 
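The four steps above can be sketched as a toy single-column-family store. The classes are hypothetical and heavily simplified (no real disk I/O, no hard-linked snapshots, no per-CF commit log segments); they only show the order of operations and why truncated data stops appearing in reads.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the truncate steps; names are illustrative, not Cassandra's API.
class ToySSTable {
    final Map<String, String> data;
    boolean compacted = false;   // compacted SSTables are skipped by reads
    ToySSTable(Map<String, String> data) { this.data = data; }
}

class ToyColumnFamilyStore {
    final Map<String, String> memtable = new HashMap<>();
    final List<String> commitLog = new ArrayList<>();
    final List<ToySSTable> sstables = new ArrayList<>();
    final List<Map<String, String>> snapshot = new ArrayList<>();

    void write(String key, String value) {
        commitLog.add(key + "=" + value);  // log first, then apply in memory
        memtable.put(key, value);
    }

    void flush() {
        if (memtable.isEmpty()) return;
        sstables.add(new ToySSTable(new HashMap<>(memtable)));
        memtable.clear();
    }

    String read(String key) {
        if (memtable.containsKey(key)) return memtable.get(key);
        for (int i = sstables.size() - 1; i >= 0; i--) {   // newest SSTable first
            ToySSTable t = sstables.get(i);
            if (!t.compacted && t.data.containsKey(key)) return t.data.get(key);
        }
        return null;
    }

    void truncate() {
        flush();                                            // 1. flush CF changes to "disk"
        commitLog.clear();                                  // 2. discard the commit log (all of it, in this single-CF toy)
        for (ToySSTable t : sstables) snapshot.add(new HashMap<>(t.data)); // 3. snapshot existing SSTables
        for (ToySSTable t : sstables) t.compacted = true;   // 4. mark them compacted so reads ignore them
    }
}
```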

(drop keyspace is not too different)

If it's a clean slate you want, truncate or drop keyspace will be your friends. 

Cheers


-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 1/10/2011, at 5:56 AM, Roshan Dawrani wrote:

> Hi,
> 
> For our Grails + Cassandra application's clean-DB-for-every-test needs, we finally moved
> back from costly "truncate" calls to a "range-scans-and-delete" approach, and found such
> a great difference between the performance of the two approaches that I wrote a small blog post
> about it: "Grails, Cassandra: Giving each test a clean DB to work with". For someone in
> a similar situation, it may present an alternative.
> 
> Cheers.
> 
> On Fri, Sep 23, 2011 at 1:29 PM, Roshan Dawrani <roshandawrani@gmail.com> wrote:
> Thanks for sharing your inputs, Edward. Some comments inline below:
> 
> On Thu, Sep 22, 2011 at 7:31 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> 
> 1) You should try to dig in and determine why the truncate is slower. Look for related
> JIRA issues on truncation. 
> 
> I should give it a try. I thought I might get some ready-made pointers from people who already
> know the 0.7.2 / 0.8.5 differences, on whether our approach of truncating for every test has
> become even slower due to changes in that area.
>  
> Cassandra had some re-entrant code; you could fork a JVM for each test and use the CassandraServiceDataCleaner.
> (However, multiple startups could end up causing more overhead than the truncation.)
> 
> I avoid this problem by using a different column family and/or a different keyspace
> for all my unit tests in a single class. Each class brings up a new embedded cluster and uses
> the data cleaner to sanitize the data directories. So essentially I never call truncate.
> 
> In both these approaches, won't I need to rebuild the schema for every test too? Certainly
> in the second case, if I end up creating a new keyspace or different column families for each test.
> I am not sure what I would gain there in terms of performance. I was hoping that truncating the data
> while leaving the schema in place would be faster than that.
> 
> -- 
> Roshan
> Blog: http://roshandawrani.wordpress.com/
> Twitter: @roshandawrani
> Skype: roshandawrani
> 
> 
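Edward's per-class approach relies on wiping the embedded node's directories before it starts. A minimal stand-in for that step is sketched below, assuming the data and commit log directories (whatever `data_file_directories` and `commitlog_directory` point at in cassandra.yaml) are plain local paths. `DataDirCleaner` is a hypothetical name; this is not the actual CassandraServiceDataCleaner source.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper, not CassandraServiceDataCleaner itself: deletes
// everything under each directory and recreates it empty. Call it before the
// embedded server starts, never while the server has files open.
class DataDirCleaner {
    static void clean(Path... dirs) {
        for (Path dir : dirs) {
            try {
                if (Files.exists(dir)) {
                    try (Stream<Path> walk = Files.walk(dir)) {
                        // Reverse path order visits children before their parent directories.
                        walk.sorted(Comparator.reverseOrder()).forEach(DataDirCleaner::deleteOrThrow);
                    }
                }
                Files.createDirectories(dir);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    private static void deleteOrThrow(Path p) {
        try {
            Files.delete(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Wiping directories is only safe between server lifecycles, which is why this pattern pairs with bringing up a fresh embedded node per test class rather than per test.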

