Deleting the data may not be the right approach here if you want a clean slate to start the next test. It will leave tombstones around, which may reduce your performance if you make a lot of deletes. It's pedantic, but deleting is different from truncate or drop.
Truncate:
* flushes CF changes to disk
* discards commit logs
* snapshots existing SSTables
* marks the existing SSTables as compacted so they are no longer used in reads.
(drop keyspace is not too different)
If a clean slate is what you want, truncate or drop keyspace will be your friends.
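To make the difference between deleting and truncating concrete, here is a toy model in Python. It is only a sketch: `ToyColumnFamily` and its methods are invented for illustration and do not reflect how Cassandra stores data internally. It only shows that a delete leaves a marker that later reads still have to step over, while a truncate stops using the old data altogether.

```python
class ToyColumnFamily:
    """Toy model only; not Cassandra's real storage engine."""

    TOMBSTONE = object()  # marker written in place of a deleted value

    def __init__(self):
        self.cells = {}

    def insert(self, key, value):
        self.cells[key] = value

    def delete(self, key):
        # A delete writes a tombstone instead of removing the entry.
        self.cells[key] = self.TOMBSTONE

    def truncate(self):
        # Truncate discards the existing data set as a whole.
        self.cells = {}

    def scan(self):
        """Return (live_keys, entries_examined) for a full scan."""
        examined = 0
        live = []
        for key, value in self.cells.items():
            examined += 1
            if value is not self.TOMBSTONE:
                live.append(key)
        return live, examined


def demo():
    cf = ToyColumnFamily()
    for i in range(1000):
        cf.insert(f"row{i}", i)
    for i in range(1000):
        cf.delete(f"row{i}")
    after_delete = cf.scan()    # no live rows, but every tombstone is examined
    cf.truncate()
    after_truncate = cf.scan()  # nothing left to examine
    return after_delete, after_truncate
```

In this model both scans return no live rows, but the scan after the deletes still examines 1000 entries, while the scan after the truncate examines none.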
Freelance Cassandra Developer
On 1/10/2011, at 5:56 AM, Roshan Dawrani wrote:
For our Grails + Cassandra application's clean-DB-for-every-test needs, we finally went back from using costly "truncate" calls to a "range-scans-and-delete" approach, and found such a great difference between the performance of the two approaches that I wrote a small blog post about it: "Grails, Cassandra: Giving each test a clean DB to work with". For someone in a similar situation, it may present an alternative.
On Fri, Sep 23, 2011 at 1:29 PM, Roshan Dawrani <firstname.lastname@example.org> wrote:
Thanks for sharing your inputs, Edward. Some comments inline below:
On Thu, Sep 22, 2011 at 7:31 PM, Edward Capriolo <email@example.com> wrote:
1) You should try to dig in and determine why the truncate is slower. Look for related JIRA issues on truncation.
I should give it a try. I thought I might get some ready-made pointers from people who already know the 0.7.2 / 0.8.5 differences, on whether our approach of truncating for every test has become even worse due to some changes in that area.
Cassandra had some re-entrant code. You could fork a JVM for each test and use the CassandraServiceDataCleaner. (However, multiple startups could end up causing more overhead than the truncation.)
I avoid this problem by using a different column family and/or a different keyspace for all the unit tests in a single class. Each class brings up a new embedded cluster and uses the data cleaner to sanitize the data directories. So essentially I never call truncate.
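As a rough sketch of the per-class keyspace idea, the Python helper below derives a keyspace name from a test class name. The function `keyspace_for`, the `ks_` prefix, and the 32-character limit are all assumptions made for illustration; check the naming rules and length limit of your Cassandra version before relying on them.

```python
import hashlib
import re


def keyspace_for(test_class_name, max_len=32):
    """Derive a keyspace name from a test class name (hypothetical helper).

    max_len is an assumed limit, not a value taken from Cassandra itself.
    """
    # Keep only letters, digits and underscores.
    cleaned = "ks_" + re.sub(r"[^A-Za-z0-9_]", "_", test_class_name)
    if len(cleaned) <= max_len:
        return cleaned
    # Too long: shorten, then append a short hash of the full class name
    # so that two long names sharing a prefix stay distinct.
    digest = hashlib.md5(test_class_name.encode("utf-8")).hexdigest()[:8]
    return cleaned[: max_len - 9] + "_" + digest
```

Each test class would then create its schema once under its own name, for example `keyspace_for("com.example.UserDaoTest")`, so tests in different classes never see each other's data and no truncate call is needed between classes.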
In both these approaches, won't I need to re-build the schema for every test too? Certainly in the 2nd case, if I end up creating new keyspace or different column families for each test. I am not sure what I will gain there in terms of performance. I was hoping data truncation leaving schema there would be faster than that.