cassandra-dev mailing list archives

From "Dan Hendry" <>
Subject RE: SEVERE Data Corruption Problems
Date Thu, 10 Feb 2011 00:51:05 GMT
I will put two nodes on 0.7. Did you really mean CASSANDRA-1992? I looked
over the bug report and patch but can't see how it is related to the problems
I have been having. I am not performing bootstraps or repairs, and haven't
since one of the most problematic CFs was created. I have also looked over
the resolved issues for 0.7.1 and did not see anything which I thought could
be related.

I would love to provide a test cluster, and we actually have one for our
development environment, but it is working flawlessly: exact same Cassandra
version, application code, Java version and OS. The only difference is that
it has a far lower write load and is in EC2 instead of on physical machines.
It's one of the reasons I believe I am hitting some strange race/edge
condition somewhere.

Looking over the user list, it seems at least one other person is having the
same type of problem: .
Although I have not seen the second error (possibly because I don’t do range
slices), the first error looks eerily familiar.


-----Original Message-----
From: Jonathan Ellis [] 
Sent: February-09-11 18:14
To: dev
Subject: Re: SEVERE Data Corruption Problems

Hi Dan,

It would be very useful to test with the 0.7 branch instead of 0.7.0, so at
least you're not chasing known and fixed bugs like CASSANDRA-1992.

As you say, there are a lot of people who aren't seeing this, so it
would also be useful if you can provide some kind of test harness
where you can say "point this at a cluster and within a few hours
you will see the corruption."

On Wed, Feb 9, 2011 at 4:31 PM, Dan Hendry <> wrote:
> I have been having SEVERE data corruption issues with SSTables in my
> cluster; for one CF it was happening almost daily (I have since shut down
> the service using that CF as it was too much work to manage the Cassandra
> errors). At this point, I can't see how it is anything but a Cassandra bug,
> yet it's somewhat strange and very scary that I am the only one who seems to
> be having such serious issues. Most of my data is indexed in two ways, so I
> have been able to write a validator which goes through and back-fills
> missing data, but that kind of defeats the whole point of Cassandra. The
> only way I have found to deal with issues when they crop up, to prevent
> crashing from repeated failed compactions, is to delete the SSTable. My
> cluster is running a slightly modified 0.7.0 version which logs which files
> errors occur for, so that I can stop the node and delete them.
> The problem:
> -          Reads, compactions and hinted handoff fail with various
> exceptions (samples shown at the end of this email) which seem to indicate
> sstable corruption.
> -          I have seen failed reads/compactions/hinted handoffs on 4 out of
> 4 nodes (RF=2) for 3 different super column families and 1 standard column
> family (4 out of 11) and, just now, the Hints system CF. (If it matters, the
> ring has not changed since one CF which has been giving me trouble was
> created.) I have checked SMART disk info and run various diagnostics and
> there do not seem to be any hardware issues; besides, what are the chances
> of all four nodes having the same hardware problems at the same time when,
> for all other purposes, they appear fine?
> -          I have added logging which outputs which sstables are causing
> exceptions to be thrown. The corrupt sstables have been both freshly
> flushed memtables and the output of compaction (ie, 4 sstables which all
> seem to be fine get compacted to 1 which is then corrupt). It seems that
> the majority of corrupt sstables are post-compaction (vs post-memtable
> flush).
> -          The one CF which was giving me the most problems was heavily
> written to (1000-1500 writes/second continually across the cluster). For
> that CF, I was having to delete 4-6 sstables a day across the cluster (and
> the number was going up; even the number of problems for the remaining CFs
> is going up). The other CFs which have had corrupt sstables are also quite
> heavily written to (generally a few hundred writes a second across the
> cluster).
> -          Most of the time (5/6 attempts) when this problem occurs,
> sstable2json also fails. I have, however, had one case where I was able to
> export the sstable to json, then re-import it, at which point I was no
> longer seeing exceptions.
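(For anyone wanting to reproduce that export/re-import round trip: the
sketch below uses the sstable2json and json2sstable tools that ship with
0.7. The keyspace, CF, and file names are hypothetical; adjust them to
match your data directory, and stop the node before swapping files.)

```shell
# Hypothetical names: Keyspace1, DeviceEventsByDevice, and the sstable
# generation numbers are examples only.
# Dump the suspect sstable to JSON (this step itself fails on ~5/6 of the
# corrupt tables, per the observation above):
bin/sstable2json /var/lib/cassandra/data/Keyspace1/DeviceEventsByDevice-f-123-Data.db \
    > /tmp/DeviceEventsByDevice-dump.json

# Re-import the JSON into a fresh sstable:
bin/json2sstable -K Keyspace1 -c DeviceEventsByDevice \
    /tmp/DeviceEventsByDevice-dump.json \
    /var/lib/cassandra/data/Keyspace1/DeviceEventsByDevice-f-124-Data.db
```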
> -          The cluster has been running for a little over 2 months now; the
> problem seems to have sprung up in the last 3-4 weeks and seems to be
> steadily getting worse.
> Ultimately, I think I am hitting some subtle race condition somewhere. I
> have been starting to dig into the Cassandra code but I barely know where
> to start looking. I realize I have not provided nearly enough information
> to easily debug the problem but PLEASE keep your eyes open for possibly
> racy or buggy code which could cause these sorts of problems. I am willing
> to provide full Cassandra logs and a corrupt SSTable on an individual
> basis: please email me and let me know.
> Here is possibly relevant information and my theories on a possible root
> cause. Again, I know little about the Cassandra code base and have only
> moderate java experience so these theories may be way off base.
> -          Strictly speaking, I probably don't have enough memory for my
> workload. I see stop-the-world gc occurring ~30/day/node, often causing
> Cassandra to hang for 30+ seconds (according to the gc logs). Could there
> be some Java bug where a full gc in the middle of writing or flushing
> (compaction/memtable flush) or doing some other disk-based activity causes
> some sort of data corruption?
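(To correlate those pauses with flush/compaction activity, it may help to
log every stop-the-world pause precisely. A sketch using standard HotSpot
1.6 flags, added to the JVM options in conf/cassandra-env.sh; the log file
path is an example.)

```shell
# Standard HotSpot GC logging flags; /var/log/cassandra/gc.log is an
# example path.
JVM_OPTS="$JVM_OPTS -verbose:gc"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"  # total pause time
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
```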
> -          Writes are usually done at ConsistencyLevel ONE with additional
> client-side retry logic. Given that I often see consecutive nodes in the
> ring down, could there be some edge condition where dying at just the wrong
> time causes parts of mutations/messages to be lost?
> -          All of the CFs which have been causing me problems have large
> rows which are compacted incrementally. Could there be some problem with
> incremental compaction logic?
> -          My cluster has a fairly heavy write load (again, the most
> problematic CF is getting 1500 (w/s)/(RF=2) = 750 writes/second/node).
> Furthermore, it is highly probable that there are timestamp collisions.
> Could there be some issue with timestamp logic (ie, using > instead of >=
> or some such) during flushes/compaction?
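(To illustrate why the > vs >= distinction matters: the sketch below is
hypothetical code, not Cassandra's actual reconcile path. It shows that when
two versions of a column carry identical timestamps, the merge must still
pick a winner deterministically, or different replicas/compactions can
disagree. Class and method names here are invented for the example.)

```java
// Hypothetical sketch of timestamp-based column reconciliation.
// With equal timestamps, a naive "a.timestamp > b.timestamp ? a : b"
// silently makes the winner depend on argument order; breaking the tie
// on the value keeps the merge deterministic.
public class ReconcileSketch {
    public static final class Column {
        public final long timestamp;
        public final String value;
        public Column(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Keep the column with the newer timestamp; on a tie, fall back to a
    // deterministic comparison of values so both argument orders agree.
    public static Column reconcile(Column a, Column b) {
        if (a.timestamp != b.timestamp)
            return a.timestamp > b.timestamp ? a : b;
        return a.value.compareTo(b.value) >= 0 ? a : b; // deterministic tie-break
    }

    public static void main(String[] args) {
        Column first = new Column(100L, "alpha");
        Column second = new Column(100L, "beta");
        // Same timestamp: both orders must yield the same winner.
        System.out.println(reconcile(first, second).value); // prints "beta"
        System.out.println(reconcile(second, first).value); // prints "beta"
    }
}
```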
> -          Once a node
> Cluster/system information:
> -          4 nodes with RF=2
> -          Nodes have 8 cores with 24 GB of RAM apiece.
> -          2 HDs, 1 for commit log/system, 1 for /var/lib/cassandra/data
> -          OS is Ubuntu 10.04 (uname -r = 2.6.32-24-server)
> -          Java:
> o   java version "1.6.0_22"
> o   Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
> o   Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
> -          Slightly modified version of 0.7.0 (adds file information to
> exception messages)
> The following non-standard cassandra.yaml properties have been changed:
> -          commitlog_sync_period_in_ms: 100 (with commitlog_sync: periodic)
> -          disk_access_mode: mmap_index_only
> -          concurrent_reads: 12
> -          concurrent_writes: 2 (was 32, but I dropped it to 2 to try and
> eliminate any mutation race conditions – did not seem to help)
> -          sliced_buffer_size_in_kb: 128
> -          in_memory_compaction_limit_in_mb: 50
> -          rpc_timeout_in_ms: 15000
> Schema for most problematic CF:
> name: DeviceEventsByDevice
> column_type: Standard
> memtable_throughput_in_mb: 150
> memtable_operations_in_millions: 1.5
> gc_grace_seconds: 172800
> keys_cached: 1000000
> rows_cached: 0
> Dan Hendry
> (403) 660-2297

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
