incubator-cassandra-dev mailing list archives

From "Dan Hendry" <dan.hendry.j...@gmail.com>
Subject RE: SEVERE Data Corruption Problems
Date Fri, 11 Feb 2011 00:18:53 GMT
Upgraded one node to 0.7. It's logging exceptions like mad (thousands per
minute). They all look like the one below (which is fairly new to me):

ERROR [ReadStage:721] 2011-02-10 18:13:56,190 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[ReadStage:721,5,main]
java.io.IOError: java.io.EOFException
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:75)
        at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1275)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1167)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1095)
        at org.apache.cassandra.db.Table.getRow(Table.java:384)
        at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
        at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:473)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
        at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
        at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
        ... 12 more

Dan
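
For readers following the trace above: the root cause frame is
java.io.DataInputStream.readInt throwing EOFException while a bloom filter is
being deserialized. The sketch below only illustrates that JDK behaviour; it
is not Cassandra code and assumes nothing about the SSTable format. The class
and method names (TruncatedReadDemo, readIntHitsEof) are hypothetical. The
trace alone does not show whether the file was truncated or the reader was
positioned at the wrong offset; either would surface the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Illustrative only, not Cassandra code: shows how reading a 4-byte int
// fails with EOFException when fewer than 4 bytes remain in the stream.
final class TruncatedReadDemo {

    // Returns true if reading one int from `bytes` runs off the end of input.
    static boolean readIntHitsEof(byte[] bytes) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes))) {
            in.readInt(); // needs exactly 4 bytes
            return false;
        } catch (EOFException e) {
            return true;  // same exception type as the "Caused by" frame
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readIntHitsEof(new byte[] {0, 0, 0, 7})); // prints false
        System.out.println(readIntHitsEof(new byte[] {0, 0}));       // prints true
    }
}
```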


-----Original Message-----
From: Jonathan Ellis [mailto:jbellis@gmail.com] 
Sent: February-09-11 18:14
To: dev
Subject: Re: SEVERE Data Corruption Problems

Hi Dan,

it would be very useful to test with 0.7 branch instead of 0.7.0 so at
least you're not chasing known and fixed bugs like CASSANDRA-1992.

As you say, there's a lot of people who aren't seeing this, so it
would also be useful if you can provide some kind of test harness
where you can say "point this at a cluster and within a few hours

On Wed, Feb 9, 2011 at 4:31 PM, Dan Hendry <dan.hendry.junk@gmail.com> wrote:
> I have been having SEVERE data corruption issues with SSTables in my
> cluster; for one CF it was happening almost daily (I have since shut down
> the service using that CF as it was too much work to manage the Cassandra
> errors). At this point, I can’t see how it is anything but a Cassandra bug,
> yet it’s somewhat strange and very scary that I am the only one who seems
> to be having such serious issues. Most of my data is indexed in two ways so
> I have been able to write a validator which goes through and backfills
> missing data, but it’s kind of defeating the whole point of Cassandra. The
> only way I have found to deal with issues when they crop up, and to prevent
> nodes crashing from repeated failed compactions, is to delete the SSTable.
> My cluster is running a slightly modified 0.7.0 version which logs which
> files the errors are for so that I can stop the node and delete them.
>
>
>
> The problem:
>
> -          Reads, compactions and hinted handoff fail with various
> exceptions (samples shown at the end of this email) which seem to indicate
> sstable corruption.
>
> -          I have seen failed reads/compactions/hinted handoff on 4 out of
> 4 nodes (RF=2) for 3 different super column families and 1 standard column
> family (4 out of 11) and, just now, the Hints system CF. (If it matters,
> the ring has not changed since one CF which has been giving me trouble was
> created.) I have checked SMART disk info and run various diagnostics and
> there do not seem to be any hardware issues. Plus, what are the chances of
> all four nodes having the same hardware problems at the same time when,
> for all other purposes, they appear fine?
>
> -          I have added logging which outputs which sstables are causing
> exceptions to be thrown. The corrupt sstables have been both freshly
> flushed memtables and the output of compaction (i.e., 4 sstables which all
> seem to be fine get compacted to 1 which is then corrupt). It seems that
> the majority of corrupt sstables are post-compaction (vs.
> post-memtable-flush).
>
> -          The one CF which was giving me the most problems was heavily
> written to (1000-1500 writes/second continually across the cluster). For
> that CF, I was having to delete 4-6 sstables a day across the cluster (and
> the number was going up; even the number of problems for the remaining CFs
> is going up). The other CFs which have had corrupt sstables are also quite
> heavily written to (generally a few hundred writes a second across the
> cluster).
>
> -          Most of the time (5/6 attempts) when this problem occurs,
> sstable2json also fails. I have, however, had one case where I was able to
> export the sstable to json, then re-import it, at which point I was no
> longer seeing exceptions.
>
> -          The cluster has been running for a little over 2 months now; the
> problem seems to have sprung up in the last 3-4 weeks and seems to be
> steadily getting worse.
>
>
>
> Ultimately, I think I am hitting some subtle race condition somewhere. I
> have been starting to dig into the Cassandra code but I barely know where
> to start looking. I realize I have not provided nearly enough information
> to easily debug the problem but PLEASE keep your eyes open for possibly
> racy or buggy code which could cause these sorts of problems. I am willing
> to provide full Cassandra logs and a corrupt SSTable on an individual
> basis: please email me and let me know.
>
>
>
> Here is possibly relevant information and my theories on a possible root
> cause. Again, I know little about the Cassandra code base and have only
> moderate java experience so these theories may be way off base.
>
> -          Strictly speaking, I probably don’t have enough memory for my
> workload. I see stop-the-world GC occurring ~30/day/node, often causing
> Cassandra to hang for 30+ seconds (according to the gc logs). Could there
> be some Java bug where a full GC in the middle of writing or flushing
> (compaction/memtable flush) or doing some other disk-based activity causes
> some sort of data corruption?
>
> -          Writes are usually done at ConsistencyLevel ONE with additional
> client-side retry logic. Given that I often see consecutive nodes in the
> ring down, could there be some edge condition where dying at just the
> right time causes parts of mutations/messages to be lost?
>
> -          All of the CFs which have been causing me problems have large
> rows which are compacted incrementally. Could there be some problem with
> the incremental compaction logic?
>
> -          My cluster has a fairly heavy write load (again, the most
> problematic CF is getting 1500 w/s x RF 2 / 4 nodes = 750
> writes/second/node). Furthermore, it is highly probable that there are
> timestamp collisions. Could there be some issue with timestamp logic (i.e.,
> using > instead of >= or some such) during flushes/compaction?
>
> -          Once a node
>
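
The timestamp-collision question above (using > versus >=) can be made
concrete with a small sketch. This is not Cassandra's reconciliation code;
ReconcileSketch, Col, naive and deterministic are hypothetical names, and
String values stand in for column bytes. It only shows why a strict
comparison with no tie-break gives an order-dependent winner when two
versions of a column carry the same timestamp, and how a tie-break on the
value removes that dependence.

```java
// Illustrative only, not Cassandra code: picking a winner between two
// versions of the same column when their timestamps may collide.
final class ReconcileSketch {

    static final class Col {
        final long timestamp;
        final String value; // stand-in for the column's bytes

        Col(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Strict ">" with no tie-break: on equal timestamps the first argument
    // wins, so the result depends on the order the versions are merged in.
    static Col naive(Col a, Col b) {
        return b.timestamp > a.timestamp ? b : a;
    }

    // Tie broken on the value: the result is the same for either merge order.
    static Col deterministic(Col a, Col b) {
        if (a.timestamp != b.timestamp) {
            return a.timestamp > b.timestamp ? a : b;
        }
        return a.value.compareTo(b.value) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        Col x = new Col(100L, "a");
        Col y = new Col(100L, "b");
        // Same timestamp, different merge order.
        System.out.println(naive(x, y).value + " " + naive(y, x).value);
        System.out.println(deterministic(x, y).value + " "
                + deterministic(y, x).value);
    }
}
```

With the naive version, two replicas (or two compactions) that see the same
pair of writes in different orders can keep different values; that yields
divergent data, which is a different symptom from the unreadable sstables
described in this thread.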
>
>
> Cluster/system information:
>
> -          4 nodes with RF=2
>
> -          Nodes have 8 cores with 24 GB of RAM apiece.
>
> -          2 HDs, 1 for commit log/system, 1 for /var/lib/cassandra/data
>
> -          OS is Ubuntu 10.04 (uname -r = 2.6.32-24-server)
>
> -          Java:
>
> o   java version "1.6.0_22"
>
> o   Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
>
> o   Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
>
> -          Slightly modified (file information in exceptions) version of
> 0.7.0
>
>
>
> The following non-standard cassandra.yaml properties have been changed:
>
> -          commitlog_sync_period_in_ms: 100 (with commitlog_sync: periodic)
>
> -          disk_access_mode: mmap_index_only
>
> -          concurrent_reads: 12
>
> -          concurrent_writes: 2 (was 32, but I dropped it to 2 to try and
> eliminate any mutation race conditions – did not seem to help)
>
> -          sliced_buffer_size_in_kb: 128
>
> -          in_memory_compaction_limit_in_mb: 50
>
> -          rpc_timeout_in_ms: 15000
>
>
>
> Schema for most problematic CF:
>
> name: DeviceEventsByDevice
>
> column_type: Standard
>
> memtable_throughput_in_mb: 150
>
> memtable_operations_in_millions: 1.5
>
> gc_grace_seconds: 172800
>
> keys_cached: 1000000
>
> rows_cached: 0
>
>
>
> Dan Hendry
>
> (403) 660-2297
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com