incubator-cassandra-user mailing list archives

From Ben Coverston <ben.covers...@datastax.com>
Subject Re: Argh: Data Corruption (LOST DATA) (0.7.0)
Date Mon, 07 Feb 2011 20:52:13 GMT
Dan,

Do you have any more information on this issue? Have you been able to 
discover anything from exporting your SSTables to JSON?

Thanks,
Ben

On 1/29/11 12:45 PM, Dan Hendry wrote:
>
> I am once again having severe problems with my Cassandra cluster. This 
> time, I straight up cannot read sections of data (consistency level 
> ONE). Client side, I am seeing timeout exceptions. On the Cassandra 
> node, I am seeing errors as shown below. I don't understand what has 
> happened or how to fix it. I also don't understand how I am seeing 
> errors on only one node, using consistency level ONE with a rf=2 and 
> yet clients are failing. I have tried turning on debug logging but 
> that has been no help: the logs roll over (20 MB) in < 10 seconds (the 
> cluster is being used quite heavily).
>
> My cluster had been working fine for weeks, then suddenly I had a 
> corrupt SSTable which caused me all sorts of grief (outlined in 
> previous emails). I was able to solve the problem by turning down the 
> max compaction threshold then figuring out which SSTable was corrupt 
> by watching which minor compactions failed. After that, I straight up 
> deleted the on-disk data. Now I am having problems on a different node 
> (but adjacent in the ring) for what I am almost certain is the same 
> column family (presumably the same row/column). At this point, the 
> data is effectively lost as I know 1 of the 2 replicas was completely 
> deleted.
>
> Is there any advice going forward? My next course of action was going 
> to be exporting all of the sstables to JSON using the provided tool 
> and trying to look it over manually to see what the problem actually 
> is (if exporting will even work). I am not sure how useful this will 
> be as there is nearly 80 GB of data for this CF on a single node. What 
> is more concerning is that I have no idea how this problem initially 
> popped up. I have performed hardware tests and nothing seems to be 
> malfunctioning. Furthermore, the fact that these issues have 'jumped' 
> nodes is a strong indication to me this is a Cassandra problem.
>
> There is a Cassandra bug here somewhere, if only in the way corrupt 
> columns are dealt with.
>
> db (85098417 bytes)
>
> ERROR [ReadStage:221] 2011-01-29 12:42:39,153 
> DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
>
> java.lang.RuntimeException: java.io.IOException: Invalid 
> localDeleteTime read: -1516572672
>
>         at 
> org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:124)
>
>         at 
> org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:47)
>
>         at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>
>         at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>
>         at 
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
>
>         at 
> org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
>
>         at 
> org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
>
>         at 
> org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
>
>         at 
> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
>
>         at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>
>         at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>
>         at 
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:118)
>
>         at 
> org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
>
>         at 
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1230)
>
>         at 
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
>
>         at 
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
>
>         at org.apache.cassandra.db.Table.getRow(Table.java:384)
>
>         at 
> org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
>
>         at 
> org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
>
>         at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>
>         at java.lang.Thread.run(Thread.java:662)
>
> Caused by: java.io.IOException: Invalid localDeleteTime read: -1516572672
>
>         at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:356)
>
>         at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:313)
>
>         at 
> org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:180)
>
>         at 
> org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:119)
>
>         ... 22 more
>
> ERROR [ReadStage:210] 2011-01-29 12:42:41,529 
> DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
>
> java.lang.RuntimeException: 
> org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: 
> invalid column name length 0
>
>         at 
> org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:124)
>
>         at 
> org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:47)
>
>         at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>
>         at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>
>         at 
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
>
>         at 
> org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
>
>         at 
> org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
>
>         at 
> org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
>
>         at 
> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
>
>         at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>
>         at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>
>         at 
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:118)
>
>         at 
> org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
>
>         at 
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1230)
>
>        at 
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
>
>         at 
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
>
>         at org.apache.cassandra.db.Table.getRow(Table.java:384)
>
>         at 
> org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
>
>         at 
> org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
>
>         at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>
>         at java.lang.Thread.run(Thread.java:662)
>
> Caused by: 
> org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: 
> invalid column name length 0
>
>         at 
> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:68)
>
>         at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:364)
>
>         at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:313)
>
>         at 
> org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:180)
>
>         at 
> org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:119)
>
>         ... 22 more
>
> Dan Hendry
>
> (403) 660-2297
>
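
Since the debug logs roll over within seconds, one way to keep track of which
corruption errors are occurring is to tally the "Caused by:" lines as they
appear. The following is only a rough sketch, not something from the original
report: the function name summarize_causes is hypothetical, and it assumes each
"Caused by:" entry sits on a single unwrapped line in the node's log (the
traces quoted above were re-wrapped by the mail client). Numbers in the message
are replaced with N so that repeated errors with different values group together.

```python
import re
from collections import Counter

# Matches root-cause lines such as:
#   Caused by: java.io.IOException: Invalid localDeleteTime read: -1516572672
# The class name may contain '$' for nested classes.
CAUSED_BY = re.compile(r"^Caused by: (?P<exc>[\w.$]+): (?P<msg>.*)$")


def summarize_causes(lines):
    """Count root causes in an iterable of log lines.

    Returns a Counter keyed by (exception class, normalized message),
    where every integer in the message is replaced by 'N'.
    """
    counts = Counter()
    for line in lines:
        match = CAUSED_BY.match(line.strip())
        if match:
            message = re.sub(r"-?\d+", "N", match.group("msg"))
            counts[(match.group("exc"), message)] += 1
    return counts
```

It could be fed an open log file, for example
summarize_causes(open(path_to_log)), where path_to_log is whatever the node's
logging configuration writes to; printing counts.most_common() then shows
whether the localDeleteTime and column name length errors dominate.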
