incubator-cassandra-user mailing list archives

From Francisco Nogueira Calmon Sobral <fsob...@igcorp.com.br>
Subject Re: Possibly losing data with corrupted SSTables
Date Wed, 12 Feb 2014 17:20:15 GMT
Hi, Rahul.

I've removed the corrupted sstables and 'nodetool repair' ran successfully for the column
family. I'm not sure whether or not we've lost data.

Best regards,
Francisco Sobral


On Jan 30, 2014, at 3:58 PM, Rahul Menon <rahul@apigee.com> wrote:

> Yes, you should delete all files named <cfname>-ib-<num>-<extension>.db
> 
> Run a repair after the deletion.
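
A minimal sketch of picking out every file that belongs to one sstable before deleting anything, assuming the <keyspace>-<cf>-<version>-<generation>-<component> naming seen in the listings in this thread. The helper name `sstable_files` is hypothetical (it is not part of Cassandra or nodetool), and the sketch only selects names; it removes nothing.

```python
def sstable_files(filenames, keyspace, cf, version, generation):
    """Return the names of all component files of one sstable.

    Assumes names look like <keyspace>-<cf>-<version>-<generation>-<component>.
    The trailing hyphen in the prefix keeps generation 2516 from also
    matching, for example, generation 25160.
    """
    prefix = f"{keyspace}-{cf}-{version}-{generation}-"
    return sorted(name for name in filenames if name.startswith(prefix))


# File names copied from the BAD and GOOD listings in this thread.
listing = [
    "Sessions-Users-ib-2516-Data.db",
    "Sessions-Users-ib-2516-Index.db",
    "Sessions-Users-ib-2516-Summary.db",
    "Sessions-Users-ic-2933-Data.db",
    "Sessions-Users-ic-2933-Index.db",
    "Sessions-Users-ic-2933-TOC.txt",
]

# Dry run: print the candidates for the corrupted sstable instead of deleting them.
for name in sstable_files(listing, "Sessions", "Users", "ib", 2516):
    print(name)
```

In practice the names would come from os.listdir() on the column family's data directory.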
> 
> 
> On Thu, Jan 30, 2014 at 10:17 PM, Francisco Nogueira Calmon Sobral <fsobral@igcorp.com.br> wrote:
> Ok. I'll try this idea with one sstable. But should I delete all the files associated
> with it? I mean, there is a difference in the number of files between the BAD sstable and
> a GOOD one, as I've already shown:
> 
> BAD
> ------
> -rw-r--r-- 8 cassandra cassandra 991M Nov  8 15:11 Sessions-Users-ib-2516-Data.db
> -rw-r--r-- 8 cassandra cassandra 703M Nov  8 15:11 Sessions-Users-ib-2516-Index.db
> -rw-r--r-- 8 cassandra cassandra 5.3M Nov 13 11:42 Sessions-Users-ib-2516-Summary.db
> 
> GOOD
> ---------
> -rw-r--r-- 1 cassandra cassandra  22K Jan 15 10:50 Sessions-Users-ic-2933-CompressionInfo.db
> -rw-r--r-- 1 cassandra cassandra 106M Jan 15 10:50 Sessions-Users-ic-2933-Data.db
> -rw-r--r-- 1 cassandra cassandra 2.2M Jan 15 10:50 Sessions-Users-ic-2933-Filter.db
> -rw-r--r-- 1 cassandra cassandra  76M Jan 15 10:50 Sessions-Users-ic-2933-Index.db
> -rw-r--r-- 1 cassandra cassandra 4.3K Jan 15 10:50 Sessions-Users-ic-2933-Statistics.db
> -rw-r--r-- 1 cassandra cassandra 574K Jan 15 10:50 Sessions-Users-ic-2933-Summary.db
> -rw-r--r-- 1 cassandra cassandra   79 Jan 15 10:50 Sessions-Users-ic-2933-TOC.txt
> 
> Should I delete those 3 files? Should I run nodetool refresh after the operation?
> 
> Best regards,
> Francisco.
> 
> On Jan 30, 2014, at 2:02 PM, Rahul Menon <rahul@apigee.com> wrote:
> 
> > Looks like the sstables are corrupt. I don't believe there is a method to recover
> > those sstables. I would delete them and run a repair to ensure data consistency.
> >
> > Rahul
> >
> >
> > On Wed, Jan 29, 2014 at 11:29 PM, Francisco Nogueira Calmon Sobral <fsobral@igcorp.com.br> wrote:
> > Hi, Rahul.
> >
> > I've run nodetool upgradesstables only on the problematic CF. It threw the following
> > exception:
> >
> > Error occurred while upgrading the sstables for keyspace Sessions
> > java.util.concurrent.ExecutionException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be larger than file /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038893416
> >         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> >         at java.util.concurrent.FutureTask.get(FutureTask.java:188)
> >         at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:271)
> >         at org.apache.cassandra.db.compaction.CompactionManager.performSSTableRewrite(CompactionManager.java:287)
> >         at org.apache.cassandra.db.ColumnFamilyStore.sstablesRewrite(ColumnFamilyStore.java:977)
> >         at org.apache.cassandra.service.StorageService.upgradeSSTables(StorageService.java:2191)
> > … …
> > Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be larger than file /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038893416
> >         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:167)
> >         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:83)
> >         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:69)
> >         at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:180)
> >         at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155)
> >         at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142)
> >         at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)
> >         at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:202)
> >         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> >         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> >         at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:134)
> >         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> >         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
> >         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
> >         at org.apache.cassandra.db.compaction.CompactionManager$4.perform(CompactionManager.java:301)
> >         at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250)
> >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >         ... 3 more
> > Caused by: java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be larger than file /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038893416
> >         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:123)
> >         ... 20 more
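
The exception message boils down to simple arithmetic: a row cannot be larger than what remains of the file after its starting offset. A minimal sketch of that sanity check, using the numbers from the message above; `row_size_is_plausible` is an illustrative name and this is not Cassandra's actual code.

```python
def row_size_is_plausible(data_size, start, file_length):
    """A row of data_size bytes starting at offset start must fit inside the file."""
    return 0 <= data_size <= file_length - start


# Values copied from the exception message.
DATA_SIZE = 3622081913630118729
START = 32906
FILE_LENGTH = 1038893416

# The claimed row size is many orders of magnitude larger than the ~1 GB file.
print(row_size_is_plausible(DATA_SIZE, START, FILE_LENGTH))  # prints False

# The raw 8 bytes of the bogus size, for eyeballing what was actually read
# at that position (assuming the size is stored as a big-endian 64-bit value).
print(DATA_SIZE.to_bytes(8, "big"))
```

A size this far out of range suggests that the bytes read at that position are not a row length at all, which is consistent with the file being corrupt.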
> >
> >
> > Regards,
> > Francisco
> >
> >
> > On Jan 29, 2014, at 3:38 PM, Rahul Menon <rahul@apigee.com> wrote:
> >
> > > Francisco,
> > >
> > > The sstables named *-ib-* were written by a previous version of C*. The *-ib-*
> > > naming convention started at C* 1.2.1, but from 1.2.10 onwards I'm sure it uses the
> > > *-ic-* convention. You could try running nodetool upgradesstables, which should
> > > ideally upgrade the *-ib-* sstables to *-ic-*.
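
To see which sstables still carry the older format version, one could group file names by the version token. A minimal sketch, assuming the naming pattern visible in the listings in this thread; the regular expression and the `generations_by_version` helper are illustrative and not a Cassandra API.

```python
import re
from collections import defaultdict

# Assumed pattern: <keyspace>-<cf>-<version>-<generation>-<component>
SSTABLE_RE = re.compile(
    r"^\w+-[\w.]+-(?P<version>[a-z]{2})-(?P<generation>\d+)-.+$"
)


def generations_by_version(filenames):
    """Map each format version (e.g. 'ib', 'ic') to its sorted sstable generations."""
    found = defaultdict(set)
    for name in filenames:
        match = SSTABLE_RE.match(name)
        if match:  # names that do not fit the assumed pattern are skipped
            found[match["version"]].add(int(match["generation"]))
    return {version: sorted(gens) for version, gens in found.items()}


# File names taken from the listings in this thread.
names = [
    "Sessions-Users-ib-2516-Data.db",
    "Sessions-Users-ib-2516-Index.db",
    "Sessions-Users-ib-2516-Summary.db",
    "Sessions-Users-ic-2933-Data.db",
    "Sessions-Users-ic-2933-TOC.txt",
]
print(generations_by_version(names))
```

Any generation listed under a version other than the current one is a candidate for the upgrade step.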
> > >
> > > Rahul
> > >
> > > On Wed, Jan 29, 2014 at 12:55 AM, Francisco Nogueira Calmon Sobral <fsobral@igcorp.com.br> wrote:
> > > Dear experts,
> > >
> > > We are facing an annoying problem in our cluster.
> > >
> > > We have 9 Amazon extra-large Linux nodes running Cassandra 1.2.11.
> > >
> > > The short story is that after moving the data from one cluster to another, we've
> > > been unable to run 'nodetool repair'. It gets stuck due to a CorruptSSTableException
> > > on some nodes and CFs. After looking at some problematic CFs, we observed that some
> > > of their sstables were owned by root instead of cassandra. Also, their names are
> > > different from the 'good' ones, as we can see below:
> > >
> > > BAD
> > > ------
> > > -rw-r--r-- 8 cassandra cassandra 991M Nov  8 15:11 Sessions-Users-ib-2516-Data.db
> > > -rw-r--r-- 8 cassandra cassandra 703M Nov  8 15:11 Sessions-Users-ib-2516-Index.db
> > > -rw-r--r-- 8 cassandra cassandra 5.3M Nov 13 11:42 Sessions-Users-ib-2516-Summary.db
> > >
> > > GOOD
> > > ---------
> > > -rw-r--r-- 1 cassandra cassandra  22K Jan 15 10:50 Sessions-Users-ic-2933-CompressionInfo.db
> > > -rw-r--r-- 1 cassandra cassandra 106M Jan 15 10:50 Sessions-Users-ic-2933-Data.db
> > > -rw-r--r-- 1 cassandra cassandra 2.2M Jan 15 10:50 Sessions-Users-ic-2933-Filter.db
> > > -rw-r--r-- 1 cassandra cassandra  76M Jan 15 10:50 Sessions-Users-ic-2933-Index.db
> > > -rw-r--r-- 1 cassandra cassandra 4.3K Jan 15 10:50 Sessions-Users-ic-2933-Statistics.db
> > > -rw-r--r-- 1 cassandra cassandra 574K Jan 15 10:50 Sessions-Users-ic-2933-Summary.db
> > > -rw-r--r-- 1 cassandra cassandra   79 Jan 15 10:50 Sessions-Users-ic-2933-TOC.txt
> > >
> > >
> > > We changed the permissions back to 'cassandra' and ran 'nodetool scrub' on
> > > this problematic CF, but it has been running for at least two weeks (it is not frozen)
> > > and keeps logging many WARNs while working on the above-mentioned SSTable:
> > >
> > > WARN [CompactionExecutor:15] 2014-01-28 17:01:22,571 OutputHandler.java (line 57) Non-fatal error reading row (stacktrace follows)
> > > java.io.IOError: java.io.IOException: Impossible row size 3618452438597849419
> > >         at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:171)
> > >         at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:526)
> > >         at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:515)
> > >         at org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:70)
> > >         at org.apache.cassandra.db.compaction.CompactionManager$3.perform(CompactionManager.java:280)
> > >         at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250)
> > >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >         at java.lang.Thread.run(Thread.java:744)
> > > Caused by: java.io.IOException: Impossible row size 3618452438597849419
> > >         ... 10 more
> > >
> > >
> > > 1) I do not think that deleting all data of one node and running 'nodetool
> > > rebuild' will work, since we observed that this problem occurs in all nodes. So we
> > > may not be able to restore all the data. What can be done in this case?
> > >
> > > 2) Why are some sstables owned by 'root'? Is this problem caused
> > > by our manual migration of data? (see long story below)
> > >
> > >
> > > How did we run into this?
> > >
> > > The long story is that we've tried to move our cluster with sstableloader,
> > > but it was unable to load all the data correctly. Our solution was to put ALL cluster
> > > data into EACH new node and run 'nodetool refresh'. I performed this task for each node
> > > and each column family sequentially. Sometimes I had to rename some sstables, because
> > > they came from different nodes with the same name. I don't remember if I ran 'nodetool
> > > repair' or even 'nodetool cleanup' in each node. Apparently, the process was successful,
> > > and (almost) all the data was moved.
> > >
> > > Unfortunately, three months after the move, I am unable to perform read operations
> > > on some keys of some CFs. I think that some of these keys belong to the above-mentioned
> > > sstables.
> > >
> > > Any insights are welcome.
> > >
> > > Best regards,
> > > Francisco Sobral
> > >
> >
> >
> 
> 

