incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anthony Molinaro <antho...@alumni.caltech.edu>
Subject Re: Data Corruption Problem with cassandra 0.4.0
Date Wed, 30 Sep 2009 20:25:44 GMT

Some more details from logs per IRC discussions (I've edited down to just
entries for the one CF, several are corrupt).

Older references to troublesome CF

 INFO [main] 2009-09-15 18:32:14,784 RecoveryManager.java (line 57) Replaying /var/lib/cassandra/commitlog/CommitLog-1252520035020.log,
/var/lib/cassandra/commitlog/CommitLog-1252521410192.log, /var/lib/cassandra/commitlog/CommitLog-1252522551113.log
 INFO [main] 2009-09-15 18:32:21,684 Memtable.java (line 186) Flushing Memtable(at)@218311925
 INFO [main] 2009-09-15 18:32:22,585 Memtable.java (line 220) Completed flushing Memtable(at)@218311925
 INFO [main] 2009-09-15 18:32:30,467 Memtable.java (line 186) Flushing Memtable(at)@218311925
 INFO [main] 2009-09-15 18:32:31,007 Memtable.java (line 220) Completed flushing Memtable(at)@218311925
 INFO [main] 2009-09-15 18:32:42,815 Memtable.java (line 186) Flushing Memtable(at)@218311925
 INFO [main] 2009-09-15 18:32:43,613 Memtable.java (line 220) Completed flushing Memtable(at)@218311925
 INFO [main] 2009-09-15 18:36:24,866 RecoveryManager.java (line 57) Replaying /var/lib/cassandra/commitlog/CommitLog-1253039564240.log
 INFO [main] 2009-09-15 18:36:24,926 Memtable.java (line 186) Flushing Memtable(at)@2118052890
 INFO [MINOR-COMPACTION-POOL:1] 2009-09-15 18:36:25,111 ColumnFamilyStore.java (line 1013)
Compacting [/var/lib/cassandra/data/xt/at-1-Data.db,/var/lib/cassandra/data/xt/at-2-Data.db,/var/lib/cassandra/data/xt/at-3-Data.db,/var/lib/cassandra/data/xt/at-4-Data.db]
 INFO [main] 2009-09-15 18:36:25,291 Memtable.java (line 220) Completed flushing Memtable(at)@2118052890
 INFO [main] 2009-09-15 18:36:24,866 RecoveryManager.java (line 57) Replaying /var/lib/cassandra/commitlog/CommitLog-1253039564240.log
 INFO [main] 2009-09-15 18:36:24,926 Memtable.java (line 186) Flushing Memtable(at)@2118052890
 INFO [MINOR-COMPACTION-POOL:1] 2009-09-15 18:36:25,111 ColumnFamilyStore.java (line 1013)
Compacting [/var/lib/cassandra/data/xt/at-1-Data.db,/var/lib/cassandra/data/xt/at-2-Data.db,/var/lib/cassandra/data/xt/at-3-Data.db,/var/lib/cassandra/data/xt/at-4-Data.db]
 INFO [main] 2009-09-15 18:36:25,291 Memtable.java (line 220) Completed flushing Memtable(at)@2118052890
 INFO [main] 2009-09-15 18:36:25,526 SystemTable.java (line 118) Saved Token found: 21467257700115401363439435209518481790
 INFO [main] 2009-09-29 22:59:37,886 Memtable.java (line 186) Flushing Memtable(at)@725519388
 INFO [main] 2009-09-29 22:59:38,454 Memtable.java (line 220) Completed flushing Memtable(at)@725519388
 INFO [main] 2009-09-29 22:59:40,651 Memtable.java (line 186) Flushing Memtable(at)@725519388
 INFO [main] 2009-09-29 22:59:40,950 Memtable.java (line 220) Completed flushing Memtable(at)@725519388
 INFO [main] 2009-09-29 22:59:53,669 Memtable.java (line 186) Flushing Memtable(at)@725519388
 INFO [main] 2009-09-29 22:59:53,969 Memtable.java (line 220) Completed flushing Memtable(at)@725519388

Restart that caused the problems here

 INFO [main] 2009-09-29 22:59:55,641 CassandraDaemon.java (line 142) Cassandra starting up...
 INFO [MINOR-COMPACTION-POOL:1] 2009-09-29 22:59:59,678 ColumnFamilyStore.java (line 1013)
Compacting [/var/lib/cassandra/data/xt/at-5-Data.db,/var/lib/cassandra/data/xt/at-6-Data.db,/var/lib/cassandra/data/xt/at-7-Data.db,/var/lib/cassandra/data/xt/at-8-Data.db]
 INFO [MINOR-COMPACTION-POOL:1] 2009-09-29 23:00:00,840 ColumnFamilyStore.java (line 1155)
Compacted to /var/lib/cassandra/data/xt/at-9-Data.db.  0/476199 bytes for 3010/2221 keys read/written.
 Time: 1162ms.
 ERROR [pool-1-thread-124] 2009-09-29 23:00:49,888 Cassandra.java (line 713) Internal error
processing multiget_slice java.lang.RuntimeException: java.io.EOFException
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:103)
        at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:180)
        at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:249)
        at org.apache.cassandra.service.CassandraServer.multiget_slice(CassandraServer.java:226)
        at org.apache.cassandra.service.Cassandra$Processor$multiget_slice.process(Cassandra.java:709)
        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.ja
va:609) 
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.EOFException        
        at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
        at org.apache.cassandra.io.IndexHelper.deserializeIndex(IndexHelper.java:95)
        at org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:118)
        at org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:56)
        at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:64)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1390)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1325)
        at org.apache.cassandra.db.Table.getRow(Table.java:590)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
        at org.apache.cassandra.service.StorageProxy.weakReadLocal(StorageProxy.
java:518)
        at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.j
ava:310)
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:99)
        ... 9 more
 INFO [main] 2009-09-29 23:29:26,476 Memtable.java (line 186) Flushing Memtable(at)@688788755
 INFO [main] 2009-09-29 23:29:26,800 Memtable.java (line 220) Completed flushing Memtable(at)@688788755
 INFO [main] 2009-09-30 17:43:00,338 Memtable.java (line 186) Flushing Memtable(at)@1903646637
 INFO [main] 2009-09-30 17:43:03,066 Memtable.java (line 220) Completed flushing Memtable(at)@1903646637

Turned on DEBUG logging

 DEBUG [main] 2009-09-30 17:50:16,408 Table.java (line 101) adding at as 13
 DEBUG [main] 2009-09-30 17:50:17,588 SSTableReader.java (line 100) INDEX LOAD TIME for /var/lib/cassandra/data/xt/at-9-Data.db:
0 ms.
 DEBUG [main] 2009-09-30 17:50:17,588 SSTableReader.java (line 100) INDEX LOAD TIME for /var/lib/cassandra/data/xt/at-10-Data.db:
0 ms.
 DEBUG [main] 2009-09-30 17:50:17,588 SSTableReader.java (line 100) INDEX LOAD TIME for /var/lib/cassandra/data/xt/at-11-Data.db:
0 ms.
 ERROR [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:17,588 DebuggableThreadPoolExecutor.java
(line 103) Error in executor futuretask
java.util.concurrent.ExecutionException: java.io.EOFException
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.logFutureExceptions(DebuggableThreadPoolExecutor.java:95)
        at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor.afterExecute(DebuggableScheduledThreadPoolExecutor.java:50)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.EOFException
        at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
        at org.apache.cassandra.io.IndexHelper.skipIndex(IndexHelper.java:77)
        at org.apache.cassandra.io.IndexHelper.skipBloomFilterAndIndex(IndexHelper.java:46)
        at org.apache.cassandra.io.IteratingRow.<init>(IteratingRow.java:47)
        at org.apache.cassandra.io.FileStruct.advance(FileStruct.java:124)
        at org.apache.cassandra.db.ColumnFamilyStore.initializePriorityQueue(ColumnFamilyStore.java:623)
        at org.apache.cassandra.db.ColumnFamilyStore.doFileCompaction(ColumnFamilyStore.java:1030)
        at org.apache.cassandra.db.ColumnFamilyStore.doCompaction(ColumnFamilyStore.java:689)
        at org.apache.cassandra.db.MinorCompactionManager$1.call(MinorCompactionManager.java:165)
        at org.apache.cassandra.db.MinorCompactionManager$1.call(MinorCompactionManager.java:162)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        ... 2 more
 DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 301) Reading mutation at 5151
 DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 322) replaying mutation for xt.992:
{ColumnFamily(at [score,])}
 DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 301) Reading mutation at 5291
 DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 322) replaying mutation for xt.992:
{ColumnFamily(at [score,])}
 DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 301) Reading mutation at 5291
 DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 322) replaying mutation for xt.992:
{ColumnFamily(at [score,])}
 DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 301) Reading mutation at 5291
 INFO [main] 2009-09-30 17:50:18,188 Memtable.java (line 186) Flushing Memtable(at)@1881395339
 DEBUG [main] 2009-09-30 17:50:18,388 ColumnFamilyStore.java (line 609) Submitting at for
compaction
 INFO [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 ColumnFamilyStore.java (line 1013)
Compacting [/var/lib/cassandra/data/xt/at-9-Data.db,/var/lib/cassandra/data/xt/at-10-Data.db,/var/lib/cassandra/data/xt/at-11-Data.db,/var/lib/cassandra/data/xt/at-12-Data.db]
 DEBUG [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 SSTableReader.java (line 58) index
size for bloom filter calc for file  : /var/lib/cassandra/data/xt/at-9-Data.db   : 2432
 DEBUG [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 SSTableReader.java (line 58) index
size for bloom filter calc for file  : /var/lib/cassandra/data/xt/at-10-Data.db   : 2688
 DEBUG [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 SSTableReader.java (line 58) index
size for bloom filter calc for file  : /var/lib/cassandra/data/xt/at-11-Data.db   : 5120
 DEBUG [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 SSTableReader.java (line 58) index
size for bloom filter calc for file  : /var/lib/cassandra/data/xt/at-12-Data.db   : 5376
 DEBUG [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 ColumnFamilyStore.java (line 1048)
Expected bloom filter size : 5376
 INFO [main] 2009-09-30 17:50:18,447 Memtable.java (line 220) Completed flushing Memtable(at)@1881395339
 DEBUG [pool-1-thread-1] 2009-09-30 17:50:30,037 StorageProxy.java (line 468) weakreadlocal
reading SliceFromReadCommand(table='xt', key='992', column_parent='QueryPath(columnFamilyName='at',
superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100)
ERROR [pool-1-thread-1] 2009-09-30 17:50:30,037 Cassandra.java (line 679) Internal error processing
get_slice
java.lang.RuntimeException: java.io.EOFException
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:104)
        at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
        at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
        at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
        at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627) 
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.EOFException
        at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
        at org.apache.cassandra.io.IndexHelper.deserializeIndex(IndexHelper.java:95)
        at org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:118)
        at org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:56)
        at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:64)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1390)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1325)
        at org.apache.cassandra.db.Table.getRow(Table.java:590)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
        at org.apache.cassandra.service.StorageProxy.weakReadLocal(StorageProxy.java:471)
        at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:309)
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
        ... 9 more

Hope that helps narrow down the issue.

-Anthony


On Wed, Sep 30, 2009 at 11:46:47AM -0700, Anthony Molinaro wrote:
> More data, I decided to force a flush on the table I know was corrupt and
> got this error
> 
> ERROR [MINOR-COMPACTION-POOL:1] 2009-09-30 18:42:01,010 DebuggableThreadPoolExec
> utor.java (line 103) Error in executor futuretaskjava.util.concurrent.ExecutionException:
java.io.EOFException
>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)       
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>         at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.logFutur
> eExceptions(DebuggableThreadPoolExecutor.java:95)
>         at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor
> .afterExecute(DebuggableScheduledThreadPoolExecutor.java:50)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
> .java:908)
>         at java.lang.Thread.run(Thread.java:619)Caused by: java.io.EOFException
>         at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)        at org.apache.cassandra.io.IndexHelper.skipIndex(IndexHelper.java:77)
>         at org.apache.cassandra.io.IndexHelper.skipBloomFilterAndIndex(IndexHelper.java:46)
>         at org.apache.cassandra.io.IteratingRow.<init>(IteratingRow.java:47)  
        at org.apache.cassandra.io.FileStruct.advance(FileStruct.java:124)
>         at org.apache.cassandra.db.ColumnFamilyStore.doFileCompaction(ColumnFami
> lyStore.java:1108)
>         at org.apache.cassandra.db.ColumnFamilyStore.doCompaction(ColumnFamilyStore.java:689)
>         at org.apache.cassandra.db.MinorCompactionManager$1.call(MinorCompactionManager.java:165)
>         at org.apache.cassandra.db.MinorCompactionManager$1.call(MinorCompactionManager.java:162)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         ... 2 more
> 
> -Anthony
> 
> On Wed, Sep 30, 2009 at 11:24:15AM -0700, Anthony Molinaro wrote:
> > Hi,
> > 
> > I'm not getting any responses on IRC, so figured I'd put this out on
> > the mailing list.
> > 
> > I had a 3 node cassandra cluster, replication factor 3 on
> > 3 ec2 m1.large instances behind an haproxy.  I restarted one
> > of the node to test out some modified sysctl's (tcp stack tuning).
> > As soon as I restarted it the other 2 nodes started spiking memory
> > use and the first node seemed to have corrupted data.  The corruption
> > is an exception when I read some and only some keys.
> > 
> > The exception is
> > 
> > ERROR [pool-1-thread-1] 2009-09-30 17:50:30,037 Cassandra.java (line 679) Internal
error processing get_slice
> > java.lang.RuntimeException: java.io.EOFException
> >         at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:104)
> >         at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
> >         at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
> >         at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
> >         at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
> >         at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)

> >         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >         at java.lang.Thread.run(Thread.java:619)
> > Caused by: java.io.EOFException
> >         at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
> >         at org.apache.cassandra.io.IndexHelper.deserializeIndex(IndexHelper.java:95)
> >         at org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:118)
> >         at org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:56)
> >         at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:64)
> >         at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1390)
> >         at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1325)
> >         at org.apache.cassandra.db.Table.getRow(Table.java:590)
> >         at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
> >         at org.apache.cassandra.service.StorageProxy.weakReadLocal(StorageProxy.java:471)
> >         at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:309)
> >         at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
> >         ... 9 more
> > 
> > 
> > I ended up having to fire up some new instances, and reload the data
> > (luckily this is my small instance which I can reload quickly, I've got a 
> > large cassandra cluster currently loading which I will not be 
> > able to do this with, so I'm a little scared about that cluster).
> > 
> > Anyway, any ideas?  I've left the broken cluster so I can investigate/patch/etc.
> > 
> > -Anthony
> > 
> > -- 
> > ------------------------------------------------------------------------
> > Anthony Molinaro                           <anthonym@alumni.caltech.edu>
> 
> -- 
> ------------------------------------------------------------------------
> Anthony Molinaro                           <anthonym@alumni.caltech.edu>

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <anthonym@alumni.caltech.edu>

Mime
View raw message