cassandra-user mailing list archives

From Jake Maizel <j...@soundcloud.com>
Subject Repairing lost data
Date Sat, 27 Aug 2011 10:40:05 GMT
Hello,

In a cluster running 0.6.6, one node lost part of a data file due to an
operator error. An older copy of the file was moved into place to bring
Cassandra up again.

Now we get lots of these in the log:

2011-08-27_10:30:55.26219 'ERROR [ROW-READ-STAGE:4327] 10:30:55,258 CassandraDaemon.java:87 Uncaught exception in thread Thread[ROW-READ-STAGE:4327,5,main]
2011-08-27_10:30:55.26219 'java.lang.ArrayIndexOutOfBoundsException
2011-08-27_10:30:55.26220 	at org.apache.cassandra.io.util.BufferedRandomAccessFile.read(BufferedRandomAccessFile.java:326)
2011-08-27_10:30:55.26220 	at java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
2011-08-27_10:30:55.26221 	at java.io.DataInputStream.readUTF(DataInputStream.java:592)
2011-08-27_10:30:55.26221 	at java.io.RandomAccessFile.readUTF(RandomAccessFile.java:887)
2011-08-27_10:30:55.26222 	at org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:125)
2011-08-27_10:30:55.26222 	at org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:59)
2011-08-27_10:30:55.26223 	at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:63)
2011-08-27_10:30:55.26223 	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:990)
2011-08-27_10:30:55.26224 	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:901)
2011-08-27_10:30:55.26224 	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:870)
2011-08-27_10:30:55.26224 	at org.apache.cassandra.db.Table.getRow(Table.java:382)
2011-08-27_10:30:55.26225 	at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
2011-08-27_10:30:55.26225 	at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:70)
2011-08-27_10:30:55.26226 	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:49)
2011-08-27_10:30:55.26226 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
2011-08-27_10:30:55.26227 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
2011-08-27_10:30:55.26227 	at java.lang.Thread.run(Thread.java:662)

Is it possible to use nodetool repair to fix this with the current data set?

I issued a repair command and the other nodes seem to be doing the
correct things, but I am concerned by this: "Uncaught exception in thread
Thread[ROW-READ-STAGE:4327,5,main]"

Will the affected node ever be able to do anything?

Also, only the Data file was affected; the Index and Filter files are
still the originals. Should I keep these or do anything else with
them?
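A likely reading of the errors above, assuming the usual SSTable layout in which the Index file records each row's byte offset into the Data file: an Index file written for the newer Data file will point at offsets that no longer line up with the older Data file that was moved into place. The Java sketch below models that mismatch with a hypothetical toy format (`StaleIndexDemo`, `writeData` and `readKeyAt` are illustrative names, not Cassandra classes, and the layout is not Cassandra's real on-disk format). Plain `DataInputStream` reports the failure as an `EOFException`, whereas Cassandra's own `BufferedRandomAccessFile` surfaced it as the `ArrayIndexOutOfBoundsException` in the log.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model: a "data file" of length-prefixed keys and values,
// plus an "index" mapping each key to its byte offset in that data file.
public class StaleIndexDemo {

    // Serialize rows; if index is non-null, record where each row starts.
    static byte[] writeData(String[] keys, Map<String, Integer> index) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (String key : keys) {
                if (index != null) {
                    index.put(key, out.size()); // offset of this row in the data file
                }
                out.writeUTF(key);
                out.writeUTF("value-of-" + key);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Seek to an offset taken from the index and read the row key stored there.
    static String readKeyAt(byte[] data, int offset) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.skipBytes(offset); // stops early, without error, if the data is too short
            return in.readUTF();  // fails if the offset does not point at a valid row
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Index built against the newer data file (like the untouched Index file).
        Map<String, Integer> index = new LinkedHashMap<>();
        byte[] current = writeData(new String[] {"alpha", "bravo", "charlie"}, index);

        // An older, shorter data file moved into place; the index is not rebuilt.
        byte[] older = writeData(new String[] {"alpha"}, null);

        System.out.println("current file: " + readKeyAt(current, index.get("charlie")));
        try {
            System.out.println("older file: " + readKeyAt(older, index.get("charlie")));
        } catch (UncheckedIOException e) {
            System.out.println("older file: " + e.getCause());
        }
    }
}
```

In this model the read only succeeds when the index and the data file come from the same write, which is why the sketch treats a mismatched Index/Data pair as unreadable rather than merely stale.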

My alternative is to delete all the data and run repair again, which I
have done in the past; it works, but it takes a while with a large data
set.

I am open to ideas and any suggestions are welcome.

-- 
Jake Maizel
Head of Network Operations
Soundcloud

Mail & GTalk: jake@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE
