cassandra-user mailing list archives

From Anthony Molinaro <antho...@alumni.caltech.edu>
Subject Re: Repairing lost data
Date Sat, 27 Aug 2011 15:06:43 GMT
I'm pretty sure that was a bug fixed in a later 0.6.x release, so you might be able to upgrade
and the exceptions might go away.  We run 0.6.13 with a minor mod to support data expiration
and will probably do so indefinitely, since there is no way to upgrade without shutting our site
down :(

-Anthony

On Aug 27, 2011, at 3:40 AM, Jake Maizel <jake@soundcloud.com> wrote:

> Hello,
> 
> In a cluster running 0.6.6, one node lost part of a data file due to an
> operator error.  An older file was moved into place to bring Cassandra
> up again.
> 
> Now we get lots of these in the log:
> 
> 2011-08-27_10:30:55.26219 'ERROR [ROW-READ-STAGE:4327] 10:30:55,258
> CassandraDaemon.java:87 Uncaught exception in thread
> Thread[ROW-READ-STAGE:4327,5,main]
> 2011-08-27_10:30:55.26219 'java.lang.ArrayIndexOutOfBoundsException
> 2011-08-27_10:30:55.26220    at
> org.apache.cassandra.io.util.BufferedRandomAccessFile.read(BufferedRandomAccessFile.java:326)
> 2011-08-27_10:30:55.26220    at
> java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
> 2011-08-27_10:30:55.26221    at
> java.io.DataInputStream.readUTF(DataInputStream.java:592)
> 2011-08-27_10:30:55.26221    at
> java.io.RandomAccessFile.readUTF(RandomAccessFile.java:887)
> 2011-08-27_10:30:55.26222    at
> org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:125)
> 2011-08-27_10:30:55.26222    at
> org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:59)
> 2011-08-27_10:30:55.26223    at
> org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:63)
> 2011-08-27_10:30:55.26223    at
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:990)
> 2011-08-27_10:30:55.26224    at
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:901)
> 2011-08-27_10:30:55.26224    at
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:870)
> 2011-08-27_10:30:55.26224    at
> org.apache.cassandra.db.Table.getRow(Table.java:382)
> 2011-08-27_10:30:55.26225    at
> org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
> 2011-08-27_10:30:55.26225    at
> org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:70)
> 2011-08-27_10:30:55.26226    at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:49)
> 2011-08-27_10:30:55.26226    at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 2011-08-27_10:30:55.26227    at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 2011-08-27_10:30:55.26227    at java.lang.Thread.run(Thread.java:662)
> 
> Is it possible to use nodetool repair to fix this with the current data set?
> 
> I issued a repair command and the other nodes seem to be doing the
> correct things, but I am concerned by this: "Uncaught exception in thread
> Thread[ROW-READ-STAGE:4327,5,main]"
> 
> Will the affected node ever be able to do anything?
> 
> Also, only the Data file was affected; the Index and Filter files are
> still the originals.  Should I keep these or do anything else with
> them?
> 
> My alternative is to delete all the data and run repair again, which I
> have done in the past. It works, but takes a while with a large data
> set.
> 
> I am open to ideas and any suggestions are welcome.
> 
> -- 
> Jake Maizel
> Head of Network Operations
> Soundcloud
> 
> Mail & GTalk: jake@soundcloud.com
> Skype: jakecloud
> 
> Rosenthaler strasse 13, 101 19, Berlin, DE
