From Peter Schuller <>
Subject Re: Repairing lost data
Date Sat, 27 Aug 2011 10:54:04 GMT
> Is it possible to use nodetool repair to fix this with the current data set?
> I issued a repair command and the other nodes seem to be doing the
> correct things, but I am concerned by this: "Uncaught exception in thread
> Thread[ROW-READ-STAGE:4327,5,main]"
> Will the affected node ever be able to do anything?

Since it seems you're willing to keep the node up with the missing
data, I would remove (MOVE, just to be safe) the bf (bloom filter) and
index files corresponding to the over-written data. You definitely
don't want bf/index files that do not match the data.

After that, a repair will propagate the missing data from other nodes.

(It is implied that you do this with the node shut down, not
"live" while the node is running.)
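For illustration only, here is a minimal sketch of the "move, don't delete" step. It assumes the 0.6-era sstable naming `<CF>-<generation>-<Component>.db` (e.g. `Standard1-42-Filter.db`), which you should check against your own data directory; the helper name `quarantine_bf_and_index` and the quarantine directory are hypothetical, and it must only be run while the node is stopped.

```python
import shutil
import tempfile
from pathlib import Path

# Assumption: 0.6-era naming <CF>-<generation>-<Component>.db.
# "Filter" is the bloom filter (bf), "Index" is the row index.
COMPONENTS = ("Filter", "Index")


def quarantine_bf_and_index(data_dir: Path, cf: str, generation: int,
                            backup_dir: Path) -> list[Path]:
    """Hypothetical helper: move (never delete) the bf and index files of one
    sstable generation into backup_dir. Run only with the node stopped."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for component in COMPONENTS:
        src = data_dir / f"{cf}-{generation}-{component}.db"
        if src.exists():
            dst = backup_dir / src.name
            if dst.exists():
                # Never clobber an earlier backup.
                raise FileExistsError(f"refusing to overwrite {dst}")
            shutil.move(str(src), str(dst))
            moved.append(dst)
    return moved


if __name__ == "__main__":
    # Demo against a throwaway directory, not a real data directory.
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp) / "Keyspace1"
        data_dir.mkdir()
        for comp in ("Data", "Filter", "Index"):
            (data_dir / f"Standard1-42-{comp}.db").touch()
        moved = quarantine_bf_and_index(data_dir, "Standard1", 42,
                                        Path(tmp) / "quarantine")
        print(sorted(p.name for p in moved))
```

Moving rather than deleting keeps the option of putting the files back; after restarting the node, `nodetool repair` is what actually brings the missing data back from the other replicas.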

As to whether or not the exception you're seeing is expected when you
have a bf/index that is out of sync with the data file - I don't
know, and one would have to either know or look at the 0.6.6 codebase,
but it seems like a plausible error to trigger under such conditions.
But that's speaking solely based on the context and the stack trace,
not looking at the code.

But note: Removing data from a node "under its feet" *will* violate
consistency, since the node will be missing data without "knowing" it's
missing data. For example (and this is not the only case), a read at
CL.ONE that goes to that node will fail to return the data, or may
return old data if the missing data files contained newer versions of
data that still exists elsewhere in sstables on the node.
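A toy model may make the failure mode concrete. This is not Cassandra's read path: each replica is just a list of sstables mapping key -> (timestamp, value), and reads are reconciled by highest timestamp (last write wins). The class and function names are hypothetical, and it assumes only one replica has lost an sstable.

```python
from dataclasses import dataclass, field

# Toy model only: an "sstable" is a dict of key -> (timestamp, value).


@dataclass
class Replica:
    name: str
    sstables: list[dict[str, tuple[int, str]]] = field(default_factory=list)

    def read(self, key: str):
        """Newest (timestamp, value) for key across this replica's sstables."""
        versions = [t[key] for t in self.sstables if key in t]
        return max(versions) if versions else None


def read_at(replicas: list[Replica], key: str):
    """Reconcile the contacted replicas' answers: highest timestamp wins."""
    answers = [a for a in (r.read(key) for r in replicas) if a is not None]
    return max(answers)[1] if answers else None


if __name__ == "__main__":
    lost = {"k1": (2, "new"), "k2": (2, "only-here")}  # the sstable that was lost
    a = Replica("A", [{"k1": (1, "old")}, lost])
    c = Replica("C", [{"k1": (1, "old")}])  # same history minus the lost sstable
    print(read_at([c], "k1"))     # old  (stale: CL.ONE answered by C)
    print(read_at([c], "k2"))     # None (missing: CL.ONE answered by C)
    print(read_at([a, c], "k1"))  # new  (a healthy replica was also consulted)
```

Node C has no way to tell that `k1` is stale or that `k2` ever existed; only a read that also consults a healthy replica, or a repair, corrects it.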

/ Peter Schuller (@scode on twitter)

