incubator-cassandra-user mailing list archives

From Jonathan Colby <jonathan.co...@gmail.com>
Subject Re: repair never completes with "finished successfully"
Date Tue, 12 Apr 2011 12:57:14 GMT
There is no "Repair session" message either.   It just starts with a message like:

INFO [manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723] 2011-04-10 14:00:59,051 AntiEntropyService.java
(line 770) Waiting for repair requests: [#<TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723,
/10.46.108.101, (DFS,main)>, #<TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723,
/10.47.108.100, (DFS,main)>, #<TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723,
/10.47.108.102, (DFS,main)>, #<TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723,
/10.47.108.101, (DFS,main)>]
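
For anyone who wants to check their own logs for requests that never report back, here is a rough Python sketch. It assumes the log format is exactly what is quoted in this thread (a "#<TreeRequest ...>" entry, optionally followed by "completed successfully"). The regex is inferred from those lines only, and `outstanding_requests` is a made-up helper name, not anything that ships with Cassandra.

```python
import re

# Matches entries such as:
#   #<TreeRequest manual-repair-<uuid>, /10.46.108.101, (DFS,main)>
# The pattern is inferred from the log lines in this thread (an assumption).
TREE_REQUEST = re.compile(
    r"#<TreeRequest\s+(?P<session>manual-repair-[0-9a-f-]+),\s*"
    r"/(?P<endpoint>[\d.]+),\s*\((?P<keyspace>[^,]+),(?P<cf>[^)]+)\)>"
)


def outstanding_requests(log_text):
    """Return the set of (session, endpoint, keyspace, cf) tuples that were
    requested but never followed by 'completed successfully'."""
    # Log lines may be wrapped (as they are in this mail), so collapse
    # all whitespace, including newlines, before matching.
    flat = " ".join(log_text.split())
    pending = set()
    for m in TREE_REQUEST.finditer(flat):
        key = (m["session"], m["endpoint"], m["keyspace"], m["cf"])
        # Look at the text right after the request to see if it completed.
        tail = flat[m.end():m.end() + 40].lstrip()
        if tail.startswith("completed successfully"):
            pending.discard(key)
        else:
            pending.add(key)
    return pending
```

Feeding it the contents of cassandra.log should leave only the endpoints that the repair is still waiting on, which would narrow down which neighbour is not answering.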

NETSTATS:

Mode: Normal
Not sending any streams.
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0         150846
Responses                       n/a         0         443183

One node in our cluster still has "unreadable rows", where the reads trip up every time for
certain sstables (you've probably seen my earlier threads regarding that).   My suspicion
is that the bloom filter read on the node with the corrupt sstables is never reporting back
to the repair, thereby causing it to hang.


What would be great is a scrub tool that ignores unreadable/unserializable rows!  : )
 

On Apr 12, 2011, at 2:15 PM, aaron morton wrote:

> Do you see a message starting "Repair session " and ending with "completed successfully"?
> 
> Or do you see any streaming activity using "nodetool netstats"?
> 
> Repair can hang if a neighbour dies and fails to send a requested stream. It will time out
> after 24 hours (I think).
> 
> Aaron
> 
> On 12 Apr 2011, at 23:39, Karl Hiramoto wrote:
> 
>> On 12/04/2011 13:31, Jonathan Colby wrote:
>>> There are a few other threads related to problems with the nodetool repair in 0.7.4.
>>> However, I'm not seeing any errors, just never getting a message that the repair
>>> completed successfully.
>>> 
>>> In my production and test cluster (with just a few MB of data) the repair nodetool
>>> prompt never returns and the last entry in the cassandra.log is always something like:
>>> 
>>> #<TreeRequest manual-repair-f739ca7a-bef8-4683-b249-09105f6719d9, /10.46.108.102, (DFS,main)>  completed successfully: 1 outstanding
>>> 
>>> But I don't see a message, even hours later, that the 1 outstanding request "finished
>>> successfully".
>>> 
>>> Anyone else experience this?  These are physical server nodes in local data centers
>>> and not EC2.
>>> 
>> 
>> I've seen this. To fix it, try a "nodetool compact", then repair.
>> 
>> 
>> --
>> Karl
> 

