cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: getting status of long running repair
Date Tue, 08 May 2012 10:04:24 GMT
When you look in the logs please let me know if you see this error…
https://issues.apache.org/jira/browse/CASSANDRA-4223

To track a repair I watch nodetool compactionstats (for the Merkle tree phase) and nodetool
netstats (for the streaming phase), and I run this loop to see whether streaming is making progress:

while true; do date; diff <(nodetool -h localhost netstats) <(sleep 5 && nodetool -h localhost netstats); done
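
If the shell at hand does not support process substitution, the same idea can be sketched as a small function that uses temporary files. The name snapshot_diff and its arguments are made up for illustration and are not part of nodetool; this is a rough sketch, not a tested tool:

```shell
# snapshot_diff INTERVAL CMD [ARGS...]   (hypothetical helper, not part of nodetool)
# Runs CMD twice, INTERVAL seconds apart, and prints a diff of the two outputs.
# Exit status follows diff: 0 = no change, 1 = output changed, 2 = trouble.
# The body is a subshell ( ... ) so its variables do not leak into the caller.
snapshot_diff() (
  interval=$1
  shift
  before=$(mktemp) || exit 2
  after=$(mktemp) || { rm -f "$before"; exit 2; }
  "$@" > "$before" 2>&1      # first snapshot
  sleep "$interval"
  "$@" > "$after" 2>&1       # second snapshot
  status=0
  diff "$before" "$after" || status=$?
  rm -f "$before" "$after"
  exit "$status"
)
```

It would be used like this: while true; do date; snapshot_diff 5 nodetool -h localhost netstats; done. If several iterations in a row print only the date, the netstats output is not changing, which may mean that streaming has stalled or that nothing is streaming at all.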

Or use DataStax OpsCenter where possible: http://www.datastax.com/products/opscenter

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 8/05/2012, at 2:15 PM, Ben Coverston wrote:

> Check the log files for warnings or errors. They may indicate why your repair failed.
> 
> On Mon, May 7, 2012 at 10:09 AM, Bill Au <bill.w.au@gmail.com> wrote:
> I restarted the nodes and then restarted the repair.  It is still hanging like before.
> Do I keep repeating until the repair actually finishes?
> 
> Bill
> 
> 
> On Fri, May 4, 2012 at 2:18 PM, Rob Coli <rcoli@palominodb.com> wrote:
> On Fri, May 4, 2012 at 10:30 AM, Bill Au <bill.w.au@gmail.com> wrote:
> > I know repair may take a long time to run.  I am running repair on a node
> > with about 15 GB of data and it is taking more than 24 hours.  Is that
> > normal?  Is there any way to get status of the repair?  tpstats does show 2
> > active and 2 pending AntiEntropySessions.  But netstats and compactionstats
> > show no activity.
> 
> As indicated by various recent threads to this effect, many versions
> of cassandra (including current 1.0.x release) contain bugs which
> sometimes prevent repair from completing. The other threads suggest
> that some of these bugs result in the state you are in now, where you
> do not see anything that looks like appropriate activity.
> Unfortunately the only solution offered on these other threads is the
> one I will now offer, which is to restart the participating nodes and
> re-start the repair. I am unaware of any JIRA tickets tracking these
> bugs (which doesn't mean they don't exist, of course) so you might
> want to file one. :)
> 
> =Rob
> 
> --
> =Robert Coli
> AIM&GTALK - rcoli@palominodb.com
> YAHOO - rcoli.palominob
> SKYPE - rcoli_palominodb
> 
> 
> 
> 
> -- 
> Ben Coverston
> DataStax -- The Apache Cassandra Company
> 

