incubator-cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: Repair hangs - Cassandra 1.2.10
Date Tue, 10 Dec 2013 07:20:30 GMT
> I changed logging to debug level, but still nothing is logged. 
> Again - any help will be appreciated. 
Is there nothing at the ERROR level on any machine?

Check nodetool compactionstats to see if a validation compaction is running; the repair may
be waiting on it.

Check nodetool netstats to see if streams are being exchanged, then check the logs on the
machines involved in those streams.
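
A rough sketch of those two checks as a shell helper, in case it is useful. The function names and the strings matched in the output are assumptions based on what 1.2-era nodetool typically prints, so verify them against your own output before relying on this.

```shell
#!/bin/sh
# Sketch only: helper names and matched strings are assumptions,
# not guarantees about nodetool's output format.

# Succeeds if compactionstats output (read from stdin) mentions a
# validation compaction.
has_validation() {
    grep -qi 'validation'
}

# Succeeds if netstats output (read from stdin) has a line starting
# with "Streaming to" or "Streaming from".
has_streams() {
    grep -qE '^Streaming (to|from)'
}

# Run both checks against one node; $1 is the host to connect to.
check_node() {
    host=$1
    if nodetool -h "$host" compactionstats | has_validation; then
        echo "$host: validation compaction in progress"
    else
        echo "$host: no validation compaction seen"
    fi
    if nodetool -h "$host" netstats | has_streams; then
        echo "$host: streams in progress"
    else
        echo "$host: no streams seen"
    fi
}
```

For example, check_node 10.0.0.1 (a placeholder address), repeated for each node in the ring. If neither check shows activity on any node while the repair command is still waiting, the logs on the replicas for that range are the next place to look.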

cheers

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 4/12/2013, at 10:24 pm, Tamar Rosen <tamar@correlor.com> wrote:

> Update - I am still experiencing the above issues, but not all the time. I was able to
> run repair (on this keyspace) from node 2 and from node 4, but now a different keyspace hangs
> on these nodes, and I am still not able to run repair on node 1. It seems random. I changed
> logging to debug level, but still nothing is logged.
> Again - any help will be appreciated. 
> 
> Tamar
> 
> 
> On Mon, Dec 2, 2013 at 11:53 AM, Tamar Rosen <tamar@correlor.com> wrote:
> Hi,
> 
> On AWS, we had a 2 node cluster with RF 2. 
> We added 2 more nodes, then changed RF to 3 on all our keyspaces. 
> Next step was to run nodetool repair, node by node. 
> (In the meantime, we found that we must use CL QUORUM, which is affecting our application's
> performance).
> Started with node 1, which is one of the old nodes.
> Ran:
> nodetool repair -pr
> 
> It seemed to be progressing fine, running keyspace by keyspace, for about an hour, but
> then it hung. The last messages in the output are:
> 
> [2013-12-01 11:18:24,577] Repair command #4 finished
> [2013-12-01 11:18:24,594] Starting repair command #5, repairing 230 ranges for keyspace correlor_customer_766
> 
> It stayed like this for almost 24 hours. Then we read about the possibility of this being
> related to not upgrading sstables, so we killed the process. We were not sure whether we had
> run upgradesstables (we upgraded from 1.2.4 a couple of months ago).
> 
> So:
> Ran upgradesstables on a specific table in the keyspace that repair got stuck on (this
> was fast):
> nodetool upgradesstables correlor_customer_766 users
> Ran repair on that same table. 
> nodetool repair correlor_customer_766 users -pr
> 
> This is again hanging. 
> The first and only output from this process is:
> [2013-12-02 08:22:41,221] Starting repair command #6, repairing 230 ranges for keyspace correlor_customer_766
> 
> Nothing else happened for more than an hour. 
> 
> Any help and advice will be greatly appreciated.
> 
> Tamar Rosen
> 
> correlor.com

