incubator-cassandra-user mailing list archives

From Tamar Rosen <ta...@correlor.com>
Subject Re: Repair hangs - Cassandra 1.2.10
Date Wed, 04 Dec 2013 09:24:02 GMT
Update - I am still experiencing the issues described below, but not all the
time. I was able to run repair (on this keyspace) from node 2 and from node 4,
but now a different keyspace hangs on these nodes, and I am still not able to
run repair on node 1. The hangs seem random. I changed logging to debug level,
but still nothing is logged.
Again - any help will be appreciated.

Tamar
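
Since the only symptom is a "Starting repair command" line that never gets a matching "finished" line, a small script can flag which keyspace is stuck when many keyspaces are repaired in sequence. This is a minimal sketch in Python, assuming the nodetool output has exactly the wording quoted in the message below. The function name `unfinished_repairs` and the regular expressions are illustrative and are not part of Cassandra.

```python
import re
from typing import Dict

# Patterns assumed from the two kinds of lines quoted in this thread;
# other Cassandra versions may word these messages differently.
STARTED = re.compile(
    r"Starting repair command #(\d+), repairing (\d+) ranges\s+for keyspace (\S+)"
)
FINISHED = re.compile(r"Repair command #(\d+) finished")


def unfinished_repairs(output: str) -> Dict[int, str]:
    """Return {command number: keyspace} for repairs that started but never finished."""
    # Join lines (dropping any mail quote markers) so that a "Starting ..."
    # message wrapped across two lines still matches the pattern.
    text = " ".join(line.strip().lstrip(">").strip() for line in output.splitlines())
    pending = {int(num): keyspace for num, _ranges, keyspace in STARTED.findall(text)}
    for num in FINISHED.findall(text):
        pending.pop(int(num), None)
    return pending


sample = """\
[2013-12-01 11:18:24,577] Repair command #4 finished
[2013-12-01 11:18:24,594] Starting repair command #5, repairing 230 ranges
for keyspace correlor_customer_766
"""
print(unfinished_repairs(sample))  # {5: 'correlor_customer_766'}
```

This only reports what nodetool printed. It cannot tell a slow repair from a hung one, so the result still has to be read against how long the command has been silent.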


On Mon, Dec 2, 2013 at 11:53 AM, Tamar Rosen <tamar@correlor.com> wrote:

> Hi,
>
> On AWS, we had a 2 node cluster with RF 2.
> We added 2 more nodes, then changed RF to 3 on all our keyspaces.
> Next step was to run nodetool repair, node by node.
> (In the meantime, we found that we must use CL QUORUM, which is affecting
> our application's performance.)
> Started with node 1, which is one of the old nodes.
> Ran:
> nodetool repair -pr
>
> It seemed to be progressing fine, running keyspace by keyspace, for about
> an hour, but then it hung. The last messages in the output are:
>
> [2013-12-01 11:18:24,577] Repair command #4 finished
> [2013-12-01 11:18:24,594] Starting repair command #5, repairing 230 ranges
> for keyspace correlor_customer_766
>
> It stayed like this for almost 24 hours. Then we read about the
> possibility of this being related to not upgrading sstables
> <http://comments.gmane.org/gmane.comp.db.cassandra.user/31939>,
> so we killed the process. We were not sure whether we had run
> upgradesstables (we upgraded from 1.2.4 a couple of months ago).
>
> So:
> Ran upgradesstables on a specific table in the keyspace that repair got
> stuck on. (this was fast)
> nodetool upgradesstables correlor_customer_766 users
> Ran repair on that same table.
> nodetool repair correlor_customer_766 users -pr
>
> This is hanging again.
> The first and only output from this process is:
> [2013-12-02 08:22:41,221] Starting repair command #6, repairing 230 ranges
> for keyspace correlor_customer_766
>
> Nothing else happened for more than an hour.
>
> Any help and advice will be greatly appreciated.
>
> Tamar Rosen
>
> correlor.com
>
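
The quoted message mentions having to use CL QUORUM after raising RF from 2 to 3. One likely reason, offered here as an assumption and not something the message states, is that until repair completes the new replicas may be missing data, so reads and writes must touch overlapping sets of replicas to stay consistent. The arithmetic can be sketched in Python as follows; the helper names `quorum` and `overlapping` are illustrative only.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond at consistency level QUORUM."""
    return replication_factor // 2 + 1


def overlapping(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read set shares at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


for rf in (2, 3):
    q = quorum(rf)
    print(f"RF={rf}: QUORUM={q}, QUORUM reads and writes overlap: {overlapping(q, q, rf)}")

# Reading and writing at ONE with RF 3 touches one replica each time,
# so a read can land on a replica that never saw the write.
print("RF=3, ONE/ONE overlap:", overlapping(1, 1, 3))
```

With RF 3 a quorum is 2 replicas, so each QUORUM request waits on two nodes instead of one, which is consistent with the performance impact described above.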
