cassandra-user mailing list archives

From Jonathan Ellis <>
Subject Re: Questions about RackAwareStrategy and Multiple Data Centers
Date Sat, 20 Nov 2010 05:47:44 GMT
On Fri, Nov 19, 2010 at 5:30 AM, Jake Maizel <> wrote:
> We see that after starting repair on one node, we get lots of GC
> (However, we are not swapping and disk io seems fine).  We also see
> increases in the pending queue for AE stages (Seems normal, on the
> order of 40-80 pending stages).  What doesn't seem normal is that we
> see a large increase in the AE pending queue on all other nodes not
> running repair (I would expect this on neighbors, but not all nodes)
> and it seems to take forever for these queues to drain (Forever = over
> 24 hrs).

Sounds like
(Fixed for 0.6.9.)

> Here are some questions I have (I can provide any additional info required):
> 1. If a node we run repair on finishes, indicated by compaction and AE
> being 0, but the next node we want to repair still has non-zero queues
> for C and AE, can we still start up the repair?

I think having AE empty is the important one, but I'd wait for
everything to be quiesced to be safe.
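
A minimal sketch of that check, assuming you capture the text printed by
`nodetool tpstats` on the next node and that its rows have the form
"name active pending completed". The stage names `AE-SERVICE-STAGE` and
`COMPACTION-POOL` and the sample numbers are assumptions for
illustration; confirm them against your own tpstats output.

```python
def pending_by_stage(tpstats_text):
    """Map each pool name to its pending count.

    Assumes each data row ends with three integer columns:
    active, pending, completed. Header and blank lines are skipped.
    """
    pending = {}
    for line in tpstats_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and all(p.isdigit() for p in parts[-3:]):
            name = " ".join(parts[:-3])
            pending[name] = int(parts[-2])
    return pending


def is_quiesced(tpstats_text,
                stages=("AE-SERVICE-STAGE", "COMPACTION-POOL")):
    """True only if every named stage is present with zero pending tasks.

    The default stage names are assumptions; a stage missing from the
    output counts as not quiesced, which errs on the side of waiting.
    """
    pending = pending_by_stage(tpstats_text)
    return all(pending.get(stage) == 0 for stage in stages)


# Illustrative sample only, not real cluster output.
SAMPLE = """Pool Name                    Active   Pending      Completed
ROW-READ-STAGE                    0         0          12345
AE-SERVICE-STAGE                  1        42             17
COMPACTION-POOL                   0         0            301
"""

if __name__ == "__main__":
    print(pending_by_stage(SAMPLE))
    print("safe to start next repair:", is_quiesced(SAMPLE))
```

Passing only the AE stage in `stages` gives the looser check; the
default waits for compaction as well.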

> 2. What is the effect of running repair on more than one node at a
> time under 0.6.6?  I realize it's not recommended but I accidentally
> did this and am curious about the effect.

Often the repairs will stomp on each other's internal state and
neither will finish.

> 3. Is large GC activity normal during a repair outside the documented
> "GC Storm" cases?

Yes.  Repair does a lot of object allocation.

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
