cassandra-user mailing list archives

From Alexander Dejanovski <a...@thelastpickle.com>
Subject Re: How to get rid of "Cannot start multiple repair sessions over the same sstables" exception
Date Wed, 28 Sep 2016 08:53:47 GMT
Hi,

nodetool scrub won't help here. What you're experiencing is most likely
that one SSTable is going through anticompaction while another node asks
for a Merkle tree that involves it.
An SSTable cannot be anticompacted and validation-compacted at the same
time, since anticompaction rewrites it into repaired and unrepaired parts.
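To make the failure mode concrete, here is a toy Python model of the rule described above: SSTables held by one in-flight repair session (for example, while they are being anticompacted) cannot be claimed by another session's validation until the first session releases them. The class and method names and the SSTable file names are hypothetical; this is a simplified illustration, not Cassandra's actual code.

```python
class RepairRegistry:
    """Toy model (hypothetical, not Cassandra's code) of the rule that an
    SSTable may belong to at most one in-flight repair session."""

    def __init__(self):
        # sstable name -> id of the repair session currently holding it
        self._owner = {}

    def claim(self, session_id, sstables):
        """Mark sstables as held by session_id, or fail if another session holds any."""
        conflicts = {s for s in sstables
                     if self._owner.get(s, session_id) != session_id}
        if conflicts:
            # Nothing is claimed when there is an overlap.
            raise RuntimeError(
                "Cannot start multiple repair sessions over the same sstables")
        for s in sstables:
            self._owner[s] = session_id

    def release(self, session_id):
        """Drop every sstable held by session_id (e.g. once anticompaction ends)."""
        self._owner = {s: o for s, o in self._owner.items() if o != session_id}


if __name__ == "__main__":
    registry = RepairRegistry()
    # Session A is still anticompacting these (hypothetical) sstables.
    registry.claim("session-A", {"mc-1-big-Data.db", "mc-2-big-Data.db"})
    try:
        # Session B asks for a Merkle tree over an overlapping sstable.
        registry.claim("session-B", {"mc-2-big-Data.db"})
    except RuntimeError as e:
        print(e)  # Cannot start multiple repair sessions over the same sstables
    registry.release("session-A")
    registry.claim("session-B", {"mc-2-big-Data.db"})  # succeeds once A is done
```

The point of the model is the ordering: the second claim only succeeds after `release`, which is why waiting for anticompaction to finish avoids the exception.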

The solution here is to adjust the repair pressure on your cluster so that
anticompaction can end before you run repair on another node.
You may have a lot of anticompaction to do if you had high volumes of
unrepaired data, which can take a long time depending on several factors.

You can tune your repair process to make sure no anticompaction is running
before launching a new session on another node, or you can try my Reaper
fork that handles incremental repair:
https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
I may have to add a few checks to avoid all collisions between
anticompactions and new sessions, but it should be helpful if you struggle
with incremental repair.

In any case, check if your nodes are still anticompacting before trying to
run a new repair session on a node.
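A minimal Python sketch of that check, assuming `nodetool` is on the PATH and can reach every node over JMX with `-h`, and assuming that a running anticompaction shows up in `nodetool compactionstats` output with a compaction type containing the word "Anticompaction". The helper names (`nodetool`, `is_anticompacting`, `wait_for_anticompaction`, `repair_one_node_at_a_time`) are hypothetical; the script simply waits until no node reports anticompaction before starting the next node's repair.

```python
import subprocess
import time
from typing import Callable, Iterable, List


def nodetool(host: str, *args: str) -> str:
    """Run nodetool against a host (assumes remote JMX is reachable)."""
    return subprocess.run(["nodetool", "-h", host, *args],
                          check=True, capture_output=True, text=True).stdout


def is_anticompacting(compactionstats_output: str) -> bool:
    """Assumption: active anticompactions are listed in compactionstats
    with a compaction type containing "Anticompaction"."""
    return "anticompaction" in compactionstats_output.lower()


def wait_for_anticompaction(hosts: Iterable[str],
                            run: Callable[..., str] = nodetool,
                            poll_seconds: float = 30.0,
                            sleep: Callable[[float], None] = time.sleep) -> None:
    """Block until no host reports a running anticompaction."""
    hosts = list(hosts)
    while any(is_anticompacting(run(h, "compactionstats")) for h in hosts):
        sleep(poll_seconds)


def repair_one_node_at_a_time(hosts: List[str], keyspace: str,
                              run: Callable[..., str] = nodetool,
                              poll_seconds: float = 30.0,
                              sleep: Callable[[float], None] = time.sleep) -> None:
    """Repair each node in turn, waiting for cluster-wide anticompaction
    to finish before launching the next session."""
    for host in hosts:
        wait_for_anticompaction(hosts, run, poll_seconds, sleep)
        run(host, "repair", keyspace)
    # Leave the cluster quiet before returning.
    wait_for_anticompaction(hosts, run, poll_seconds, sleep)
```

For example, `repair_one_node_at_a_time(["10.0.0.1", "10.0.0.2"], "notes")` would repair two (hypothetical) hosts in turn. The `run` and `sleep` parameters are injectable so the polling logic can be exercised without a live cluster.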

Cheers,


On Wed, Sep 28, 2016 at 10:31 AM Robert Sicoie <robert.sicoie@gmail.com>
wrote:

> Hi guys,
>
> I have a 5-node cluster running Cassandra 3.0.5.
> I had been running nodetool repair over the last few days, one node at a
> time, when I first encountered this exception:
>
> ERROR [ValidationExecutor:11] 2016-09-27 16:12:20,409 CassandraDaemon.java:195 - Exception in thread Thread[ValidationExecutor:11,1,main]
> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>     at org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1194) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1084) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     at org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     at org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:714) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>
> On some of the other boxes I see this:
>
>
> Caused by: org.apache.cassandra.exceptions.RepairException: [repair #9dd21ab0-83f4-11e6-b28f-df99132d7979 on notes/operator_source_mv, [(-7505573573695693981,-7495786486761919991],
> ....
> (-8483612809930827919,-8480482504800860871]]] Validation failed in /10.45.113.67
>     at org.apache.cassandra.repair.ValidationTask.treesReceived(ValidationTask.java:68) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:408) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:168) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_60]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>
> ERROR [RepairJobTask:3] 2016-09-26 16:39:33,096 CassandraDaemon.java:195 - Exception in thread Thread[RepairJobTask:3,5,RMI Runtime]
> java.lang.AssertionError: java.lang.InterruptedException
>     at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:172) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:761) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:729) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     at org.apache.cassandra.repair.ValidationTask.run(ValidationTask.java:56) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_60]
>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
> Caused by: java.lang.InterruptedException: null
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220) ~[na:1.8.0_60]
>     at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335) ~[na:1.8.0_60]
>     at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339) ~[na:1.8.0_60]
>     at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:168) ~[apache-cassandra-3.0.5.jar:3.0.5]
>     ... 6 common frames omitted
>
>
> Now if I run nodetool repair I get the
>
> java.lang.RuntimeException: Cannot start multiple repair sessions over
> the same sstables
>
> exception.
> What do you suggest? Would nodetool scrub or sstablescrub help in this
> case, or would it just make things worse?
>
> Thanks,
>
> Robert
>
-- 
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
