Subject: Re: How to get rid of "Cannot start multiple repair sessions over the same sstables" exception
From: Alexander Dejanovski <alex@thelastpickle.com>
Date: Wed, 28 Sep 2016 15:46:07 +0000
To: user@cassandra.apache.org

Robert,

You can restart them in any order; that doesn't make a difference, AFAIK.

Cheers

On Wed, Sep 28, 2016 at 5:10 PM, Robert Sicoie <robert.sicoie@gmail.com> wrote:

> Thanks Alexander,
>
> Yes, with tpstats I can see the hanging active repair(s) (output
> attached). On one node there are 31 pending repairs; on the others there
> are fewer (minimum 12). Is there any recommendation for the restart
> order? The one with the fewest pending repairs first, perhaps?
>
> Thanks,
> Robert
>
> On Wed, Sep 28, 2016 at 5:35 PM, Alexander Dejanovski
> <alex@thelastpickle.com> wrote:
>
>> They will show up in nodetool compactionstats:
>> https://issues.apache.org/jira/browse/CASSANDRA-9098
>>
>> Did you check nodetool tpstats to see whether any repair sessions are
>> still running? Just to make sure (and if you can actually do it), do a
>> rolling restart of the cluster and try again. Repair sessions can get
>> sticky sometimes.
>>
>> On Wed, Sep 28, 2016 at 4:23 PM, Robert Sicoie
>> <robert.sicoie@gmail.com> wrote:
>>
>>> I am using nodetool compactionstats to check for pending compactions,
>>> and it shows 0 pending on all nodes seconds before I run nodetool
>>> repair. I am also monitoring PendingCompactions over JMX.
>>>
>>> Is there any other way I can find out whether an anticompaction is
>>> running on any node?
>>>
>>> Thanks a lot,
>>> Robert
>>>
>>> On Wed, Sep 28, 2016 at 4:44 PM, Alexander Dejanovski
>>> <alex@thelastpickle.com> wrote:
>>>
>>>> Robert,
>>>>
>>>> You need to make sure you have no repair session currently running
>>>> on your cluster, and no anticompaction.
>>>> I'd recommend doing a rolling restart to be sure all running repairs
>>>> are stopped, then starting the process again, node by node, checking
>>>> that no anticompaction is running before moving from one node to the
>>>> next.
>>>>
>>>> Please do not use the -pr switch: it is both useless (token ranges
>>>> are repaired only once with incremental repair, whatever the
>>>> replication factor) and harmful, because not all anticompactions
>>>> will be executed (you'll still have sstables marked as unrepaired
>>>> even if the process has run entirely without error).
>>>>
>>>> Let us know how that goes.
>>>>
>>>> Cheers,
>>>>
>>>> On Wed, Sep 28, 2016 at 2:57 PM, Robert Sicoie
>>>> <robert.sicoie@gmail.com> wrote:
>>>>
>>>>> Thanks Alexander,
>>>>>
>>>>> Now I have started running the repair with the -pr arg and with
>>>>> keyspace and table args. Still, I got
>>>>>
>>>>> ERROR [RepairJobTask:1] 2016-09-28 11:34:38,288
>>>>> RepairRunnable.java:246 - Repair session
>>>>> 89af4d10-856f-11e6-b28f-df99132d7979 for range
>>>>> [(8323429577695061526,8326640819362122791], ...,
>>>>> (4212695343340915405,4229348077081465596]]] Validation failed in
>>>>> /10.45.113.88
>>>>>
>>>>> for one of the tables. 10.45.113.88 is the IP of the machine I am
>>>>> running nodetool on. I'm wondering if this is normal...
>>>>>
>>>>> Thanks,
>>>>> Robert
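A minimal sketch of the tpstats/compactionstats checks suggested above, run against every node before launching the next repair session. The node list is a placeholder, and passwordless SSH plus nodetool on each node's PATH are assumptions:

    #!/usr/bin/env bash
    # Look for running repair sessions and anticompactions on every node.
    # Node addresses below are hypothetical; substitute your own.
    NODES="node1 node2 node3 node4 node5"

    for node in $NODES; do
      echo "== $node =="
      # Repair activity shows up in tpstats pools such as
      # AntiEntropyStage and ValidationExecutor.
      ssh "$node" nodetool tpstats | grep -Ei 'antientropy|validation|repair'
      # Since CASSANDRA-9098, anticompactions are listed by compactionstats.
      ssh "$node" nodetool compactionstats
    done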
>>>>> On Wed, Sep 28, 2016 at 11:53 AM, Alexander Dejanovski
>>>>> <alex@thelastpickle.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> nodetool scrub won't help here: what you're experiencing is most
>>>>>> likely that one SSTable is going through anticompaction, and then
>>>>>> another node asks for a Merkle tree that involves it. For
>>>>>> understandable reasons, an SSTable cannot be anticompacted and
>>>>>> validation compacted at the same time.
>>>>>>
>>>>>> The solution here is to adjust the repair pressure on your cluster
>>>>>> so that anticompaction can end before you run repair on another
>>>>>> node. You may have a lot of anticompaction to do if you had high
>>>>>> volumes of unrepaired data, and it can take a long time depending
>>>>>> on several factors.
>>>>>>
>>>>>> You can tune your repair process to make sure no anticompaction is
>>>>>> running before launching a new session on another node, or you can
>>>>>> try my Reaper fork that handles incremental repair:
>>>>>> https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
>>>>>> I may have to add a few checks to avoid all collisions between
>>>>>> anticompactions and new sessions, but it should be helpful if you
>>>>>> struggle with incremental repair.
>>>>>>
>>>>>> In any case, check whether your nodes are still anticompacting
>>>>>> before trying to run a new repair session on a node.
>>>>>>
>>>>>> Cheers,
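One way to wire up that "check before launching a new session" gate, sketched under the same assumptions as above (SSH access, placeholder hostname): block until compactionstats stops reporting anticompactions, then start the node's repair.

    #!/usr/bin/env bash
    # Gate a node's repair on its anticompactions having finished.
    NODE="node1"   # hypothetical; in practice check every node

    # Poll until no anticompaction is reported.
    while ssh "$NODE" nodetool compactionstats | grep -qi anticompaction; do
      echo "$NODE is still anticompacting, waiting..."
      sleep 60
    done

    # Incremental repair (the default in 3.0), deliberately without -pr,
    # per the advice earlier in this thread.
    ssh "$NODE" nodetool repair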
>>>>>> On Wed, Sep 28, 2016 at 10:31 AM, Robert Sicoie
>>>>>> <robert.sicoie@gmail.com> wrote:
>>>>>>
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> I have a cluster of 5 nodes, Cassandra 3.0.5. I was running
>>>>>>> nodetool repair over the last few days, one node at a time, when
>>>>>>> I first encountered this exception:
>>>>>>>
>>>>>>> ERROR [ValidationExecutor:11] 2016-09-27 16:12:20,409 CassandraDaemon.java:195 - Exception in thread Thread[ValidationExecutor:11,1,main]
>>>>>>> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>>>>>>>     at org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1194) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>     at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1084) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>     at org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>     at org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:714) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>>>>>>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>>>>>>>
>>>>>>> On some of the other boxes I see this:
>>>>>>>
>>>>>>> Caused by: org.apache.cassandra.exceptions.RepairException: [repair #9dd21ab0-83f4-11e6-b28f-df99132d7979 on notes/operator_source_mv, [(-7505573573695693981,-7495786486761919991], ..., (-8483612809930827919,-8480482504800860871]]] Validation failed in /10.45.113.67
>>>>>>>     at org.apache.cassandra.repair.ValidationTask.treesReceived(ValidationTask.java:68) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>     at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>     at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:408) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>     at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:168) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_60]
>>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>>>>>>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>>>>>>> ERROR [RepairJobTask:3] 2016-09-26 16:39:33,096 CassandraDaemon.java:195 - Exception in thread Thread[RepairJobTask:3,5,RMI Runtime]
>>>>>>> java.lang.AssertionError: java.lang.InterruptedException
>>>>>>>     at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:172) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>     at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:761) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>     at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:729) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>     at org.apache.cassandra.repair.ValidationTask.run(ValidationTask.java:56) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_60]
>>>>>>>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
>>>>>>> Caused by: java.lang.InterruptedException: null
>>>>>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220) ~[na:1.8.0_60]
>>>>>>>     at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335) ~[na:1.8.0_60]
>>>>>>>     at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339) ~[na:1.8.0_60]
>>>>>>>     at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:168) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>>>     ... 6 common frames omitted
>>>>>>>
>>>>>>> Now if I run nodetool repair I get the
>>>>>>>
>>>>>>> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>>>>>>>
>>>>>>> exception. What do you suggest? Would nodetool scrub or
>>>>>>> sstablescrub help in this case, or would it just make things
>>>>>>> worse?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Robert
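To verify afterwards whether incremental repair actually marked sstables as repaired (relevant to the unrepaired-sstables caveat about -pr above), the sstablemetadata tool can be checked per sstable. A sketch; the data path, keyspace, and table names are hypothetical:

    #!/usr/bin/env bash
    # "Repaired at: 0" means the sstable is still marked unrepaired.
    for f in /var/lib/cassandra/data/my_keyspace/my_table-*/ma-*-big-Data.db; do
      echo "$f"
      sstablemetadata "$f" | grep 'Repaired at'
    done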
--
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
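For reference, the rolling restart recommended several times in this thread could be scripted roughly as follows. The service name and node list are assumptions, and the status check assumes the loop runs from a node that stays up and that the listed names match the addresses printed by nodetool status:

    #!/usr/bin/env bash
    # Restart nodes one at a time, waiting for each to come back Up/Normal.
    for node in node1 node2 node3 node4 node5; do
      ssh "$node" nodetool drain                    # flush memtables, stop accepting traffic
      ssh "$node" sudo systemctl restart cassandra  # service name is an assumption
      # Wait until nodetool status reports the node as UN before moving on.
      until nodetool status | grep -w "$node" | grep -q '^UN'; do
        sleep 10
      done
    done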