From: Bhuvan Rawal
Date: Wed, 4 Jan 2017 10:39:04 +0530
Subject: Re: Reaper repair seems to "hang"
To: user@cassandra.apache.org

Hi Daniel,

Looks like yours is a different case. If you're running incremental repair
for the first time, it may take a long time, especially if the table is
large. And the repair may seem to be stuck even when things are working.

You can try nodetool compactionstats when the repair appears stuck; you'll
find a validation compaction happening if that's indeed the case.
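For example, output along these lines (illustrative only; the keyspace and
table names are placeholders, and the exact columns vary by Cassandra
version) would show the node busy computing Merkle trees rather than hung:

    $ nodetool compactionstats
    pending tasks: 1
       compaction type   keyspace   table      completed   total       unit    progress
       Validation        my_ks      my_table   51610932    270953627   bytes   19.05%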
For the first incremental repair you can follow this doc; in subsequent
runs, incremental repair should encounter very few sstables:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRepairNodesMigration.html
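In outline, the migration procedure in that doc marks all existing sstables
as repaired once, node by node (a sketch from memory -- treat the linked doc
as authoritative; keyspace, table and path below are placeholders):

    # 1. Keep compaction from churning sstables during the last full repair
    $ nodetool disableautocompaction my_ks my_table

    # 2. Run one final full (non-incremental) repair
    $ nodetool repair my_ks my_table

    # 3. Stop the node, then mark its sstables as repaired
    $ sstablerepairedset --really-set --is-repaired /path/to/my_ks/my_table/*-Data.db

    # 4. Restart the node and re-enable autocompaction

After that, each incremental repair only has to validate sstables written
since the previous one, which is why the first run is the expensive one.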
Regards,
Bhuvan

On Jan 4, 2017 3:52 AM, "Daniel Kleviansky" <daniel@kleviansky.com> wrote:

Hi Bhuvan,

Thank you so very much for your detailed reply.
Just to ensure everyone is across the same information, and responses are
not duplicated across two different forums, I thought I'd share with the
mailing list that I've created a GitHub issue at:
https://github.com/thelastpickle/cassandra-reaper/issues/39

Kind regards,
Daniel

On Wed, Jan 4, 2017 at 6:31 AM, Bhuvan Rawal <bhu1rawal@gmail.com> wrote:

> Hi Daniel,
>
> We faced a similar issue during repair with Reaper. We ran repair with
> more repair threads than the number of Cassandra nodes, but on and off
> the repair would get stuck and we had to do a rolling restart of the
> cluster or wait for the lock time to expire (~1 hr).
>
> We had a look at the stuck repair: threadpools were getting stuck at the
> AntiEntropy stage. From the synchronized block in the repair code it
> appeared that at most one concurrent repair session per node is possible.
>
> According to
> https://medium.com/@mlowicki/cassandra-reaper-introduction-ed73410492bf#.f0erygqpk :
>
> The segment runner has a protection mechanism to avoid overloading nodes,
> using two simple rules to postpone a repair if:
>
> 1. The number of pending compactions is greater than
>    MAX_PENDING_COMPACTIONS (20 by default)
> 2. The node is already running a repair job
>
> We tried running Reaper with fewer threads than the number of nodes
> (assuming Reaper would not submit multiple segments to a single Cassandra
> node), but we still observed multiple repair segments going to the same
> node concurrently, and therefore nodes could still get stuck in that
> state. Finally we settled on a single repair thread in the Reaper
> settings. Although it takes slightly more time, it has completed
> successfully numerous times.
>
> Thread dump of the Cassandra server when the repair was getting stuck:
>
> "AntiEntropyStage:1" #159 daemon prio=5 os_prio=0 tid=0x00007f0fa16226a0
> nid=0x3c82 waiting for monitor entry [0x00007ee9eabaf000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392)
>         - waiting to lock <0x000000067c083308> (a org.apache.cassandra.service.ActiveRepairService)
>         at org.apache.cassandra.service.ActiveRepairService.doAntiCompaction(ActiveRepairService.java:417)
>         at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:145)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>
> Hope it helps!
>
> Regards,
> Bhuvan
>
> On Tue, Jan 3, 2017 at 11:16 AM, Alexander Dejanovski <
> alex@thelastpickle.com> wrote:
>
>> Hi Daniel,
>>
>> Could you file a bug in the issue tracker?
>> https://github.com/thelastpickle/cassandra-reaper/issues
>>
>> We'll figure out what's wrong and get your repairs running.
>>
>> Thanks!
>>
>> On Tue, Jan 3, 2017 at 12:35 AM Daniel Kleviansky <daniel@kleviansky.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Using The Last Pickle's fork of Reaper, and unfortunately running into
>>> a bit of an issue. I'll try to break it down below.
>>>
>>> # Problem Description:
>>> * After starting a repair via the GUI, progress remains at 0/x.
>>> * Cassandra nodes calculate their respective token ranges, and then
>>> nothing happens.
>>> * There were no errors in the Reaper or Cassandra logs, only a message
>>> of acknowledgement that a repair had initiated.
>>> * Performing a stack trace on the running JVM, one can see that the
>>> thread spawning the repair process was waiting on a lock that was never
>>> being released.
>>> * This occurred on all nodes, and prevented any manually initiated
>>> repair process from running. A rolling restart of each node was
>>> required, after which one could run a `nodetool repair` successfully.
>>>
>>> # Cassandra Cluster Details:
>>> * Cassandra 2.2.5 running on Windows Server 2008 R2
>>> * 6-node cluster, split across 2 DCs, with RF = 3:3.
>>>
>>> # Reaper Details:
>>> * Reaper 0.3.3 running on Windows Server 2008 R2, utilising a
>>> PostgreSQL database.
>>>
>>> ## Reaper settings:
>>> * Parallelism: DC-Aware
>>> * Repair Intensity: 0.9
>>> * Incremental: true
>>>
>>> Don't want to swamp you with more details or unnecessary logs,
>>> especially as I'd have to sanitize them before sending them out, so
>>> please let me know if there is anything else I can provide, and I'll
>>> do my best to get it to you.
>>>
>>> Kind regards,
>>> Daniel
>>>
>> --
>> -----------------
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>

--
Daniel Kleviansky
System Engineer & CX Consultant
M: +61 (0) 499 103 043 | E: daniel@kleviansky.com | W: http://danielkleviansky.com