Subject: Re: Repair Process Taking too long
From: Zhu Han <schumi.han@gmail.com>
To: user@cassandra.apache.org
Date: Sat, 14 Apr 2012 14:54:13 +0800

On Sat, Apr 14, 2012 at 1:57 PM, Igor wrote:

> Hi!
>
> What is the difference between 'repair' and 'repair -pr'? Does a simple
> repair touch all token ranges (for all nodes), while -pr touches only the
> range for which the given node is responsible?
>

-pr only touches the primary range of the node. If you execute -pr against
every node in the replica groups, then all ranges are repaired.
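As a rough, untested sketch (host names and the keyspace are only
placeholders), covering every primary range from an admin box would look
something like:

    # repair only the primary range owned by each node, one node at a time
    for host in cass-node1 cass-node2 cass-node3; do
        nodetool -h "$host" repair -pr my_keyspace
    done

You can check which token (and therefore which primary range) each node owns
with 'nodetool ring' first.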
>
> On 04/12/2012 05:59 PM, Sylvain Lebresne wrote:
>
>> On Thu, Apr 12, 2012 at 4:06 PM, Frank Ng wrote:
>>
>>> I also noticed that if I use the -pr option, the repair process went down
>>> from 30 hours to 9 hours. Is the -pr option safe to use if I want to run
>>> repair processes in parallel on nodes that are not replication peers?
>>
>> There are pretty much two use cases for repair:
>> 1) rebuilding a node: if, say, a node has lost some data due to a hard
>> drive corruption or the like and you want to rebuild what's missing;
>> 2) the periodic repairs to avoid problems with deleted data coming back
>> from the dead (basically:
>> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair).
>>
>> In case 1) you want to run 'nodetool repair' (without -pr) against the
>> node to rebuild.
>> In case 2) (which I suspect is the case you're talking about now), you
>> *want* to use 'nodetool repair -pr' on *every* node of the cluster; that's
>> the most efficient way to do it. The only reason not to use -pr in this
>> case would be that it's not available because you're using an old version
>> of Cassandra. And yes, it is safe to run with -pr in parallel on nodes
>> that are not replication peers.
>>
>> --
>> Sylvain
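For the periodic case, the usual guideline is that every node gets a
'repair -pr' at least once per gc_grace_seconds (10 days by default). As a
purely illustrative crontab sketch, with a placeholder keyspace and schedule:

    # weekly primary-range repair on this node, Sundays at 02:00
    0 2 * * 0  nodetool repair -pr my_keyspace

Stagger the day or hour per node so the validation compactions do not all
run at the same time.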
>>
>>> thanks
>>>
>>> On Thu, Apr 12, 2012 at 12:06 AM, Frank Ng wrote:
>>>
>>>> Thank you for confirming that the per-node data size is most likely
>>>> causing the long repair process. I have tried a repair on smaller column
>>>> families and it was significantly faster.
>>>>
>>>> On Wed, Apr 11, 2012 at 9:55 PM, aaron morton wrote:
>>>>
>>>>> If you have 1TB of data it will take a long time to repair. Every bit
>>>>> of data has to be read and a hash generated. This is one of the reasons
>>>>> we often suggest that around 300 to 400GB per node is a good load in
>>>>> the general case.
>>>>>
>>>>> Look at nodetool compactionstats. Is there a validation compaction
>>>>> running? If so, it is still building the Merkle hash tree.
>>>>>
>>>>> Look at nodetool netstats. Is it streaming data? If so, all hash trees
>>>>> have been calculated.
>>>>>
>>>>> Cheers
>>>>>
>>>>> -----------------
>>>>> Aaron Morton
>>>>> Freelance Developer
>>>>> @aaronmorton
>>>>> http://www.thelastpickle.com
>>>>>
>>>>> On 12/04/2012, at 2:16 AM, Frank Ng wrote:
>>>>>
>>>>> Can you expand further on your issue? Were you using the Random
>>>>> Partitioner?
>>>>>
>>>>> thanks
>>>>>
>>>>> On Tue, Apr 10, 2012 at 5:35 PM, David Leimbach wrote:
>>>>>
>>>>>> I had this happen when I had really poorly generated tokens for the
>>>>>> ring. Cassandra seems to accept numbers that are too big. You get hot
>>>>>> spots when you think you should be balanced, and repair never ends (I
>>>>>> think there is a 48-hour timeout).
>>>>>>
>>>>>> On Tuesday, April 10, 2012, Frank Ng wrote:
>>>>>>
>>>>>>> I am not using size-tiered compaction.
>>>>>>>
>>>>>>> On Tue, Apr 10, 2012 at 12:56 PM, Jonathan Rhone wrote:
>>>>>>>
>>>>>>>> Data size, number of nodes, RF?
>>>>>>>>
>>>>>>>> Are you using size-tiered compaction on any of the column families
>>>>>>>> that hold a lot of your data?
>>>>>>>>
>>>>>>>> Do your cassandra logs say you are streaming a lot of ranges?
>>>>>>>> zgrep -E "(Performing streaming repair|out of sync)"
>>>>>>>>
>>>>>>>> On Tue, Apr 10, 2012 at 9:45 AM, Igor wrote:
>>>>>>>>
>>>>>>>>> On 04/10/2012 07:16 PM, Frank Ng wrote:
>>>>>>>>>
>>>>>>>>> Short answer - yes.
>>>>>>>>> But you are asking the wrong question.
>>>>>>>>>
>>>>>>>>> I think both processes are taking a while. When it starts up,
>>>>>>>>> netstats and compactionstats show nothing. Anyone out there
>>>>>>>>> successfully using ext3 whose repair processes are faster than this?
>>>>>>>>>
>>>>>>>>> On Tue, Apr 10, 2012 at 10:42 AM, Igor wrote:
>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> You can check with nodetool which part of the repair process is
>>>>>>>>>> slow - network streams or validation compactions. Use nodetool
>>>>>>>>>> netstats or compactionstats.
>>>>>>>>>>
>>>>>>>>>> On 04/10/2012 05:16 PM, Frank Ng wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I am on Cassandra 1.0.7. My repair processes are taking over 30
>>>>>>>>>>> hours to complete. Is it normal for the repair process to take
>>>>>>>>>>> this long? I wonder if it's because I am using the ext3 file
>>>>>>>>>>> system.
>>>>>>>>>>>
>>>>>>>>>>> thanks
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jonathan Rhone
>>>>>>>> Software Engineer
>>>>>>>>
>>>>>>>> TinyCo
>>>>>>>> 800 Market St., Fl 6
>>>>>>>> San Francisco, CA 94102
>>>>>>>> www.tinyco.com
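To tell which phase a long repair is actually in (per Aaron's and Igor's
suggestions above), something along these lines is usually enough; the host
name and log path are only placeholders and depend on your install:

    # validation compactions still running => Merkle trees still being built
    nodetool -h cass-node1 compactionstats

    # active streams => trees are done and data is being exchanged
    nodetool -h cass-node1 netstats

    # how many ranges were found out of sync / streamed
    zgrep -c -E "(Performing streaming repair|out of sync)" /var/log/cassandra/system.log*

If compactionstats and netstats both show nothing for a long time, check the
logs for the repair session before assuming ext3 is the problem.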