incubator-cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@datastax.com>
Subject Re: Repair in loop?
Date Tue, 03 Apr 2012 12:11:34 GMT
On Tue, Apr 3, 2012 at 1:55 PM, Nuno Jordao <nuno-m-jordao@telecom.pt> wrote:
> Ok, Thank you! :)
>
> One last question then, is "nodetool repair -pr" enough to recover a failed node?

It's not. "nodetool repair -pr" is meant for repairing the full cluster
(to ensure all nodes are in sync), in which case you'd run "nodetool
repair -pr" on every node. It only repairs the primary range of each
node, so for rebuilding a failed node you'll want to stick to a plain
"nodetool repair" on the node to recover. In that case it's expected to
see RF repair sessions on said node.
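To make the distinction concrete, here is a small sketch of the two workflows. It only prints the commands rather than running them; the hostnames node1..node3 are placeholders for your actual cluster, and node3 stands in for the failed node:

```shell
# Illustrative sketch only -- prints the commands, does not run nodetool.
# node1..node3 are placeholder hostnames; substitute your own nodes.

# Routine full-cluster repair: run "-pr" (primary range only) on EVERY
# node, so each token range is repaired exactly once across the cluster.
for host in node1 node2 node3; do
  echo "nodetool -h $host repair -pr"
done

# Rebuilding a single failed node: run a plain repair on that node only
# (no -pr), so every range the node replicates gets repaired.
echo "nodetool -h node3 repair"
```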

--
Sylvain

>
> Nuno
>
> -----Original Message-----
> From: Sylvain Lebresne [mailto:sylvain@datastax.com]
> Sent: terça-feira, 3 de Abril de 2012 12:38
> To: user@cassandra.apache.org
> Subject: Re: Repair in loop?
> Importance: Low
>
> On Tue, Apr 3, 2012 at 12:52 PM, Nuno Jordao <nuno-m-jordao@telecom.pt> wrote:
>> Thank you for your response.
>> My question is that it is repeating the same column family:
>>
>> INFO 19:12:24,656 [repair #69c95b50-7cee-11e1-0000-6b5cbd036faf] BlockData_b6 is fully synced (255 remaining column family to sync for this session)
>> [...]
>> INFO 10:03:50,269 [repair #a66c8240-7d6a-11e1-0000-6b5cbd036faf] BlockData_b6 is fully synced (255 remaining column family to sync for this session)
>>
>> What I was showing in my previous email is the point where it restarted:
>
> Ok, then it's likely because those correspond to different ranges of
> the ring. Unless you've started the repair with "nodetool repair -pr",
> the repair will try to repair every range of the node, and each range
> gets its own repair session. I'll admit though that printing which
> range is being repaired would have avoided that confusion.
>
> --
> Sylvain
>
>>
>> INFO 09:54:51,112 [repair #69c95b50-7cee-11e1-0000-6b5cbd036faf] BlockData_e8 is fully synced (1 remaining column family to sync for this session)
>> INFO 10:03:50,269 [repair #a66c8240-7d6a-11e1-0000-6b5cbd036faf] BlockData_b6 is fully synced (255 remaining column family to sync for this session)
>>
>> Notice the "1 remaining column family to sync for this session" indication changes to "255 remaining column family to sync for this session".
>>
>> Regards,
>>
>> Nuno Jordão
>>
>> -----Original Message-----
>> From: Sylvain Lebresne [mailto:sylvain@datastax.com]
>> Sent: terça-feira, 3 de Abril de 2012 11:36
>> To: user@cassandra.apache.org
>> Subject: Re: Repair in loop?
>> Importance: Low
>>
>> It just means that you have lots of column families and repair handles
>> one column family at a time. Each line is just saying it's done with
>> one of them. There is nothing wrong, but it does mean the repair is
>> *not* done yet.
>>
>> --
>> Sylvain
>>
>> On Tue, Apr 3, 2012 at 12:28 PM, Nuno Jordao <nuno-m-jordao@telecom.pt> wrote:
>>> Hello,
>>>
>>>
>>>
>>> I'm doing some tests with Cassandra 1.0.8 using multiple data directories
>>> with individual disks in a three-node cluster (replication factor 3).
>>>
>>> One of the tests was to replace a couple of disks and start a repair
>>> process.
>>>
>>> It started ok and refilled the disks, but I noticed that after the repair
>>> finished, it started a new one:
>>>
>>>
>>>
>>> INFO 09:34:42,481 [repair #69c95b50-7cee-11e1-0000-6b5cbd036faf]
>>> BlockData_6f is fully synced (6 remaining column family to sync for this
>>> session)
>>>
>>> INFO 09:41:55,288 [repair #69c95b50-7cee-11e1-0000-6b5cbd036faf]
>>> BlockData_0d is fully synced (5 remaining column family to sync for this
>>> session)
>>>
>>> INFO 09:42:50,169 [repair #69c95b50-7cee-11e1-0000-6b5cbd036faf]
>>> BlockData_07 is fully synced (4 remaining column family to sync for this
>>> session)
>>>
>>> INFO 09:45:02,743 [repair #69c95b50-7cee-11e1-0000-6b5cbd036faf]
>>> BlockData_5a is fully synced (3 remaining column family to sync for this
>>> session)
>>>
>>> INFO 09:48:03,010 [repair #69c95b50-7cee-11e1-0000-6b5cbd036faf]
>>> BlockData_da is fully synced (2 remaining column family to sync for this
>>> session)
>>>
>>> INFO 09:54:51,112 [repair #69c95b50-7cee-11e1-0000-6b5cbd036faf]
>>> BlockData_e8 is fully synced (1 remaining column family to sync for this
>>> session)
>>>
>>> INFO 10:03:50,269 [repair #a66c8240-7d6a-11e1-0000-6b5cbd036faf]
>>> BlockData_b6 is fully synced (255 remaining column family to sync for this
>>> session)
>>>
>>> INFO 10:05:42,803 [repair #a66c8240-7d6a-11e1-0000-6b5cbd036faf]
>>> BlockData_13 is fully synced (254 remaining column family to sync for this
>>> session)
>>>
>>> INFO 10:08:43,354 [repair #a66c8240-7d6a-11e1-0000-6b5cbd036faf]
>>> BlockData_8b is fully synced (253 remaining column family to sync for this
>>> session)
>>>
>>> INFO 10:12:09,599 [repair #a66c8240-7d6a-11e1-0000-6b5cbd036faf]
>>> BlockData_31 is fully synced (252 remaining column family to sync for this
>>> session)
>>>
>>> INFO 10:15:43,426 [repair #a66c8240-7d6a-11e1-0000-6b5cbd036faf]
>>> BlockData_0c is fully synced (251 remaining column family to sync for this
>>> session)
>>>
>>> INFO 10:21:47,156 [repair #a66c8240-7d6a-11e1-0000-6b5cbd036faf]
>>> BlockData_1b is fully synced (250 remaining column family to sync for this
>>> session)
>>>
>>>
>>>
>>> Is this normal? To me it doesn't make much sense.
>>>
>>>
>>>
>>> Regards,
>>>
>>>
>>>
>>> Nuno
