cassandra-user mailing list archives

From Frank Ng <buzzt...@gmail.com>
Subject Re: Repair Process Taking too long
Date Thu, 12 Apr 2012 14:06:04 GMT
I also noticed that when I use the -pr option, the repair time drops from
30 hours to 9 hours.  Is the -pr option safe to use if I want to run
repair processes in parallel on nodes that are not replication peers?

thanks
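
The drop in repair time with -pr is consistent with how the option limits the work: a repair with -pr covers only the node's primary range, while a plain repair covers every range the node replicates. The sketch below is a toy model of that difference, not Cassandra code. The ring layout (each range stored on its primary owner plus the next RF-1 nodes), the node count of 6, the RF of 3, and all function names are assumptions made for illustration; the thread does not state the actual cluster size or RF.

```python
from typing import List


def replicas_for_range(owner: int, num_nodes: int, rf: int) -> List[int]:
    """Nodes holding the range whose primary owner is `owner` (assumed layout)."""
    return [(owner + k) % num_nodes for k in range(rf)]


def ranges_repaired(node: int, num_nodes: int, rf: int, primary_only: bool) -> List[int]:
    """Ranges (named by their primary owner) covered by a repair started on `node`."""
    if primary_only:
        # Models `repair -pr`: only the node's own primary range.
        return [node]
    # Models a plain repair: every range this node holds a replica of.
    return [r for r in range(num_nodes)
            if node in replicas_for_range(r, num_nodes, rf)]


def total_range_repairs(num_nodes: int, rf: int, primary_only: bool) -> int:
    """Total range repairs if the repair is run once on every node."""
    return sum(len(ranges_repaired(n, num_nodes, rf, primary_only))
               for n in range(num_nodes))


# Hypothetical cluster: 6 nodes, RF = 3.
full = total_range_repairs(6, 3, primary_only=False)
pr = total_range_repairs(6, 3, primary_only=True)
print(f"plain repair on every node: {full} range repairs")
print(f"-pr repair on every node:   {pr} range repairs")
```

In this model every range is repaired RF times when a plain repair runs on each node, and exactly once when -pr runs on each node, which also means -pr has to be run on every node to cover the whole ring. The model says nothing about whether concurrent runs on different nodes interfere with each other.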

On Thu, Apr 12, 2012 at 12:06 AM, Frank Ng <berrytemk@gmail.com> wrote:

> Thank you for confirming that the per node data size is most likely
> causing the long repair process.  I have tried a repair on smaller column
> families and it was significantly faster.
>
> On Wed, Apr 11, 2012 at 9:55 PM, aaron morton <aaron@thelastpickle.com> wrote:
>
>> If you have 1TB of data it will take a long time to repair. Every bit of
>> data has to be read and a hash generated. This is one of the reasons we
>> often suggest that around 300 to 400 GB per node is a good load in the
>> general case.
>>
>> Look at nodetool compactionstats. Is there a validation compaction
>> running? If so, it is still building the Merkle hash tree.
>>
>> Look at nodetool netstats. Is it streaming data? If so, all hash trees
>> have been calculated.
>>
>> Cheers
>>
>>
>>   -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 12/04/2012, at 2:16 AM, Frank Ng wrote:
>>
>> Can you expand further on your issue? Were you using RandomPartitioner?
>>
>> thanks
>>
>> On Tue, Apr 10, 2012 at 5:35 PM, David Leimbach <leimy2k@gmail.com> wrote:
>>
>>> I had this happen when I had really poorly generated tokens for the
>>> ring.  Cassandra seems to accept numbers that are too big.  You get hot
>>> spots when you think you should be balanced and repair never ends (I think
>>> there is a 48 hour timeout).
>>>
>>>
>>> On Tuesday, April 10, 2012, Frank Ng wrote:
>>>
>>>> I am not using size-tiered compaction.
>>>>
>>>>
>>>> On Tue, Apr 10, 2012 at 12:56 PM, Jonathan Rhone <rhone@tinyco.com> wrote:
>>>>
>>>>> Data size, number of nodes, RF?
>>>>>
>>>>> Are you using size-tiered compaction on any of the column families
>>>>> that hold a lot of your data?
>>>>>
>>>>> Do your cassandra logs say you are streaming a lot of ranges?
>>>>> zgrep -E "(Performing streaming repair|out of sync)"
>>>>>
>>>>>
>>>>> On Tue, Apr 10, 2012 at 9:45 AM, Igor <igor@4friends.od.ua> wrote:
>>>>>
>>>>>>  On 04/10/2012 07:16 PM, Frank Ng wrote:
>>>>>>
>>>>>> Short answer - yes.
>>>>>> But you are asking the wrong question.
>>>>>>
>>>>>>
>>>>>> I think both processes are taking a while.  When it starts up,
>>>>>> netstats and compactionstats show nothing.  Is anyone out there
>>>>>> successfully using ext3 with repair processes that are faster than this?
>>>>>>
>>>>>>  On Tue, Apr 10, 2012 at 10:42 AM, Igor <igor@4friends.od.ua> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> You can check with nodetool which part of the repair process is
>>>>>>> slow: network streams or validation compactions. Use nodetool
>>>>>>> netstats or compactionstats.
>>>>>>>
>>>>>>>
>>>>>>> On 04/10/2012 05:16 PM, Frank Ng wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I am on Cassandra 1.0.7.  My repair processes are taking over 30
>>>>>>>> hours to complete.  Is it normal for the repair process to take
>>>>>>>> this long?  I wonder if it's because I am using the ext3 file system.
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jonathan Rhone
>>>>> Software Engineer
>>>>>
>>>>> *TinyCo*
>>>>> 800 Market St., Fl 6
>>>>> San Francisco, CA 94102
>>>>> www.tinyco.com
>>>>>
>>>>>
>>>>
>>
>>
>
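
Aaron's point that every bit of data has to be read and hashed before any streaming starts can be illustrated with a small hash-tree comparison. This is a minimal sketch of the general Merkle-tree idea, not Cassandra's implementation. The bucketing by key hash modulo the leaf count, the choice of SHA-256, the dictionaries standing in for replicas, and all function names are assumptions for illustration; Cassandra builds its trees over token ranges during the validation compaction.

```python
import hashlib
from typing import Dict, List


def leaf_hashes(rows: Dict[str, str], num_leaves: int) -> List[bytes]:
    """Hash every row into one of `num_leaves` buckets (stand-ins for token ranges)."""
    buckets = [hashlib.sha256() for _ in range(num_leaves)]
    for key in sorted(rows):  # every row must be read to build the tree
        idx = int.from_bytes(hashlib.md5(key.encode()).digest(), "big") % num_leaves
        buckets[idx].update(key.encode() + b"\x00" + rows[key].encode() + b"\x00")
    return [b.digest() for b in buckets]


def root_hash(leaves: List[bytes]) -> bytes:
    """Combine leaf hashes pairwise until a single root hash remains."""
    level = leaves
    while len(level) > 1:
        level = [hashlib.sha256(b"".join(level[i:i + 2])).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def out_of_sync(a: List[bytes], b: List[bytes]) -> List[int]:
    """Indexes of buckets whose hashes differ between two replicas."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Two hypothetical replicas that disagree on one row.
replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = dict(replica_a, k2="stale")

leaves_a = leaf_hashes(replica_a, 4)
leaves_b = leaf_hashes(replica_b, 4)
if root_hash(leaves_a) != root_hash(leaves_b):
    print("buckets to stream:", out_of_sync(leaves_a, leaves_b))
```

Building the leaves touches every row, which is why the tree-building phase scales with the amount of data per node. Only the buckets whose hashes differ need to be exchanged afterwards. For brevity the sketch compares the leaves directly rather than walking down from the root.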
