cassandra-user mailing list archives

From "Li, Guangxing" <guangxing...@pearson.com>
Subject Re: Nodetool repair
Date Wed, 21 Sep 2016 14:44:21 GMT
Alain,

My script actually greps through all the log files, including the rotated
system.log.* files, so the missing lines were probably due to a failed
session. My script now assumes the repair has finished (possibly with a
failure) if it sees no more repair-related log lines for 2 hours.

Thanks.

George.

On Wed, Sep 21, 2016 at 3:03 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:

> Hi George,
>
> That's the best way to monitor repairs "out of the box" that I can think of.
> When you don't see all 2048 session lines (in your case), it might be due
> to log rotation or to a session failure. Have you had a look at repair
> failures?
>
> I am wondering why the implementor did not put something in the log (e.g.
>> ... Repair command #41 has ended...) to clearly state that the repair has
>> completed.
>
>
> +1, and some information about which ranges were repaired successfully
> and which ranges failed would be very useful as well. It would then be easy
> to read the repair result and know what to do next (re-run repair on some
> ranges, move to the next node, etc.).
>
>
> 2016-09-20 17:00 GMT+02:00 Li, Guangxing <guangxing.li@pearson.com>:
>
>> Hi,
>>
>> I am using version 2.0.9. I have been looking into the logs to see whether
>> a repair has finished. Each time a repair is started on a node, I see a
>> log line like "INFO [Thread-112920] 2016-09-16 19:00:43,805
>> StorageService.java (line 2646) Starting repair command #41, repairing 2048
>> ranges for keyspace groupmanager" in system.log. So I expect to see 2048
>> log lines like "INFO [AntiEntropySessions:109]
>> 2016-09-16 19:27:20,662 RepairSession.java (line 282) [repair
>> #8b910950-7c43-11e6-88f3-f147ea74230b] session completed successfully".
>> Once I see 2048 such log lines, I know this repair has completed. But this
>> is not dependable: sometimes I see fewer than 2048, yet I know no repair
>> is going on because I have not seen any trace of repair in system.log for
>> a long time. So it seems to me that there is a clear way to tell that a
>> repair has started but no clear way to tell that a repair has ended. The
>> only thing you can do is watch the log, and if you do not see repair
>> activity for a long time, assume the repair is done somehow. I am
>> wondering why the implementor did not put something in the log (e.g. ...
>> Repair command #41 has ended...) to clearly state that the repair has
>> completed.
>>
>> Thanks.
>>
>> George.
>>
>> On Tue, Sep 20, 2016 at 2:54 AM, Jens Rantil <jens.rantil@tink.se> wrote:
>>
>>> On Mon, Sep 19, 2016 at 3:07 PM Alain RODRIGUEZ <arodrime@gmail.com>
>>> wrote:
>>>
>>> ...
>>>
>>>> - The size of your data
>>>> - The number of vnodes
>>>> - The compaction throughput
>>>> - The streaming throughput
>>>> - The hardware available
>>>> - The load of the cluster
>>>> - ...
>>>>
>>>
>>> I've also heard that the number of clustering keys per partition key
>>> could have an impact. Might be worth investigating.
>>>
>>> Cheers,
>>> Jens
>>> --
>>>
>>> Jens Rantil
>>> Backend Developer @ Tink
>>>
>>> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
>>> For urgent matters you can reach me at +46-708-84 18 32.
>>>
>>
>>
>
