cassandra-user mailing list archives

From Anuj Wadehra <anujw_2...@yahoo.co.in>
Subject Re: how to read parent_repair_history table?
Date Thu, 25 Feb 2016 17:25:29 GMT
Hi Jimmy,
We are on 2.0.x and plan to use JMX notifications to track repair status. To repair the
database, we call the forceTableRepairPrimaryRange JMX operation from our Java client application
on each node. On newer versions you can call the more recent JMX repair methods instead.
I would be keen to know the pros and cons of tracking repair status via JMX notifications versus
via database tables.
We are planning to implement it as follows:
1. Before repairing each keyspace via JMX, register two listeners: one for StorageService
MBean notifications about repair status, and a connection listener for detecting connection
failures and lost JMX notifications.
2. If 256 successful session notifications are received, we treat the keyspace repair as
successful. We have 256 ranges on each node.
3. If there are connection-closed notifications, we re-register the MBean listener and
retry the repair once.
4. If there are lost notifications, we retry the repair once before failing it.
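
The decision logic in steps 2 to 4 can be sketched independently of JMX. This is only a minimal model, not our actual Java client: the event labels (`SESSION_SUCCESS`, `CONNECTION_CLOSED`, `NOTIFICATIONS_LOST`) and the `RepairTracker` class are hypothetical names, and it assumes a single retry budget shared by both failure kinds and a success counter that restarts from zero on retry.

```python
from dataclasses import dataclass


@dataclass
class RepairTracker:
    """Tracks one keyspace repair on one node (hypothetical helper)."""
    expected: int = 256      # ranges per node, as stated in step 2
    retries_left: int = 1    # "retry once" from steps 3 and 4 (shared budget is an assumption)
    successes: int = 0

    def on_event(self, event: str) -> str:
        """Return the action the caller should take: CONTINUE, RETRY, DONE or FAIL."""
        if event == "SESSION_SUCCESS":
            self.successes += 1
            return "DONE" if self.successes >= self.expected else "CONTINUE"
        if event in ("CONNECTION_CLOSED", "NOTIFICATIONS_LOST"):
            if self.retries_left > 0:
                # Caller is expected to re-register the listener and restart the repair.
                self.retries_left -= 1
                self.successes = 0
                return "RETRY"
            return "FAIL"
        # Any other notification does not change the outcome in this sketch.
        return "CONTINUE"
```

In the real client, the two listeners from step 1 would translate each notification into one of these labels and act on the returned value.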


Thanks,
Anuj

On Thu, 25 Feb 2016 at 7:18 pm, Paulo Motta <pauloricardomg@gmail.com> wrote:

Hello Jimmy,

The parent_repair_history table keeps track of the start and finish information of a repair session.
The other table, repair_history, keeps track of repair status as it progresses. So you should
first query parent_repair_history to check whether a repair started and finished, as well
as its duration, and then inspect repair_history to troubleshoot more specific details
of a given repair session.
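
As a rough illustration, the state and duration of a repair can be derived from the started_at and finished_at columns of one parent_repair_history row. The sketch below assumes the row has already been fetched and is represented as a plain dict keyed by column name; `describe_repair` is a hypothetical helper, not part of Cassandra or any driver.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def describe_repair(row: dict) -> Tuple[str, Optional[timedelta]]:
    """Return (state, duration) for one parent_repair_history row.

    state is NOT_STARTED, RUNNING or FINISHED; duration is only known
    once both timestamps are present.
    """
    started = row.get("started_at")
    finished = row.get("finished_at")
    if started is None:
        return "NOT_STARTED", None
    if finished is None:
        return "RUNNING", None
    return "FINISHED", finished - started


row = {
    "started_at": datetime(2016, 2, 25, 6, 0),
    "finished_at": datetime(2016, 2, 25, 6, 45),
}
state, duration = describe_repair(row)
print(state, duration)
```

Note that FINISHED here only means the session ended; whether it ended successfully needs the exception and range columns as well.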

Answering your questions below:

> Will every invocation of nodetool repair be recorded as one entry in the parent_repair_history
CF, regardless of whether it is across DCs, a local node repair, or uses other options?
Actually two entries, one for start and one for finish.

> Is a repair job done only if the finished_at column contains a value? And is a repair job
successfully done only if there is no value in exception_message or exception_stacktrace?

correct

> What is the purpose of the successful_ranges column? Do I have to check that they all match
requested_ranges to ensure a successful run?
correct
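
Put together, the two answers above amount to a three-part check on a row. A minimal sketch, again assuming the row is a plain dict keyed by the column names in the schema; `repair_succeeded` is a hypothetical helper, and treating an empty requested_ranges set as "not successful" is an extra assumption of this sketch.

```python
def repair_succeeded(row: dict) -> bool:
    """True if the row describes a finished repair with no error and all ranges done."""
    # 1. The repair must have finished.
    if row.get("finished_at") is None:
        return False
    # 2. No exception may have been recorded.
    if row.get("exception_message") or row.get("exception_stacktrace"):
        return False
    # 3. Every requested range must also appear as successful.
    #    A missing or null set is treated as empty.
    requested = set(row.get("requested_ranges") or ())
    successful = set(row.get("successful_ranges") or ())
    return bool(requested) and successful == requested
```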

> Ultimately, how can I find out the overall repair health/status of a given cluster?

Check that repair is being executed on all nodes within gc_grace_seconds; if it is not, tune
that value or troubleshoot the problems.

> Scanning through parent_repair_history and making sure all the known keyspaces have had a
good repair run in recent days?

Sounds good.
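
Such a scan could look roughly like the sketch below. It is a coarse check under several assumptions: rows are plain dicts keyed by column name, one gc_grace_seconds value is used for every keyspace (it is really a per-table setting), and one good entry per keyspace inside the window counts as healthy, even though a single entry may cover only some ranges or tables. `stale_keyspaces` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def stale_keyspaces(rows, keyspaces, gc_grace_seconds, now):
    """Return the keyspaces with no clean, finished repair inside the gc_grace window."""
    cutoff = now - timedelta(seconds=gc_grace_seconds)
    healthy = set()
    for row in rows:
        finished = row.get("finished_at")
        clean = not (row.get("exception_message") or row.get("exception_stacktrace"))
        requested = set(row.get("requested_ranges") or ())
        successful = set(row.get("successful_ranges") or ())
        if (finished is not None and finished >= cutoff
                and clean and requested and successful == requested):
            healthy.add(row["keyspace_name"])
    return sorted(set(keyspaces) - healthy)
```

Any keyspace this returns would be a candidate for a closer look in repair_history.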

You can check https://issues.apache.org/jira/browse/CASSANDRA-5839 for more information.


2016-02-25 3:13 GMT-03:00 Jimmy Lin <y2klyf+work@gmail.com>:


Hi all,
A few questions regarding how to read or digest the system_distributed.parent_repair_history
CF, which I am very interested in using to find out our repair status...

- Will every invocation of nodetool repair be recorded as one entry in the parent_repair_history
  CF, regardless of whether it is across DCs, a local node repair, or uses other options?
- Is a repair job done only if the finished_at column contains a value? And is a repair job
  successfully done only if there is no value in exception_message or exception_stacktrace?
- What is the purpose of the successful_ranges column? Do I have to check that they all match
  requested_ranges to ensure a successful run?
- Ultimately, how can I find out the overall repair health/status of a given cluster?
  By scanning through parent_repair_history and making sure all the known keyspaces have had
  a good repair run in recent days?
---------------
CREATE TABLE system_distributed.parent_repair_history (
    parent_id timeuuid PRIMARY KEY,
    columnfamily_names set<text>,
    exception_message text,
    exception_stacktrace text,
    finished_at timestamp,
    keyspace_name text,
    requested_ranges set<text>,
    started_at timestamp,
    successful_ranges set<text>
)


  
