cassandra-commits mailing list archives

From "Lyuben Todorov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-5483) Repair tracing
Date Mon, 03 Mar 2014 17:27:25 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918251#comment-13918251 ]

Lyuben Todorov edited comment on CASSANDRA-5483 at 3/3/14 5:25 PM:
-------------------------------------------------------------------

Are the latest 3 patches supposed to be applied incrementally on top of {{trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch}}
and {{trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch}}?
That is:

{noformat}
1 - apply trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch
2 - apply trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch
3 - apply one of the three latest patches (v3, v4 or v5)
{noformat}
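
For context on the v4/v5 difference: judging by the attachment names, v04 passes the boolean repair options as an {{EnumSet}} and v05 switches to a {{long}} "to work with JMX". Below is a minimal sketch of that long-as-bitmask encoding. The option names are hypothetical and are not taken from the patches.

```java
import java.util.EnumSet;

final class RepairOptionBits {
    // Hypothetical option names for illustration only; the real patches may differ.
    enum RepairOption { SEQUENTIAL, PRIMARY_RANGE, TRACE }

    /** Packs the options into a long, one bit per enum constant (bit index = ordinal). */
    static long encode(EnumSet<RepairOption> options) {
        long bits = 0L;
        for (RepairOption o : options) {
            bits |= 1L << o.ordinal();
        }
        return bits;
    }

    /** Unpacks a long produced by encode(); bits with no matching constant are ignored. */
    static EnumSet<RepairOption> decode(long bits) {
        EnumSet<RepairOption> options = EnumSet.noneOf(RepairOption.class);
        for (RepairOption o : RepairOption.values()) {
            if ((bits & (1L << o.ordinal())) != 0) {
                options.add(o);
            }
        }
        return options;
    }

    public static void main(String[] args) {
        long wire = encode(EnumSet.of(RepairOption.SEQUENTIAL, RepairOption.TRACE));
        System.out.println(wire);         // 5 (bits 0 and 2)
        System.out.println(decode(wire)); // [SEQUENTIAL, TRACE]
    }
}
```

Presumably the appeal of a {{long}} is that any JMX client can pass a primitive, whereas an {{EnumSet}} of a Cassandra-specific enum needs that enum class on the client's classpath. The catch is that bit positions follow enum ordinals, so reordering the constants would silently change the wire format.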

v5 does a lot of refactoring that I think is outside the scope of this ticket (though it might be
worth its own ticket, as the idea is good), so my vote is for v3. However, I'm getting a NoSuchMethodException;
can you post a branch with all the patches applied on top of trunk (for v3)?

The exception: 
{noformat}
java.lang.NoSuchMethodException: forceRepairAsync(java.lang.String, boolean, java.util.Collection, java.util.Collection, boolean, boolean, boolean, [Ljava.lang.String;)
	at com.sun.jmx.mbeanserver.PerInterface.noSuchMethod(PerInterface.java:168)
	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:135)
	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
	at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
	at sun.rmi.transport.Transport$1.run(Transport.java:177)
	at sun.rmi.transport.Transport$1.run(Transport.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
{noformat}
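
The trace comes from the JMX layer: the MBean server looks operations up by name and parameter signature, so a client asking for a {{forceRepairAsync}} overload that the running MBean does not expose gets a {{NoSuchMethodException}} wrapped in a {{ReflectionException}}. One plausible cause (an assumption, not verified against the patch) is a nodetool built with the v3 patch talking to a node whose MBean interface lacks the new overload. The sketch below reproduces the mechanism with a hypothetical, cut-down MBean; it is not Cassandra's {{StorageServiceMBean}}.

```java
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;
import javax.management.ReflectionException;
import javax.management.StandardMBean;

final class JmxSignatureDemo {
    // Hypothetical stand-in for the server-side MBean interface.
    public interface RepairServiceMBean {
        int forceRepairAsync(String keyspace, boolean isSequential);
    }

    public static final class RepairService implements RepairServiceMBean {
        @Override
        public int forceRepairAsync(String keyspace, boolean isSequential) {
            return 1; // pretend repair command number
        }
    }

    /**
     * Registers the MBean on a fresh, private MBeanServer and invokes forceRepairAsync
     * with the given signature, the way a remote JMX client would.
     * Returns null on success, or the cause of the failure.
     */
    static Throwable callRepair(Object[] params, String[] signature) {
        try {
            MBeanServer server = MBeanServerFactory.newMBeanServer();
            ObjectName name = new ObjectName("demo:type=RepairService");
            server.registerMBean(new StandardMBean(new RepairService(), RepairServiceMBean.class), name);
            server.invoke(name, "forceRepairAsync", params, signature);
            return null;
        } catch (ReflectionException e) {
            // No operation with this name AND this exact signature was found.
            return e.getCause();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Signature the MBean actually exposes: succeeds, prints null.
        System.out.println(callRepair(
                new Object[] {"ks", true},
                new String[] {"java.lang.String", "boolean"}));
        // Client compiled against an interface with an extra boolean: fails.
        System.out.println(callRepair(
                new Object[] {"ks", true, true},
                new String[] {"java.lang.String", "boolean", "boolean"}));
    }
}
```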

bq. I am thinking of calling the new table something generic like system_traces.trace_logs. I also assume, that like system_traces.events

I'd say events is already pretty generic; the new table's name should make clear that its traces aren't
query-related, unlike those in events. If we are going to add new tables to the system_traces keyspace,
it's worth thinking about refactoring events into something more specific and giving the new tables
names that carry meaning. Another possible solution is to add a "command" column to system_traces.events
that lets users retrieve only specific kinds of events, as in the example below. [~jbellis] WDYT?

{noformat}
SELECT * FROM system_traces.events;
 session_id                           | ... | thread    | command
--------------------------------------+ ... +-----------+---------
 09d48eb0-a2f1-11e3-9f04-7d9e3709bf93 | ... | Thrift:1  | REPAIR
 29084f90-a2f3-11e3-9f04-7d9e3709bf93 | ... | Thrift:1  | QUERY

(2 rows)

SELECT * FROM system_traces.events WHERE command='REPAIR';

 session_id                           | ... | thread    | command
--------------------------------------+ ... +-----------+---------
 09d48eb0-a2f1-11e3-9f04-7d9e3709bf93 | ... | Thrift:1  | REPAIR

(1 rows)
{noformat}
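
To make the proposed {{command}} column concrete, here is a minimal in-memory sketch of the filtering it would enable. The {{TraceEvent}} class and {{Command}} enum are hypothetical stand-ins for rows of system_traces.events; on a real table, a WHERE on a non-key column like this would most likely also need a secondary index on {{command}} before Cassandra accepts the second query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

final class TraceEventFilter {
    // Hypothetical discriminator for the kind of operation that produced an event.
    enum Command { QUERY, REPAIR }

    // Hypothetical, cut-down row of system_traces.events with the proposed column.
    static final class TraceEvent {
        final UUID sessionId;
        final String thread;
        final Command command;

        TraceEvent(UUID sessionId, String thread, Command command) {
            this.sessionId = sessionId;
            this.thread = thread;
            this.command = command;
        }
    }

    /** In-memory equivalent of: SELECT * FROM system_traces.events WHERE command = ? */
    static List<TraceEvent> byCommand(List<TraceEvent> events, Command wanted) {
        List<TraceEvent> result = new ArrayList<>();
        for (TraceEvent e : events) {
            if (e.command == wanted) {
                result.add(e);
            }
        }
        return result;
    }

    /** The two rows from the example output above. */
    static List<TraceEvent> sampleEvents() {
        return Arrays.asList(
                new TraceEvent(UUID.fromString("09d48eb0-a2f1-11e3-9f04-7d9e3709bf93"), "Thrift:1", Command.REPAIR),
                new TraceEvent(UUID.fromString("29084f90-a2f3-11e3-9f04-7d9e3709bf93"), "Thrift:1", Command.QUERY));
    }

    public static void main(String[] args) {
        List<TraceEvent> repairs = byCommand(sampleEvents(), Command.REPAIR);
        System.out.println(repairs.size());           // 1
        System.out.println(repairs.get(0).sessionId); // 09d48eb0-a2f1-11e3-9f04-7d9e3709bf93
    }
}
```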


bq. the rows in this table should expire, though perhaps not as fast as 24 hours. 

+1. Repairs can take a very long time, so the TTL should be configurable, with a default of perhaps
around 90 days. With incremental repairs (in 2.1) this will end up logging a lot of data, but that is
still a better choice than users who run regular repairs missing out on information.
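
For scale, a minimal sketch of the retention arithmetic, assuming the retention is configured in days and applied as a per-row CQL TTL in seconds. The {{DEFAULT_TTL_DAYS}} constant and the {{ttlSeconds}} method are hypothetical, not an existing Cassandra setting.

```java
import java.util.concurrent.TimeUnit;

final class RepairTraceTtl {
    // Default suggested in the discussion; the setting itself is hypothetical.
    static final int DEFAULT_TTL_DAYS = 90;

    /**
     * Converts a configured retention in days (null = use the default) into the
     * seconds value a CQL "USING TTL" clause expects. Throws ArithmeticException
     * if the result does not fit in an int.
     */
    static int ttlSeconds(Integer configuredDays) {
        int days = (configuredDays == null) ? DEFAULT_TTL_DAYS : configuredDays;
        if (days <= 0) {
            throw new IllegalArgumentException("retention must be positive, got " + days);
        }
        return Math.toIntExact(TimeUnit.DAYS.toSeconds(days));
    }

    public static void main(String[] args) {
        System.out.println(ttlSeconds(1));    // 86400 (24 hours)
        System.out.println(ttlSeconds(null)); // 7776000 (90 days)
    }
}
```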

bq. One last thing I wanted to ask is about the possibility of trace log levels. What is the minimum amount of trace log information you would find useful, the next amount, and so on? Should it just follow the loglevel?

Tracing is meant to give as much useful information as possible and tends to be used for debugging
problems, e.g. slow queries or, in this case, repairs that take too long, so it's important to include
useful information without spamming the logs with every detail. Different log levels might be useful,
but the aim of this ticket is to track the progress of repairs, so logging the completion of each
repair command should be sufficient.
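
One way to reconcile trace levels with that goal is to treat repair-command completion as the coarsest level, traced by default, with finer levels opt-in. A minimal sketch follows; the level names, the {{RepairTraceLevels}} class and the messages are all hypothetical and not taken from any of the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

final class RepairTraceLevels {
    // Hypothetical levels, coarsest first.
    enum Level { COMMAND, SESSION, DETAIL }

    private final Level threshold;
    private final List<String> traced = new ArrayList<>();

    RepairTraceLevels(Level threshold) {
        this.threshold = threshold;
    }

    /** Records the message only if it is no more detailed than the configured threshold. */
    void trace(Level level, String message) {
        if (level.compareTo(threshold) <= 0) {
            traced.add(message);
        }
    }

    List<String> traced() {
        return traced;
    }

    public static void main(String[] args) {
        RepairTraceLevels tracer = new RepairTraceLevels(Level.COMMAND);
        tracer.trace(Level.DETAIL, "hypothetical per-range detail");
        tracer.trace(Level.COMMAND, "repair command 1 finished");
        System.out.println(tracer.traced()); // [repair command 1 finished]
    }
}
```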



> Repair tracing
> --------------
>
>                 Key: CASSANDRA-5483
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Yuki Morishita
>            Assignee: Ben Chan
>            Priority: Minor
>              Labels: repair
>         Attachments: test-5483-system_traces-events.txt,
>                      trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch,
>                      trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>                      trunk@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt,
>                      trunk@8ebeee1-5483-v01-002-simple-repair-tracing.txt,
>                      v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch,
>                      v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>                      v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results to the system keyspace, the way query tracing stores traces.
> With it, you don't have to look up each log file to see what the status was and how the repair you invoked performed.
> Instead, you can query the repair log with the session ID to see the state and stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
