cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joshua McKenzie (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5483) Repair tracing
Date Wed, 22 Oct 2014 19:28:36 GMT


Joshua McKenzie commented on CASSANDRA-5483:

Initial observations / thoughts:

90 days on default repair trace TTL seems long to me, especially w/incremental being new default.

- 'createWithCachedTheadpool' is misspelled -> needs an r in Thread
- Do we intend to create w/Integer.MAX_VALUE as 'size' on this method?  If so, name should
reflect it like other methods in file.

- traceParameters isn't used elsewhere so logic can be nested inside constructor where it's
called and remove unnecessary method call indirection.

- Why do we assume that if TraceType is null the type is QUERY?  We should probably be explicitly
setting that for all tracing types going forward if we plan on expanding the use of the enum

- nit: declaring queryThread in createRepairTask but aren't doing anything with it -> we
should just call the method w/out declaration if we don't intend to use it
- usage of bitwise ops to toggle si for indexing into seen array is unnecessarily complex.
 A distinct 0/1 and toggle is more clear - there's no need for inline index flipping w/bitwise
ops as it's not gaining us any performance or code clarity.
- nit: If not intending to use Object result from allSessions.get(), just remove the declaration
and call the method
- Swallowing Throwable on line 2828 w/out any logging or comment as to why is unclear.  Using
exception handling as control flow is step back from the sameThreadExecutor error handling
approach taken prior as we lose naming and intention information.  Does this gain us something
that I'm not aware of?

- Right now isDone is only called by the query thread but there's nothing in the method signature
or comments of that method documenting the doubling-every-second-run algorithm it's using.
- the implementation of isDone looks like a premature optimization.  I would prefer a more
clear implementation of 'double every second run' than:
            else if (haveWaited)
                wait = Math.min((wait ^ 1) << (wait & 1), maxWaitMillis);
                return false;
                haveWaited = true;
                    wait(wait & ~1);

Now that I'm more familiar with the changes here I'm going to need to re-read the notes on
this ticket and test out / inspect output.  The above is just from my first read-through of
the code yesterday.

> Repair tracing
> --------------
>                 Key: CASSANDRA-5483
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Yuki Morishita
>            Assignee: Ben Chan
>            Priority: Minor
>              Labels: repair
>             Fix For: 3.0
>         Attachments: 5483-full-trunk.txt, 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch,, 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch,
5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch, 5483-v07-08-Fix-brace-style.patch,
5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch, 5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch,
5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch,
5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch,
5483-v10-17-minor-bugfixes-and-changes.patch, 5483-v10-rebased-and-squashed-471f5cc.patch,
5483-v11-01-squashed.patch, 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch,
5483-v13-608fb03-May-14-trace-formatting-changes.patch, 5483-v14-01-squashed.patch, 5483-v15-02-Hook-up-exponential-backoff-functionality.patch,
5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch,
5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 5483-v15.patch, 5483-v17-00.patch,
5483-v17-01.patch, 5483-v17.patch, ccm-repair-test, cqlsh-left-justify-text-columns.patch,
prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch,
trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch, trunk@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt,
trunk@8ebeee1-5483-v01-002-simple-repair-tracing.txt, v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch,
v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch, v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
> I think it would be nice to log repair stats and results like query tracing stores traces
to system keyspace. With it, you don't have to lookup each log file to see what was the status
and how it performed the repair you invoked. Instead, you can query the repair log with session
ID to see the state and stats of all nodes involved in that repair session.

This message was sent by Atlassian JIRA

View raw message