cassandra-commits mailing list archives

From "Ben Chan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-5483) Repair tracing
Date Thu, 13 Mar 2014 20:07:51 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Chan updated CASSANDRA-5483:
--------------------------------

    Attachment: cqlsh-left-justify-text-columns.patch

Public TODO list; please comment if any should be not-TODO:
* Trace streaming and/or lack thereof (I think hooking {{Differencer#run}} and related threads
should be enough).
* Maybe exclude {{system_traces}} from repair if a repair trace is going on. There seems to
be a feedback loop triggering multiple repair commands otherwise.
* Maybe add a placeholder row with a null {{duration}} for ongoing repair sessions; that makes
it easier to find the {{session_id}} for queries. Update it with the final duration at the end (rough sketch right after this list).
* Populate {{started_at}}, {{request}}, etc in {{system_traces.sessions}}.
* Send the {{session_id}} back to nodetool.
* Shorten/simplify trace messages.
* Verbose option; dump all traces to nodetool.
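
For the placeholder-row item, the rough sketch mentioned above (Python and explicit CQL purely
for illustration; the real code would sit next to the existing query-tracing writes in Java, and
I'm assuming the current {{system_traces.sessions}} columns):
{noformat}
# Sketch only: the shape of the placeholder row, driven from Python for brevity.
import datetime
import uuid

from cassandra.cluster import Cluster  # DataStax Python driver

session = Cluster(['127.0.0.1']).connect('system_traces')
repair_session_id = uuid.uuid1()  # the trace session_id for this repair

# At repair start: write a placeholder row, leaving duration null so ongoing
# repairs are easy to spot (and the session_id easy to find).
session.execute(
    "INSERT INTO sessions (session_id, coordinator, request, started_at)"
    " VALUES (%s, %s, %s, %s)",
    (repair_session_id, '127.0.0.1', 'repair s1', datetime.datetime.utcnow()))

# ... repair runs; events accumulate in system_traces.events ...

# At repair end: fill in the final duration (placeholder value here).
session.execute(
    "UPDATE sessions SET duration = %s WHERE session_id = %s",
    (1234567, repair_session_id))
{noformat}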

Implementation thoughts follow; please warn of potential problems.

---

Verbose option:

To send local traces back to nodetool, adding a parallel {{sendNotification}} is easy enough.
Getting the remote traces seems like it would involve monitoring updates to {{system_traces.events}}.

At first I thought of using triggers, but the docs say that triggers run on the coordinator node, which
is not necessarily the node you're repairing. So that leaves polling the table with heuristics
that are hopefully good enough to keep the extra work down.
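
Concretely, the polling I have in mind looks something like the sketch below (Python for brevity;
the names, interval, and timeout are placeholders, and the real loop would live in the Java
repair/notification path): remember the last {{event_id}} seen and only ask for newer rows,
forwarding each one as a {{sendNotification}}.
{noformat}
# Sketch: poll system_traces.events for new rows from one trace session,
# relying on event_id being the clustering column so "> last seen" only
# touches new events.
import time

from cassandra.cluster import Cluster  # DataStax Python driver

session = Cluster(['127.0.0.1']).connect('system_traces')

def poll_events(trace_session_id, interval=0.5, timeout=600):
    last_event_id = None
    deadline = time.time() + timeout
    while time.time() < deadline:
        if last_event_id is None:
            rows = session.execute(
                "SELECT event_id, source, activity FROM events"
                " WHERE session_id = %s", (trace_session_id,))
        else:
            rows = session.execute(
                "SELECT event_id, source, activity FROM events"
                " WHERE session_id = %s AND event_id > %s",
                (trace_session_id, last_event_id))
        for row in rows:
            last_event_id = row.event_id
            yield row  # in the real thing: sendNotification(...) back to nodetool
        time.sleep(interval)  # would stop early once the repair session completes
{noformat}
The open question is which heuristics (poll interval, completion detection) keep this from adding
noticeable load during a long repair.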

---

Simplify trace messages:

Skipping to the point of difference:

It looks like each sub-RepairSession has a unique session id (a timeuuid, but distinct from
either {{session_id}} or {{event_id}}). Here is a section of the output from the select above,
aligned and simplified to increase the SNR. The redacted parts are identical.
{noformat}
[repair #fedc3790-...] Received merkle tree for events from /127.0.0.1
[repair #fef40550-...] new session: will sync /127.0.0.1, /127.0.0.2 on range (3074457345618258602,-9223372036854775808]
for system_traces.[sessions, events]
[repair #fef40550-...] requesting merkle trees for sessions (to [/127.0.0.2, /127.0.0.1])
[repair #fedc3790-...] session completed successfully
[repair #fef40550-...] Sending completed merkle tree to /127.0.0.1 for system_traces/sessions
{noformat}
In the example above, you can see some overlap in the repair session traces, so the sub-session_id
(so to speak) has some use in distinguishing these. Since this sub-session_id only has to
be unique for a particular repair session, maybe it would be worth it to map each one to a
small integer?
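
The mapping itself would be trivial, something like the sketch below (Python just to show the
idea; it would really happen wherever the {{[repair #...]}} prefix gets formatted): hand out the
next small integer the first time a given sub-session id is seen.
{noformat}
# Sketch: map each sub-session timeuuid to a small integer that only needs
# to be unique within one repair command, so trace prefixes stay short.
sub_session_numbers = {}

def short_id(sub_session_id):
    return sub_session_numbers.setdefault(sub_session_id,
                                          len(sub_session_numbers) + 1)

# Using the two (truncated) ids from the excerpt above:
print(short_id('fedc3790-...'))  # -> 1
print(short_id('fef40550-...'))  # -> 2
print(short_id('fedc3790-...'))  # -> 1 again
{noformat}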

For convenience, I attached a small, not-very-pretty patch that left-justifies columns of
type text in cqlsh (makes it easier to read the traces).

---

Trace streaming:

Is there a simple way to create a situation where a repair requires streaming? Here is what
I'm currently doing, but it doesn't work.

{noformat}
#!/bin/sh
ccm create $(mktemp -u 5483-XXX) &&
ccm populate -n 3 &&
ccm updateconf --no-hinted-handoff &&
ccm start &&
ccm node1 cqlsh <<"E"
CREATE SCHEMA s1
WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };

CREATE TABLE s1.users (
  user_id varchar PRIMARY KEY,
  first varchar,
  last varchar,
  age int)
WITH read_repair_chance = 0.0;

INSERT INTO s1.users (user_id, first, last, age)
  VALUES ('jsmith', 'John', 'Smith', 42);
E

ccm node1 stop &&
python - <<"E" | ccm node2 cqlsh
import random as r
fs=["John","Art","Skip","Doug","Koala"]
ls=["Jackson","Jacobs","Jefferson","Smythe"]
for (f, l) in [(f,l) for f in fs for l in ls]:
  print (
    "insert into s1.users (user_id, age, first, last) "
    "values('%s', %d, '%s', '%s');"
  ) % ((f[0]+l).lower(), r.randint(10,100), f, l)
E
ccm node2 cqlsh <<"E"
select count(*) from s1.users;
E
ccm node1 start
ccm node1 cqlsh <<"E"
select count(*) from s1.users;
E
nodetool -p $(ccm node1 show | awk -F= '/jmx_port/{print $2}') repair -tr s1
{noformat}

The problem is that despite disabling hinted handoff and setting {{read_repair_chance}} to
0, the endpoints are still reported as consistent in {{Differencer#run}}. Yet node1 is clearly
missing some rows before the repair and has them afterwards. Somehow the streaming repair
is being done somewhere other than {{Differencer#run}}. Is some sort of handoff still happening
somewhere? I'm sure it's something simple that I'm missing.
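
One thing I may try, to rule hints in or out: check whether anything shows up in {{system.hints}}
on node2 while node1 is down (sketch only; assumes the current CQL-visible {{system.hints}} table):
{noformat}
# Sketch: see whether hints are being stored despite --no-hinted-handoff.
from cassandra.cluster import Cluster  # DataStax Python driver

session = Cluster(['127.0.0.2']).connect('system')
for row in session.execute("SELECT count(*) FROM hints"):
    print("hints currently stored: %d" % row[0])
{noformat}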


> Repair tracing
> --------------
>
>                 Key: CASSANDRA-5483
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Yuki Morishita
>            Assignee: Ben Chan
>            Priority: Minor
>              Labels: repair
>         Attachments: 5483-full-trunk.txt, 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch,
5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch,
5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch, 5483-v07-08-Fix-brace-style.patch,
5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch, 5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch,
ccm-repair-test, cqlsh-left-justify-text-columns.patch, test-5483-system_traces-events.txt,
trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
trunk@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, trunk@8ebeee1-5483-v01-002-simple-repair-tracing.txt,
v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results the way query tracing stores traces
to the system keyspace. With it, you don't have to look up each log file to see what the status
was and how the repair you invoked performed. Instead, you can query the repair log with the session
ID to see the state and stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
