cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-4554) Log when a node is down longer than the hint window and we stop saving hints
Date Wed, 02 Jan 2013 17:02:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13542234#comment-13542234 ]

Jonathan Ellis commented on CASSANDRA-4554:
-------------------------------------------

All I intended here was to store a flag (in the peers table?) or a count (would need to be
a separate CF) for when we've skipped a hint because a replica was down longer than max_hint_window.
If we really want to get fancy, we can make this a replicated CF, i.e., not in the local-only
system KS.  (A system_replicated KS keeps looking useful; tracing data could go there too.)
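A minimal sketch of the bookkeeping described above, written in Java because Cassandra is. Everything in it is hypothetical: `SkippedHintTracker`, `shouldWriteHint`, and `skippedCount` are illustrative names, not Cassandra APIs, and an in-memory map stands in for the flag in the peers table or the count in a separate CF. It assumes the caller already knows when the endpoint was marked down, and it logs only on the first skip per endpoint, which is an added assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

// Hypothetical sketch, not Cassandra's implementation: decides whether to store a
// hint for a down replica and records when a hint is skipped because the replica
// has been down longer than the hint window.
class SkippedHintTracker {
    private static final Logger LOG = Logger.getLogger(SkippedHintTracker.class.getName());

    private final long maxHintWindowMs;
    // In-memory stand-in for what would be persisted: a flag in the peers table,
    // or a per-endpoint count in a separate CF.
    private final Map<String, AtomicLong> skippedHints = new ConcurrentHashMap<>();

    SkippedHintTracker(long maxHintWindowMs) {
        this.maxHintWindowMs = maxHintWindowMs;
    }

    // Returns true if a hint should be written for an endpoint marked down at downSinceMs.
    boolean shouldWriteHint(String endpoint, long downSinceMs, long nowMs) {
        long downForMs = nowMs - downSinceMs;
        if (downForMs <= maxHintWindowMs) {
            return true;
        }
        long skipped = skippedHints.computeIfAbsent(endpoint, e -> new AtomicLong()).incrementAndGet();
        if (skipped == 1) {
            // Assumption: warn once per endpoint rather than once per dropped hint.
            // Only one thread can observe the value 1, so this logs exactly once.
            LOG.warning(endpoint + " has been down for " + downForMs
                    + " ms, longer than the hint window of " + maxHintWindowMs
                    + " ms; no longer saving hints for it");
        }
        return false;
    }

    // Number of hints skipped for this endpoint; non-zero means some writes were not hinted.
    long skippedCount(String endpoint) {
        AtomicLong count = skippedHints.get(endpoint);
        return count == null ? 0 : count.get();
    }
}
```

A bare flag answers "were any hints skipped for this replica?", while the count also says how many; that difference is why the count would need its own CF rather than a column in the peers table.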

Extending this to "does X need a repair" is substantially more complex (CASSANDRA-2405), so
I don't consider that in scope here.

Exposing other hint metrics is also a separate problem -- I note that the JMX call for counting
hints is O(n) and may even OOM.  Let's take that to a separate ticket as well.

P.S. I'm not a fan of switching hints-in-progress to a Cache, since that implies it's okay
to throw away entries because they can be rebuilt.  This is not the case.
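To make the P.S. concrete, here is a hypothetical illustration (not Cassandra code; `HintsInProgressDemo` and its methods are made-up names) of the difference between a cache-like map and a plain map. The `LinkedHashMap` overriding `removeEldestEntry` silently evicts entries once it exceeds its limit, which is only acceptable when evicted entries can be rebuilt; the `ConcurrentHashMap` keeps every entry until its owner removes it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of why an evicting cache is the wrong container for
// state that cannot be recomputed, such as hints-in-progress.
class HintsInProgressDemo {
    // Cache-like map: drops its eldest entry once it holds more than maxEntries.
    static <K, V> Map<K, V> evictingMap(int maxEntries) {
        return new LinkedHashMap<K, V>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Plain map: entries stay until they are explicitly removed.
    static <K, V> Map<K, V> explicitMap() {
        return new ConcurrentHashMap<>();
    }

    // Records one in-progress hint for each of `endpoints` endpoints and
    // returns how many entries the map still holds afterwards.
    static int recordAndCount(Map<String, Integer> inProgress, int endpoints) {
        for (int i = 0; i < endpoints; i++) {
            String endpoint = "10.0.0." + i;
            inProgress.put(endpoint, inProgress.getOrDefault(endpoint, 0) + 1);
        }
        return inProgress.size();
    }
}
```

With five endpoints and a limit of three, `recordAndCount(evictingMap(3), 5)` ends up holding three entries while `recordAndCount(explicitMap(), 5)` holds all five; for hints-in-progress, the two evicted entries would be state that cannot be rebuilt.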
                
> Log when a node is down longer than the hint window and we stop saving hints
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4554
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4554
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2.1
>
>         Attachments: 0001-CASSANDRA-4554-add-hint-metrics.patch, 0001-CASSANDRA-4554-logging-to-system-table-v2.patch,
>                      0002-CASSANDRA-4554-logging-to-system-table.patch
>
>
> We know that we need to repair whenever we lose a node or disk permanently (since it
> may have had undelivered hints on it), but without exposing this we don't know when nodes
> stop saving hints for a temporarily dead node, unless we're paying very close attention to
> external monitoring.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
