cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6230) Write hints to flat files instead of the system.hints
Date Wed, 01 Jul 2015 11:08:06 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609942#comment-14609942 ]

Benedict commented on CASSANDRA-6230:
-------------------------------------

First pass of review (for patches of this size and importance, I prefer to do two or three
passes, so that I become more familiar with them, and I tend to work towards the finer details
in later rounds).

I've pushed some minor comments/suggestions [here|https://github.com/belliottsmith/cassandra/tree/6230-suggestions].


Overall, it's an excellent patch. The main issue is nomenclature: in particular, "unloading"
would probably be much more intuitively named "delivery" (after all, it hasn't arrived yet, so
unloading is premature...)

There are also a couple of behavioural considerations that we should establish and codify:

* On writing hints to a file, if there is an error we never retry writing the hint, and it
is lost. This seems acceptable but undesirable, especially for transient errors.
* On reading, if there is corruption we stop reading the whole file. As with the commit log
(CL), we may prefer to try to skip over the problematic hint.
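
As an illustration of the second point, here is a minimal sketch of skip-over-on-corruption
reading. It assumes a hypothetical frame layout of [int length][int crc32(payload)][payload];
this is not the patch's on-disk format, and `HintFrameReader`, `writeFrame` and
`readSkippingCorrupt` are names invented for the example. Note the limitation: a frame can only
be skipped while its length field is still plausible, otherwise the reader has to stop.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

public class HintFrameReader {
    // Hypothetical frame layout (an assumption, not the patch's format):
    // [int length][int crc32(payload)][payload bytes]
    public static void writeFrame(ByteBuffer out, byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        out.putInt(payload.length);
        out.putInt((int) crc.getValue());
        out.put(payload);
    }

    // Returns every payload whose checksum matches. A frame with a bad
    // checksum is stepped over using its length field and reading continues.
    public static List<byte[]> readSkippingCorrupt(ByteBuffer in) {
        List<byte[]> good = new ArrayList<>();
        while (in.remaining() >= 8) {
            int length = in.getInt();
            int expectedCrc = in.getInt();
            if (length < 0 || length > in.remaining()) {
                break; // the length itself is implausible, so we cannot resynchronise
            }
            byte[] payload = new byte[length];
            in.get(payload);
            CRC32 crc = new CRC32();
            crc.update(payload, 0, payload.length);
            if ((int) crc.getValue() == expectedCrc) {
                good.add(payload);
            }
            // On mismatch the payload is dropped; the loop moves to the next frame.
        }
        return good;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(256);
        writeFrame(buf, "hint-1".getBytes(StandardCharsets.UTF_8));
        int secondPayloadStart = buf.position() + 8; // 8 = length + crc header
        writeFrame(buf, "hint-2".getBytes(StandardCharsets.UTF_8));
        writeFrame(buf, "hint-3".getBytes(StandardCharsets.UTF_8));
        buf.flip();
        buf.put(secondPayloadStart, (byte) 'X'); // corrupt one byte of the second payload
        for (byte[] p : readSkippingCorrupt(buf)) {
            System.out.println(new String(p, StandardCharsets.UTF_8));
        }
    }
}
```

Protecting the length field with its own checksum would let the reader tell a damaged header
from a damaged payload; that is left out here to keep the sketch short.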

There are also a couple of other minor suggestions/nits not worth polluting JIRA with.

I would like to see at least a simple burn test introduced for hints writing (and replay)
since, although it seems correctly written to me from a concurrency point-of-view, it is an
important piece of infrastructure that will have, at times, a lot of concurrent access.
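
One possible shape for such a burn test, as a rough sketch only: several threads write
concurrently, then a replay checks that every record appears exactly once and that each
writer's order is preserved. `InMemoryHintLog` is a hypothetical stand-in for the component
under test, not the patch's writer, and `HintsBurnSketch` and `burn` are names invented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class HintsBurnSketch {
    // Hypothetical stand-in for the hints writer/reader under test.
    static final class InMemoryHintLog {
        private final List<long[]> entries = new ArrayList<>();

        synchronized void write(long writerId, long seq) {
            entries.add(new long[] { writerId, seq });
        }

        synchronized List<long[]> replay() {
            return new ArrayList<>(entries);
        }
    }

    // Returns true if replay yields every record exactly once, with each
    // writer's sequence numbers in the order they were written.
    public static boolean burn(int writers, int perWriter) {
        InMemoryHintLog log = new InMemoryHintLog();
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        for (int w = 0; w < writers; w++) {
            final long id = w;
            pool.execute(() -> {
                for (long s = 0; s < perWriter; s++) {
                    log.write(id, s);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
                return false; // writers did not finish in time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        long[] next = new long[writers]; // next expected sequence per writer
        List<long[]> replayed = log.replay();
        for (long[] e : replayed) {
            int w = (int) e[0];
            if (e[1] != next[w]) {
                return false; // lost, duplicated, or reordered record
            }
            next[w]++;
        }
        return replayed.size() == writers * perWriter;
    }

    public static void main(String[] args) {
        System.out.println("burn passed: " + burn(8, 10_000));
    }
}
```

A real burn test would swap the stand-in for the actual writer and reader, run for much longer,
and ideally mix in concurrent replay.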

For future work, I have been mulling the idea of merging the Commit Log and Hints Log directly
together, with the obsolescence of a record simply being made a little more complex. The idea
would be to log each record against multiple targets, and have a parallel obsolescence log
(which could be a simple bitmap, near enough, although with range support for the common case
of invalidating the majority of records). Since we typically route traffic to an owning node,
this would make hints near zero-overhead, and all we would need to do - if hints turn out
to need retention - is periodically filter the log files to their minimal set of entries.
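
To make the idea concrete, here is a toy in-memory sketch, under the assumption that the local
node's own write counts as one more target alongside the remote replicas. Each record is
appended once and a per-target `BitSet` tracks which records are still owed to that target (the
inverse of an obsolescence bitmap, chosen because it makes filtering a simple OR). All names
(`MultiTargetLog`, `obsoleteRange`, `liveRecords`) are hypothetical and nothing here reflects
Cassandra's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiTargetLog {
    private final List<String> records = new ArrayList<>();
    // Per target: bit i set means record i is still owed to that target.
    private final Map<String, BitSet> pending = new HashMap<>();

    // Append the record once, registering it against every target.
    public int append(String record, List<String> targets) {
        int index = records.size();
        records.add(record);
        for (String target : targets) {
            pending.computeIfAbsent(target, k -> new BitSet()).set(index);
        }
        return index;
    }

    // Mark a single record obsolete for one target.
    public void obsolete(String target, int index) {
        BitSet bits = pending.get(target);
        if (bits != null) {
            bits.clear(index);
        }
    }

    // Range form for the common case of invalidating most records: [from, to).
    public void obsoleteRange(String target, int from, int to) {
        BitSet bits = pending.get(target);
        if (bits != null) {
            bits.clear(from, to);
        }
    }

    // Filtering step: records still owed to at least one target, in log order.
    public List<String> liveRecords() {
        BitSet live = new BitSet();
        for (BitSet bits : pending.values()) {
            live.or(bits);
        }
        List<String> out = new ArrayList<>();
        for (int i = live.nextSetBit(0); i >= 0; i = live.nextSetBit(i + 1)) {
            out.add(records.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        MultiTargetLog log = new MultiTargetLog();
        log.append("w0", Arrays.asList("local", "nodeB"));
        log.append("w1", Arrays.asList("local", "nodeB"));
        log.append("w2", Arrays.asList("local"));
        log.obsoleteRange("local", 0, 3); // e.g. local data flushed
        log.obsolete("nodeB", 0);         // nodeB acknowledged w0
        System.out.println(log.liveRecords()); // only w1 is still owed to nodeB
    }
}
```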


This would: permit us to have almost unlimited hints in-flight without hurting the server;
prevent hints thrashing the commit log disk (assuming they're stored there); permit hints
to have zero resource overhead during load spikes; and eliminate multiple serializations from
the write path.

> Write hints to flat files instead of the system.hints
> -----------------------------------------------------
>
>                 Key: CASSANDRA-6230
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6230
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Aleksey Yeschenko
>             Fix For: 3.0 beta 1
>
>
> Writing to a file would have less overhead on both hint creation and replay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
