cassandra-commits mailing list archives

From "Jeff Jirsa (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-11218) Prioritize Secondary Index rebuild
Date Mon, 19 Sep 2016 05:05:21 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15502288#comment-15502288 ]

Jeff Jirsa edited comment on CASSANDRA-11218 at 9/19/16 5:04 AM:
-----------------------------------------------------------------

I have a version of this patch I'll be submitting very soon, but while I wait for internal
approvals, I'd like to describe the implementation so that those of you who care about this
can provide feedback conceptually before I submit a patch for review.

I'm implementing this as a priority queue that uses a custom comparator implemented with three
tiers:

* Operation type priority (to allow certain types - like index rebuild - to run at higher
priorities, and others - scrub / cleanup / verify - to run at much lower priorities). This
is defined as an int field in the OperationType enum, and can be overridden via system
property. There's a lot of room for bikeshedding in picking exact priorities - I've chosen
(highest priority to lowest):

** Anticompaction / Index Summary Redistribution
** Index Build / View Build
** Key Cache Save / Row Cache Save / Counter Cache Save
** User Defined Compaction
** Compaction (including maximal/major compaction)
** Tombstone Compaction
** Scrub / Cleanup / Upgrade SSTables
** Verify

* Sub type priority (to allow compaction tasks within a type to have preference - to enable
behavior like CASSANDRA-6288). This is defined as a long and set by the compaction strategies.
By default, I'm setting it to the bytes on disk of the source sstables - larger transactions
(at the time the task was created) are preferred over smaller ones.

* Timestamp priority, where tasks with the same type/subtype values are served FIFO.
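
To make the first tier concrete, here is a minimal sketch of an enum carrying an int priority that a system property can override. This is not the patch itself: the enum name, the constants, the numeric values and the property prefix are all hypothetical, chosen only to mirror the ordering listed above.

```java
import java.util.Locale;

// Hypothetical stand-in for an operation type enum; names and numbers are illustrative.
enum ExampleOperationType {
    ANTICOMPACTION(80),
    INDEX_BUILD(70),
    CACHE_SAVE(60),
    USER_DEFINED_COMPACTION(50),
    COMPACTION(40),
    TOMBSTONE_COMPACTION(30),
    SCRUB(20),
    VERIFY(10);

    private final int defaultPriority;

    ExampleOperationType(int defaultPriority) {
        this.defaultPriority = defaultPriority;
    }

    // Hypothetical property name, e.g. "example.compaction.priority.verify".
    String propertyName() {
        return "example.compaction.priority." + name().toLowerCase(Locale.ROOT);
    }

    // Higher value means higher priority. The system property, if set to a
    // valid integer, overrides the default baked into the enum.
    int priority() {
        return Integer.getInteger(propertyName(), defaultPriority);
    }
}
```

With something like this, passing -Dexample.compaction.priority.verify=100 on the command line would lift verify above everything else without a code change.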

The implementation here was pretty straightforward - we create a new interface to expose
the three priority values, and then extend AbstractCompactionTask and de-anonymize the handful
of anonymous runnables/wrapped runnables/callables to implement that interface so they can
be sorted in the PriorityBlockingQueue. 
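
A minimal sketch of that shape, assuming nothing about the real patch: the interface name (Prioritized), its method names, the ExampleTask class and the sample values are all hypothetical. It only shows how a three-tier comparator could order tasks in a java.util.concurrent.PriorityBlockingQueue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical interface exposing the three priority values.
interface Prioritized {
    int typePriority();      // higher is served first
    long subTypePriority();  // higher is served first (e.g. bytes on disk)
    long createdAt();        // lower (older) is served first, i.e. FIFO
}

// Hypothetical task; a real one would do work in run().
final class ExampleTask implements Prioritized, Runnable {
    final String name;
    private final int typePriority;
    private final long subTypePriority;
    private final long createdAt;

    ExampleTask(String name, int typePriority, long subTypePriority, long createdAt) {
        this.name = name;
        this.typePriority = typePriority;
        this.subTypePriority = subTypePriority;
        this.createdAt = createdAt;
    }

    public int typePriority() { return typePriority; }
    public long subTypePriority() { return subTypePriority; }
    public long createdAt() { return createdAt; }
    public void run() { /* no-op in this sketch */ }
}

final class PriorityDemo {
    // Three tiers: type (descending), sub type (descending), timestamp (ascending).
    static final Comparator<Prioritized> ORDER = (a, b) -> {
        int c = Integer.compare(b.typePriority(), a.typePriority());
        if (c != 0) return c;
        c = Long.compare(b.subTypePriority(), a.subTypePriority());
        if (c != 0) return c;
        return Long.compare(a.createdAt(), b.createdAt());
    };

    // Enqueue a few tasks out of order and return the names in poll order.
    static List<String> drainOrder() {
        PriorityBlockingQueue<ExampleTask> queue = new PriorityBlockingQueue<>(11, ORDER);
        queue.add(new ExampleTask("compaction-small", 40, 100L, 1L));
        queue.add(new ExampleTask("compaction-large", 40, 500L, 2L));
        queue.add(new ExampleTask("index-build", 70, 0L, 3L));
        queue.add(new ExampleTask("verify-a", 10, 0L, 4L));
        queue.add(new ExampleTask("verify-b", 10, 0L, 5L));

        List<String> names = new ArrayList<>();
        ExampleTask next;
        while ((next = queue.poll()) != null) {
            names.add(next.name);
        }
        return names;
    }
}
```

In this sketch the index build jumps ahead of both compactions even though it was queued later, the larger compaction beats the smaller one, and the two verify tasks come out in the order they were created.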

There may be an opportunity to try to get clever to protect against starvation in under-resourced
systems, such as increasing type priority over time as tasks age, but I'm leaving that as
a potential optimization for the future - I'm not sure it's really needed, it makes reasoning
about compaction harder, but maybe there exists a use case where it's necessary. 
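
For the sake of discussion, one way such aging could look is sketched below. This is purely hypothetical and not part of the planned patch: the class, the method, and the idea of one priority level per fixed waiting interval are assumptions made for illustration.

```java
// Hypothetical aging helper: raises the type priority the longer a task has waited.
final class AgingPriority {
    // Adds one priority level for every full stepMillis the task has been queued.
    // stepMillis must be positive.
    static int effectivePriority(int basePriority, long createdAtMillis,
                                 long nowMillis, long stepMillis) {
        long waited = Math.max(0L, nowMillis - createdAtMillis);
        long boost = waited / stepMillis;
        // Sum in long arithmetic, then clamp to avoid int overflow.
        return (int) Math.min((long) Integer.MAX_VALUE, basePriority + boost);
    }
}
```

One catch: a PriorityBlockingQueue does not reorder elements whose priority changes while they sit in the queue, so any aging scheme would need to remove and re-add tasks (or rebuild the queue) periodically. That is part of why it makes reasoning about compaction harder.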

Expecting to submit the patch early this week - if either of you (Sankalp / Marcus) finds
this approach conflicts with your expectations, or if you want to volunteer to review, let
me know.


was (Author: jjirsa):
I have a version of this patch I'll be submitting very soon, but while I wait for internal
approvals, I'd like to describe the implementation so that those of you who care about this
can provide feedback conceptually before I submit a patch for review.

I'm implementing this as a priority queue that uses a custom comparator implemented with three
tiers:

* Operation type priority (to allow certain types - like index rebuild - to run at higher
priorities, and others - scrub / cleanup / verify - to run at much lower priorities). This
is defined as an int field in the enum in the OperationType, and can be overridden via system
property. Lot of opportunity for bike shedding here in picking exact priorities - I've chosen
(highest priority to lowest):

** Anticompaction
** Index Build / View Build
** Key Cache Save / Row Cache Save / Counter Cache Save
** User Defined Compaction
** Compaction (including maximal/major compaction)
** Tombstone Compaction
** Scrub / Cleanup / Upgrade SSTables
** Index Summary Redistribution
** Verify

* Sub type priority (to allow compaction tasks within a type to have preference - to enable
behavior like CASSANDRA-6288 ). This is defined as a long, and set by the compaction strategies,
and by default, I'm setting this as the bytes on disk of the source sstables - larger transactions
(at the time the task was created) preferred over smaller transactions. 

* Timestamp priority, where tasks with the same type/subtype values are served FIFO.

The implementation here was pretty straightforward - we create a new interface to expose
the three priority values, and then extend AbstractCompactionTask and de-anonymize the handful
of anonymous runnables/wrapped runnables/callables to implement that interface so they can
be sorted in the PriorityBlockingQueue. 

There may be an opportunity to try to get clever to protect against starvation in under-resourced
systems, such as increasing type priority over time as tasks age, but I'm leaving that as
a potential optimization for the future - I'm not sure it's really needed, it makes reasoning
about compaction harder, but maybe there exists a use case where it's necessary. 

Expecting to submit the patch early this week - if either of you (Sankalp / Marcus) finds
this approach conflicts with your expectations, or if you want to volunteer to review, let
me know.

> Prioritize Secondary Index rebuild
> ----------------------------------
>
>                 Key: CASSANDRA-11218
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11218
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Jeff Jirsa
>            Priority: Minor
>
> We have seen secondary index rebuilds get stuck behind other compactions during
> bootstrap and other operations. This causes things to not finish. We should prioritize index
> rebuild via a separate thread pool or by using a priority queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
