cassandra-commits mailing list archives

From "Kurt Greaves (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling
Date Wed, 28 Feb 2018 03:42:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379739#comment-16379739 ]

Kurt Greaves commented on CASSANDRA-14210:
------------------------------------------

[~krummas] I've set this as RTC, but feel free to bring in another reviewer if you want.

[~oshulgin] that would be unrelated to this patch, which only affects tools where you
can specify the number of jobs (cleanup, upgradesstables, scrub). It does sound like a bug
though, and if you can get more info it might be worth a separate JIRA.

> Optimize SSTables upgrade task scheduling
> -----------------------------------------
>
>                 Key: CASSANDRA-14210
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14210
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Oleksandr Shulgin
>            Assignee: Kurt Greaves
>            Priority: Major
>             Fix For: 4.x
>
>
> When starting the SSTable-rewrite process by running {{nodetool upgradesstables --jobs N}}
> with N > 1, not all of the provided N slots are used.
> For example, we were testing with {{concurrent_compactors=5}} and {{N=4}}. What we observed,
> for both versions 2.2 and 3.0, is that initially all 4 provided slots are used for "Upgrade
> sstables" compactions, but later, when some of the 4 tasks have finished, no new tasks are
> scheduled immediately. The last of the 4 tasks has to finish before 4 new tasks are scheduled.
> This happens on every node we've observed.
> This doesn't utilize available resources to the full extent allowed by the --jobs N parameter.
> In the field, on a cluster of 12 nodes with 4-5 TiB of data each, we've seen the whole
> process take more than 7 days instead of the estimated 1.5-2 days (assuming close to full
> utilization of the N slots).
> Instead, new tasks should be scheduled as soon as there is a free compaction slot.
> Additionally, starting from the biggest SSTables could further reduce the total time
> required for the whole process to finish on any given node.
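
The difference between the observed batch behaviour and the proposed scheduling can be
illustrated with a small simulation. This is only a sketch and not Cassandra code: the
function names and the task durations are hypothetical, and each SSTable rewrite is
modelled as a fixed duration in arbitrary time units.

```python
import heapq


def batch_makespan(durations, jobs):
    """Total time when tasks run in batches of `jobs` and the next batch
    starts only after the slowest task of the current batch has finished."""
    total = 0
    for i in range(0, len(durations), jobs):
        total += max(durations[i:i + jobs])
    return total


def slot_makespan(durations, jobs):
    """Total time when each task starts as soon as any of `jobs` slots is free."""
    slots = [0] * jobs  # time at which each slot becomes free (a valid min-heap)
    for d in durations:
        free_at = heapq.heappop(slots)      # earliest free slot
        heapq.heappush(slots, free_at + d)  # slot is busy until the task ends
    return max(slots)


# Hypothetical rewrite times per SSTable (arbitrary units), not measured data.
durations = [10, 1, 1, 1, 10, 1, 1, 1, 10, 1, 1, 1]
jobs = 4

print(batch_makespan(durations, jobs))                         # 30
print(slot_makespan(durations, jobs))                          # 12
print(slot_makespan(sorted(durations, reverse=True), jobs))    # 10
```

With these made-up durations, every batch is held up by one long task, so the batch
approach takes 30 units. Filling free slots immediately takes 12, and additionally
starting with the largest tasks takes 10.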



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


