aurora-issues mailing list archives

From "Kai Huang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AURORA-1929) Improve explicit task history pruning.
Date Wed, 31 May 2017 23:16:04 GMT

     [ https://issues.apache.org/jira/browse/AURORA-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Huang updated AURORA-1929:
------------------------------
    Description: 
There are currently two types of task history pruning in Aurora:
# Implicit task history pruning, run in the background by TaskHistoryPruner, which
registers each inactive task for pruning when it reaches a terminal state.
# Explicit task history pruning, initiated by the `aurora_admin prune_tasks` command, which
prunes inactive tasks in the cluster.

The prune_tasks endpoint seems to be very slow when the cluster has a large number of inactive
tasks. 

For example, when we run `aurora_admin prune_tasks` for 135k running tasks (1k jobs), it takes
about 30 minutes to prune all tasks; the pruning speed seems to max out at 3k tasks per minute.

Currently, Aurora uses StreamManager to manage a single log stream append transaction for
task history pruning. Local storage ops can be added to the transaction and later committed
as an atomic unit. However, the StateManager removes tasks one by one in a for-loop (https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java#L376),
and each RemoveTasks operation is coalesced with the previous operation, which seems inefficient
and unnecessary (https://github.com/apache/aurora/blob/c85bffdd6f68312261697eee868d57069adda434/src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java#L324).

We need to batch all removeTasks operations and execute them all at once to avoid the additional
cost of coalescing. The fix will also benefit implicit task history pruning, since it has a similar
underlying implementation.
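
As a rough illustration of the batching idea, here is a minimal, self-contained sketch. It is
not Aurora code: PruneBatchingSketch, Transaction, addRemoveTasks, removeOneByOne and
removeBatched are hypothetical names, and the coalescing rule (each new RemoveTasks op is
folded into the previous one) is a simplified assumption about what the transaction does.
It only shows that removing N tasks in a loop triggers N-1 coalescing steps, while one batched
removal triggers none and leaves the same final op.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class PruneBatchingSketch {

  /** Hypothetical stand-in for a log transaction; NOT Aurora's StreamManagerImpl. */
  static final class Transaction {
    private final List<Set<String>> ops = new ArrayList<>();
    private int coalesceCount = 0;

    /** Records a RemoveTasks op, folding it into the previous op if one exists. */
    void addRemoveTasks(Set<String> taskIds) {
      if (ops.isEmpty()) {
        ops.add(new LinkedHashSet<>(taskIds));
        return;
      }
      // Assumed coalescing rule: merge the new ids into the previous RemoveTasks op.
      ops.get(ops.size() - 1).addAll(taskIds);
      coalesceCount++;
    }

    int coalesceCount() {
      return coalesceCount;
    }

    int opCount() {
      return ops.size();
    }

    Set<String> lastOp() {
      return Collections.unmodifiableSet(ops.get(ops.size() - 1));
    }
  }

  /** One RemoveTasks op per task: every op after the first must be coalesced. */
  static Transaction removeOneByOne(List<String> taskIds) {
    Transaction tx = new Transaction();
    for (String id : taskIds) {
      tx.addRemoveTasks(Collections.singleton(id));
    }
    return tx;
  }

  /** A single RemoveTasks op carrying all task ids: nothing to coalesce. */
  static Transaction removeBatched(List<String> taskIds) {
    Transaction tx = new Transaction();
    tx.addRemoveTasks(new LinkedHashSet<>(taskIds));
    return tx;
  }

  public static void main(String[] args) {
    List<String> ids = Arrays.asList("task-1", "task-2", "task-3", "task-4");
    Transaction loop = removeOneByOne(ids);
    Transaction batch = removeBatched(ids);
    System.out.println("one-by-one coalesce steps: " + loop.coalesceCount());
    System.out.println("batched coalesce steps: " + batch.coalesceCount());
    System.out.println("same final op: " + loop.lastOp().equals(batch.lastOp()));
  }
}
```

In this sketch both paths end with one op containing the same task ids; only the number of
coalescing steps differs. How expensive each step is in the real scheduler is not modeled here.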

  was:
There are currently two types of task history pruning in Aurora:
# Implicit task history pruning, run in the background by TaskHistoryPruner, which
registers each inactive task for pruning when it reaches a terminal state.
# Explicit task history pruning, initiated by the `aurora_admin prune_tasks` command, which
prunes inactive tasks in the cluster.

The prune_tasks endpoint seems to be very slow when the cluster has a large number of inactive
tasks. 

For example, when we run `aurora_admin prune_tasks` for 135k running tasks (1k jobs), it takes
about 30 minutes to prune all tasks; the pruning speed seems to max out at 3k tasks per minute.

Currently, Aurora uses StreamManager to manage a single log stream append transaction for
task history pruning. Local storage ops (RemoveTasks) can be added to the transaction and
later committed as an atomic unit. However, the current implementation removes tasks one by
one in a for-loop (https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java#L376),
and coalesces each RemoveTasks operation with the previous operation, which seems inefficient
and unnecessary (https://github.com/apache/aurora/blob/c85bffdd6f68312261697eee868d57069adda434/src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java#L324).

We need to batch all removeTasks operations and execute them all at once to avoid the additional
cost of coalescing. The fix will also benefit implicit task history pruning, since it has a similar
underlying implementation.


> Improve explicit task history pruning.
> --------------------------------------
>
>                 Key: AURORA-1929
>                 URL: https://issues.apache.org/jira/browse/AURORA-1929
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: Kai Huang
>            Assignee: Kai Huang
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
