aurora-issues mailing list archives

From "Bill Farner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-302) TaskGroups may abandon tasks
Date Sun, 06 Apr 2014 08:59:14 GMT

    [ https://issues.apache.org/jira/browse/AURORA-302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961352#comment-13961352 ]

Bill Farner commented on AURORA-302:
------------------------------------

https://reviews.apache.org/r/20066/

> TaskGroups may abandon tasks
> ----------------------------
>
>                 Key: AURORA-302
>                 URL: https://issues.apache.org/jira/browse/AURORA-302
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Bill Farner
>
> I've yet to figure out exactly how this happens, but I've witnessed it twice in succession in vagrant (though I was unable to repro it while trying to debug), and once in production.
> TaskGroups appears to have a bug that causes it to keep a group in the {{groups}} data structure, but with no corresponding async task in {{executor}}.  The design of TaskGroups is such that each task group must ~always be represented in both ("almost" always, because the executor entry will be absent briefly while trying to schedule a task).
> The one I observed in production looked like this (in /pendingtasks):
> {noformat}
> {
> penaltyMs: 30000,
> name: "role/env/job",
> taskIds: [ ]
> },
> {noformat}
> When I saw it in vagrant:
> {noformat}
> {
> penaltyMs: 1,
> name: "role/env/job",
> taskIds: [ ]
> },
> {noformat}
> Additionally, the {{schedule_queue_size}} in vagrant was consistently zero when I observed this, further supporting the hypothesis that the group was not being evaluated.
> TaskGroups is intended to invalidate empty groups, so the mere presence of one suggests that it has been dropped.
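
As a rough illustration of the invariant described in the report (every tracked group should almost always have a matching evaluation queued), the following Java sketch flags groups that are tracked but have nothing scheduled. The class name, the method name, and the modelling of the executor's contents as a plain set of group names are all assumptions made for illustration; none of this is taken from the Aurora scheduler source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical consistency check; not part of the Aurora code base.
class AbandonedGroupCheck {

  // groups: group name -> pending task IDs (stand-in for the "groups" structure).
  // scheduledGroups: names of groups that currently have an evaluation queued
  // (stand-in for the contents of "executor").
  // Returns the names of groups that are tracked but have no queued evaluation.
  static List<String> findAbandoned(
      Map<String, List<String>> groups, Set<String> scheduledGroups) {
    List<String> abandoned = new ArrayList<>();
    for (String name : groups.keySet()) {
      if (!scheduledGroups.contains(name)) {
        abandoned.add(name);
      }
    }
    return abandoned;
  }

  public static void main(String[] args) {
    Map<String, List<String>> groups = new TreeMap<>();
    // An empty group that is still tracked, as in the /pendingtasks output above.
    groups.put("role/env/job", Collections.<String>emptyList());
    groups.put("role/env/other", Arrays.asList("task-1"));

    // Only the second group has an evaluation queued in this example.
    Set<String> scheduledGroups = Collections.singleton("role/env/other");

    System.out.println(findAbandoned(groups, scheduledGroups));
  }
}
```

Because the executor entry is expected to be absent briefly while a task is being scheduled, a single hit from a check like this would not be conclusive. A group that stays flagged across repeated checks is the case that matches the symptom reported here.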



--
This message was sent by Atlassian JIRA
(v6.2#6252)
