hadoop-yarn-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6451) Create a monitor to check whether we maintain RM (scheduling) invariants
Date Mon, 17 Apr 2017 21:58:41 GMT

    [ https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971709#comment-15971709 ]

Chris Douglas commented on YARN-6451:

bq.  when invariants are violated the log line is harder to read if combined, but perf is
much better. In the current example of invariants.txt I will leave this with one invariant
per line, so slower but easier to understand---works?

This could evaluate the combined expression and, only if it detects a violation, iterate
over the individual expressions to print specific error messages. That said, shaving fractions
of a millisecond off the validation check is probably not significant.
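
A minimal sketch of that two-phase check, assuming invariants can be modeled as named predicates over a map of metric values. The class name, the invariant names, and the metric keys below are all hypothetical and are not taken from the patch, which may evaluate its expressions differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration, not the InvariantsChecker from the patch.
class CombinedInvariantCheck {
  private final Map<String, Predicate<Map<String, Double>>> invariants;
  private final Predicate<Map<String, Double>> combined;

  CombinedInvariantCheck(Map<String, Predicate<Map<String, Double>>> invariants) {
    this.invariants = new LinkedHashMap<>(invariants);
    // Build the conjunction of all invariants once, up front.
    Predicate<Map<String, Double>> all = m -> true;
    for (Predicate<Map<String, Double>> p : this.invariants.values()) {
      all = all.and(p);
    }
    this.combined = all;
  }

  /** Returns the names of the violated invariants; empty if all hold. */
  List<String> violations(Map<String, Double> metrics) {
    List<String> violated = new ArrayList<>();
    // Fast path: a single combined evaluation in the common case.
    if (combined.test(metrics)) {
      return violated;
    }
    // Slow path: re-check each invariant to report specific failures.
    for (Map.Entry<String, Predicate<Map<String, Double>>> e : invariants.entrySet()) {
      if (!e.getValue().test(metrics)) {
        violated.add(e.getKey());
      }
    }
    return violated;
  }

  public static void main(String[] args) {
    Map<String, Predicate<Map<String, Double>>> inv = new LinkedHashMap<>();
    // Hypothetical invariants over hypothetical metric names.
    inv.put("nonNegativePending", m -> m.get("pendingContainers") >= 0);
    inv.put("allocatedWithinTotal", m -> m.get("allocatedMB") <= m.get("totalMB"));

    CombinedInvariantCheck checker = new CombinedInvariantCheck(inv);
    Map<String, Double> metrics =
        Map.of("pendingContainers", -1.0, "allocatedMB", 512.0, "totalMB", 1024.0);
    for (String name : checker.violations(metrics)) {
      System.err.println("Invariant violated: " + name);
    }
  }
}
```

The per-invariant pass only runs after a failure, so the log keeps one readable line per violated invariant while the common case costs a single evaluation.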

+1 overall. For future versions:
* The invariant checker might want to use bindings across contexts; this would be hard to
express as subtypes of {{InvariantsChecker}}. For example, if one wanted to check some invariant
using values from the scheduler and the metrics, there isn't a good way to compose the two
with inheritance. That said, in the current RM it's hard to correlate values collected from
multiple components without reasoning about their mutual consistency in a brittle, ad hoc
way. How invariants are loaded and how errors are handled could also be abstracted, but (IMHO)
that'd be premature. This is approachable as-is.
* The unit test is kind of light.
* This could print a warning when it starts up, since it's mostly for testing. If it's accidentally
deployed in a production setting, it should show up in the log. Does the RM refuse to start if
{{invariants.txt}} is missing?
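
To illustrate the first point, here is a rough sketch of composing bindings from several contexts instead of subclassing per context. Everything in it is an assumption made for illustration: the {{ComposedBindings}} class, the source names "scheduler" and "metrics", and the metric keys are hypothetical and do not come from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration: bindings are composed from named sources
// rather than provided by subtypes of a checker class.
class ComposedBindings {

  /**
   * Merges the values of every source into one map, prefixing each key
   * with its source name so that keys from different contexts cannot clash.
   */
  static Map<String, Double> collect(
      Map<String, Supplier<Map<String, Double>>> sources) {
    Map<String, Double> merged = new LinkedHashMap<>();
    for (Map.Entry<String, Supplier<Map<String, Double>>> src : sources.entrySet()) {
      // Caveat: each source is sampled separately, so the merged values
      // are not a mutually consistent snapshot.
      for (Map.Entry<String, Double> v : src.getValue().get().entrySet()) {
        merged.put(src.getKey() + "." + v.getKey(), v.getValue());
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Supplier<Map<String, Double>>> sources = new LinkedHashMap<>();
    // Hypothetical sources standing in for the scheduler and the metrics.
    sources.put("scheduler", () -> Map.of("allocatedMB", 512.0));
    sources.put("metrics", () -> Map.of("totalMB", 1024.0));

    Map<String, Double> bindings = collect(sources);
    // A cross-context invariant can now refer to values from both sources.
    boolean holds =
        bindings.get("scheduler.allocatedMB") <= bindings.get("metrics.totalMB");
    System.out.println("allocated within total: " + holds);
  }
}
```

This does nothing to address the consistency problem noted above. It only shows that a cross-context invariant is easier to express once bindings are gathered by composition.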

> Create a monitor to check whether we maintain RM (scheduling) invariants
> ------------------------------------------------------------------------
>                 Key: YARN-6451
>                 URL: https://issues.apache.org/jira/browse/YARN-6451
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>         Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch, YARN-6451.v2.patch, YARN-6451.v3.patch
> For SLS runs, as well as for live test clusters (and maybe prod), it would be useful
to have a mechanism to continuously check whether core invariants of the RM/Scheduler are
respected (e.g., no priority inversions, fairness mostly respected, certain latencies within
expected range, etc.)

