hadoop-yarn-issues mailing list archives

From "Carlo Curino (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6451) Add RM monitor validating metrics invariants
Date Tue, 02 May 2017 23:31:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994014#comment-15994014

Carlo Curino commented on YARN-6451:

I see two or three alternatives:
 # Hard-code the most important invariants programmatically. YARN-6473 is an example of this: there I poke the {{ReservationSystem}} and {{YarnScheduler}} to check
whether their data structures remain in sync during execution. This is more minimalistic and efficient,
but any extension requires code changes. For example, you could maintain an observer of container
allocations and check that certain ordering properties are respected.
 # Expand the mechanics of YARN-6451 by adding "bindings" for many more parts of the RM internal
state, which one would then be allowed to mention in the {{invariants.txt}} file. Metrics were a natural
starting point, as the cost of gathering them is already being paid, and their names are externally
known. To minimize the cost, we could load the {{invariants.txt}} expressions and then limit
the "state" we probe to the smallest subset that covers what those expressions reference.
 # Leverage compiler APIs, aspects, or dependency-injection tricks to dynamically modify
the code that does the binding work, so that it covers whatever appears in the {{invariants.txt}} file.
This is obviously the richest option, though it has some maintainability issues.
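A minimal Java sketch of the second alternative, including the "probe only what the expressions need" idea. It assumes a hypothetical {{invariants.txt}} format of one comparison per line ({{<metric-or-number> <op> <metric-or-number>}}, with {{#}} comments) and a hypothetical {{probe}} function that reads one metric by name; neither is the actual format or API of the YARN-6451 patch. Parsing the file once yields the set of referenced metric names, so each check reads only that minimal state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical invariant "<operand> <op> <operand>"; operands are metric names or numbers. */
final class Invariant {
  final String text;
  private final String left, op, right;

  private Invariant(String text, String left, String op, String right) {
    this.text = text; this.left = left; this.op = op; this.right = right;
  }

  static Invariant parse(String line) {
    String[] parts = line.trim().split("\\s+");
    if (parts.length != 3) {
      throw new IllegalArgumentException("Expected '<a> <op> <b>': " + line);
    }
    return new Invariant(line.trim(), parts[0], parts[1], parts[2]);
  }

  /** Adds the metric names this invariant reads; numeric literals are skipped. */
  void collectNames(Set<String> out) {
    for (String operand : new String[] {left, right}) {
      if (!isNumber(operand)) out.add(operand);
    }
  }

  boolean holds(Map<String, Double> values) {
    double a = value(left, values), b = value(right, values);
    switch (op) {
      case "<":  return a < b;
      case "<=": return a <= b;
      case ">":  return a > b;
      case ">=": return a >= b;
      case "==": return a == b;
      default: throw new IllegalArgumentException("Unknown operator: " + op);
    }
  }

  private static boolean isNumber(String s) {
    try { Double.parseDouble(s); return true; } catch (NumberFormatException e) { return false; }
  }

  private static double value(String operand, Map<String, Double> values) {
    if (isNumber(operand)) return Double.parseDouble(operand);
    Double v = values.get(operand);
    if (v == null) throw new IllegalStateException("No binding for " + operand);
    return v;
  }
}

/** Hypothetical checker: binds only the metrics that the loaded invariants mention. */
final class InvariantChecker {
  private final List<Invariant> invariants = new ArrayList<>();
  private final Set<String> requiredNames = new LinkedHashSet<>();

  InvariantChecker(List<String> lines) {
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("#")) continue; // skip blanks and comments
      Invariant inv = Invariant.parse(trimmed);
      invariants.add(inv);
      inv.collectNames(requiredNames);
    }
  }

  Set<String> requiredNames() { return requiredNames; }

  /** Probes each required metric once, then returns the text of every violated invariant. */
  List<String> check(Function<String, Double> probe) {
    Map<String, Double> snapshot = new HashMap<>();
    for (String name : requiredNames) snapshot.put(name, probe.apply(name));
    List<String> violated = new ArrayList<>();
    for (Invariant inv : invariants) {
      if (!inv.holds(snapshot)) violated.add(inv.text);
    }
    return violated;
  }
}
```

A caller could build the checker with {{new InvariantChecker(Files.readAllLines(path))}} and call {{check(...)}} on every monitor tick. Adding an invariant over an already-bound metric then needs no code change, which is the trade-off against the first alternative.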

In YARN-6547 I propose a simple way of combining YARN-6363 and YARN-6451 capabilities to run
tests that check an SLS run for common invariants (both during and at the end of the run).
That is mostly a mechanism patch, but we can work together to define very tight yet robust
invariants for specific runs.
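To illustrate the difference between checks made during a run and checks made at its end, here is a hedged Java sketch of a hard-coded observer in the spirit of the first alternative. The event methods ({{onRequest}}, {{onAllocate}}, {{finish}}), the single shared pool of requests, and the convention that a larger priority value is more urgent are all assumptions for illustration, not YARN or SLS APIs. The priority-inversion check fires as each allocation is observed; the unsatisfied-request check only makes sense once the run is over.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical allocation observer; assumes a larger priority value means more urgent. */
final class AllocationObserver {
  // priority -> number of outstanding requests (keys with zero requests are removed)
  private final TreeMap<Integer, Integer> pending = new TreeMap<>();
  private final List<String> violations = new ArrayList<>();

  void onRequest(int priority) {
    pending.merge(priority, 1, Integer::sum);
  }

  /** During-run check: no allocation while a more urgent request is still pending. */
  void onAllocate(int priority) {
    Integer higher = pending.higherKey(priority);
    if (higher != null) {
      violations.add("priority inversion: allocated " + priority + " while " + higher + " pending");
    }
    Integer count = pending.get(priority);
    if (count == null) {
      violations.add("allocation without request at priority " + priority);
    } else if (count == 1) {
      pending.remove(priority);
    } else {
      pending.put(priority, count - 1);
    }
  }

  /** End-of-run check: collect during-run violations plus any requests never satisfied. */
  List<String> finish() {
    List<String> all = new ArrayList<>(violations);
    if (!pending.isEmpty()) all.add("unsatisfied requests at end of run: " + pending);
    return all;
  }
}
```

Requiring every request to be satisfied is only reasonable for a test run whose workload is known to fit the cluster, which is one reason tight invariants tend to be run-specific.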

> Add RM monitor validating metrics invariants
> --------------------------------------------
>                 Key: YARN-6451
>                 URL: https://issues.apache.org/jira/browse/YARN-6451
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>             Fix For: 3.0.0-alpha3
>         Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch, YARN-6451.v2.patch, YARN-6451.v3.patch, YARN-6451.v4.patch, YARN-6451.v5.patch
>
> For SLS runs, as well as for live test clusters (and maybe prod), it would be useful
> to have a mechanism to continuously check whether core invariants of the RM/Scheduler are
> respected (e.g., no priority inversions, fairness mostly respected, certain latencies within
> expected range, etc.)

This message was sent by Atlassian JIRA
