hadoop-yarn-issues mailing list archives

From "Chen Ge (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity
Date Wed, 13 Jul 2016 04:50:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374343#comment-15374343 ]

Chen Ge commented on YARN-4091:
-------------------------------

Hi all,

Regarding the "YARN-4091.preliminary.1.patch" I uploaded above, here are brief descriptions of
the newly added classes and the test REST API.

Newly Added Classes:
ActivityManager:
	A class that stores node and application allocations. It mainly provides operations to start,
add, update and finish an allocation.

NodeAllocation:
	It contains the information for one allocation in a node heartbeat. Detailed allocation
activities are first stored as "AllocationActivity" operations, then transformed into a tree
structure. The tree starts at the root queue and ends at a leaf queue, application or container
allocation.

AllocationActivity:
	It records one activity operation in an allocation, which can be classified as a queue, application
or container activity. Other information includes state, diagnostic and priority.

ActivityNode:
	It represents a tree node in the "NodeAllocation" tree structure. Each node may represent a queue,
application or container in an allocation activity. A node may have child nodes if the allocation
successfully proceeded to the next level.

ActivityDiagnosticConstant:
	Collection of diagnostics.

ActivityState:
	Collection of activity operation states.

AllocationState:
	Collection of allocation final states.

AllocationActivityType:
	Collection of types for activity operation.

AppAllocation:
	It contains allocation information for one application within a period of time. Each application
allocation may have several allocation attempts.

ActivitiesInfo:
	DAO object to display node allocation activity.

NodeAllocationInfo:
	DAO object to display each node allocation in a node heartbeat.

ActivityNodeInfo:
	DAO object to display node information in allocation tree. It corresponds to "ActivityNode"
class.

AppActivitiesInfo:
	DAO object to display application activity.

AppAllocationInfo:
	DAO object to display application allocation detailed information.
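
To make the "operations first, tree afterwards" idea under NodeAllocation more concrete, here is a minimal Java sketch. It is only an illustration and not code from the patch: the names "Op", "Node" and "buildTree", the string labels for type and state, and the assumption that every parent is recorded before its children are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ActivityTreeSketch {

    /** Hypothetical flat record of one activity operation, kept in recording order. */
    static final class Op {
        final String parent; // null for the root queue
        final String name;
        final String type;   // "QUEUE", "APP" or "CONTAINER" (illustrative labels)
        final String state;  // illustrative state label

        Op(String parent, String name, String type, String state) {
            this.parent = parent;
            this.name = name;
            this.type = type;
            this.state = state;
        }
    }

    /** Hypothetical tree node: a queue, application or container. */
    static final class Node {
        final String name;
        final String type;
        final String state;
        final List<Node> children = new ArrayList<>();

        Node(Op op) {
            this.name = op.name;
            this.type = op.type;
            this.state = op.state;
        }
    }

    /** Links the recorded operations into a tree; assumes parents come before children. */
    static Node buildTree(List<Op> ops) {
        Map<String, Node> byName = new HashMap<>();
        Node root = null;
        for (Op op : ops) {
            Node node = new Node(op);
            byName.put(op.name, node);
            if (op.parent == null) {
                root = node;
            } else {
                Node parent = byName.get(op.parent);
                if (parent == null) {
                    throw new IllegalArgumentException("unknown parent: " + op.parent);
                }
                parent.children.add(node);
            }
        }
        return root;
    }

    /** Prints the tree with two spaces of indentation per level. */
    static void print(Node node, String indent) {
        System.out.println(indent + node.type + " " + node.name + " [" + node.state + "]");
        for (Node child : node.children) {
            print(child, indent + "  ");
        }
    }

    public static void main(String[] args) {
        List<Op> ops = new ArrayList<>();
        ops.add(new Op(null, "root", "QUEUE", "ACCEPTED"));
        ops.add(new Op("root", "a", "QUEUE", "ACCEPTED"));
        ops.add(new Op("a", "app_1", "APP", "ACCEPTED"));
        ops.add(new Op("app_1", "container_1", "CONTAINER", "ALLOCATED"));
        ops.add(new Op("root", "b", "QUEUE", "SKIPPED"));
        print(buildTree(ops), "");
    }
}
```

In this sketch a queue that was skipped simply ends up as a node without children, which mirrors the note under ActivityNode that a node only has children when allocation proceeded to the next level.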


Test REST API:
	Look at the next node’s activities (default):
	http://localhost:18088/ws/v1/cluster/scheduler/activities

	Look at a specific node only:
	http://localhost:18088/ws/v1/cluster/scheduler/activities?nodeId=node-87:75
	or, without the port number:
	http://localhost:18088/ws/v1/cluster/scheduler/activities?nodeId=node-87

	Look at activities for a specific application within a period of time (3s by default):
	http://localhost:18088/ws/v1/cluster/scheduler/app-activities?appId=application_1468198570845_0022
	http://localhost:18088/ws/v1/cluster/scheduler/app-activities?appId=application_1468198570845_0022&maxTime=5.2
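
For anyone who wants to script these calls, below is a small Java sketch that builds the URLs listed above and fetches one of them. The helper class "SchedulerActivityUrls" and its method names are hypothetical. The host and port are taken from the examples above and will differ per cluster. Requesting JSON through the "Accept" header is an assumption based on the test method names, and the ids are appended unencoded because the example values contain only URL-safe characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

final class SchedulerActivityUrls {

    // Base address copied from the examples above; adjust for a real cluster.
    static final String BASE = "http://localhost:18088/ws/v1/cluster/scheduler";

    /** Activities of the next node, or of one node when nodeId is given. */
    static String nodeActivities(String nodeId) {
        if (nodeId == null || nodeId.isEmpty()) {
            return BASE + "/activities";
        }
        return BASE + "/activities?nodeId=" + nodeId;
    }

    /** Activities of one application; maxTime may be null to use the default. */
    static String appActivities(String appId, Double maxTime) {
        String url = BASE + "/app-activities?appId=" + appId;
        if (maxTime != null) {
            url += "&maxTime=" + maxTime;
        }
        return url;
    }

    /** Plain HTTP GET that returns the response body as a string. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Accept", "application/json"); // assumed response format
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toString("UTF-8");
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(nodeActivities(null));
        System.out.println(nodeActivities("node-87:75"));
        System.out.println(appActivities("application_1468198570845_0022", 5.2));
        // Needs a running ResourceManager with the patch applied:
        // System.out.println(fetch(nodeActivities("node-87")));
    }
}
```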


Test class:
	TestRMWebServicesCapacitySched.java
	org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched#testActivityJSON
	org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched#testAppActivityJSON

Thanks for reviewing. Please feel free to suggest any improvements.

> Improvement: Introduce more debug/diagnostics information to detail out scheduler activity
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-4091
>                 URL: https://issues.apache.org/jira/browse/YARN-4091
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Chen Ge
>         Attachments: Improvement on debugdiagnostic information - YARN.pdf, YARN-4091-design-doc-v1.pdf,
YARN-4091.preliminary.1.patch
>
>
> As schedulers are improved with various new capabilities, more configurations that tune
the schedulers start to take actions such as limiting container assignment to an application,
or introducing a delay before allocating a container, etc.
> There is no clear information passed down from the scheduler to the outside world under these various
scenarios. This makes debugging much harder.
> This ticket is an effort to introduce more defined states at the various points in the scheduler
where it skips/rejects container assignment, activates an application, etc. Such information will
help users know what's happening in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve on this as
we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

