hadoop-common-commits mailing list archives

From zjs...@apache.org
Subject hadoop git commit: YARN-2854. Updated the documentation of the timeline service and the generic history service. Contributed by Naganarasimha G R.
Date Mon, 16 Mar 2015 18:02:21 GMT
Repository: hadoop
Updated Branches:
  refs/heads/branch-2.7 2b2f7f2b9 -> d702816e7


YARN-2854. Updated the documentation of the timeline service and the generic history service.
Contributed by Naganarasimha G R.

(cherry picked from commit fbe811d904d4325ae17a83071c841755461f52b7)


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/d702816e
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/d702816e
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/d702816e

Branch: refs/heads/branch-2.7
Commit: d702816e7db06f0c7143b18d4557b92307c4796b
Parents: 2b2f7f2
Author: Zhijie Shen <zjshen@apache.org>
Authored: Mon Mar 16 10:52:32 2015 -0700
Committer: Zhijie Shen <zjshen@apache.org>
Committed: Mon Mar 16 11:02:06 2015 -0700

----------------------------------------------------------------------
 hadoop-yarn-project/CHANGES.txt                 |   3 +
 .../src/site/markdown/TimelineServer.md         | 318 ++++++++++---------
 .../resources/images/timeline_structure.jpg     | Bin 0 -> 23070 bytes
 3 files changed, 165 insertions(+), 156 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/d702816e/hadoop-yarn-project/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/CHANGES.txt b/hadoop-yarn-project/CHANGES.txt
index 0d08ec6..66f36e3 100644
--- a/hadoop-yarn-project/CHANGES.txt
+++ b/hadoop-yarn-project/CHANGES.txt
@@ -330,6 +330,9 @@ Release 2.7.0 - UNRELEASED
     YARN-3187. Documentation of Capacity Scheduler Queue mapping based on user
     or group. (Gururaj Shetty via jianhe)
 
+    YARN-2854. Updated the documentation of the timeline service and the generic
+    history service. (Naganarasimha G R via zjshen)
+
   OPTIMIZATIONS
 
     YARN-2990. FairScheduler's delay-scheduling always waits for node-local and 

http://git-wip-us.apache.org/repos/asf/hadoop/blob/d702816e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
index 8ac1e3b..cb8a5d3 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
@@ -16,144 +16,122 @@ YARN Timeline Server
 ====================
 
 * [Overview](#Overview)
-* [Current Status](#Current_Status)
-* [Basic Configuration](#Basic_Configuration)
-* [Advanced Configuration](#Advanced_Configuration)
-* [Generic-data related Configuration](#Generic-data_related_Configuration)
-* [Per-framework-date related Configuration](#Per-framework-date_related_Configuration)
-* [Running Timeline server](#Running_Timeline_server)
-* [Accessing generic-data via command-line](#Accessing_generic-data_via_command-line)
-* [Publishing of per-framework data by applications](#Publishing_of_per-framework_data_by_applications)
+    * [Introduction](#Introduction)
+    * [Current Status](#Current_Status)
+    * [Timeline Structure](#Timeline_Structure)
+* [Deployment](#Deployment)
+    * [Configurations](#Configurations)
+    * [Running Timeline server](#Running_Timeline_server)
+    * [Accessing generic-data via command-line](#Accessing_generic-data_via_command-line)
+* [Publishing of application specific data](#Publishing_of_application_specific_data)
 
 Overview
---------
+---------
 
-Storage and retrieval of applications' current as well as historic information in a generic fashion is solved in YARN through the Timeline Server (previously also called Generic Application History Server). This serves two responsibilities:
+### Introduction  
 
-* Generic information about completed applications
-    
-    Generic information includes application level data like queue-name, user information etc in the ApplicationSubmissionContext, list of application-attempts that ran for an application, information about each application-attempt, list of containers run under each application-attempt, and information about each container. Generic data is stored by ResourceManager to a history-store (default implementation on a file-system) and used by the web-UI to display information about completed applications.
+  Storage and retrieval of applications' current as well as historic information in a generic fashion is handled in YARN by the Timeline Server. It serves two responsibilities:
 
-* Per-framework information of running and completed applications
-    
-    Per-framework information is completely specific to an application or framework. For example, Hadoop MapReduce framework can include pieces of information like number of map tasks, reduce tasks, counters etc. Application developers can publish the specific information to the Timeline server via TimelineClient from within a client, the ApplicationMaster and/or the application's containers. This information is then queryable via REST APIs for rendering by application/framework specific UIs.
+#### Application specific information
 
-Current Status
---------------
+  Supports collection of information completely specific to an application or framework. For example, the Hadoop MapReduce framework can include pieces of information like the number of map tasks, reduce tasks, counters etc. Application developers can publish this information to the Timeline server via TimelineClient from within a client, the ApplicationMaster and/or the application's containers. The information is then queryable via REST APIs for rendering by application/framework specific UIs.
 
-Timeline sever is a work in progress. The basic storage and retrieval of information, both generic and framework specific, are in place. Timeline server doesn't work in secure mode yet. The generic information and the per-framework information are today collected and presented separately and thus are not integrated well together. Finally, the per-framework information is only available via RESTful APIs, using JSON type content - ability to install framework specific UIs in YARN isn't supported yet.
+#### Generic information about completed applications
+  
+  Previously this was done by the Application History Server, but with the Timeline server it is just one use case of the Timeline server's functionality. Generic information includes application-level data like queue-name, user information etc. in the ApplicationSubmissionContext, the list of application-attempts that ran for an application, information about each application-attempt, the list of containers run under each application-attempt, and information about each container. Generic data is published by the ResourceManager to the timeline store and used by the web-UI to display information about completed applications.
+ 
 
-Basic Configuration
--------------------
+### Current Status
 
-Users need to configure the Timeline server before starting it. The simplest configuration you should add in `yarn-site.xml` is to set the hostname of the Timeline server.
+  The essential functionality of the timeline server has been completed, and it can work in both secure and non-secure modes. The generic history service is also built on the timeline store. Subsequent releases will roll out a next-generation timeline service that is scalable and reliable. Currently, application-specific information is only available via RESTful APIs using JSON content. The ability to install framework-specific UIs in YARN is not supported yet.
 
-```xml
-<property>
-  <description>The hostname of the Timeline service web application.</description>
-  <name>yarn.timeline-service.hostname</name>
-  <value>0.0.0.0</value>
-</property>
-```
+### Timeline Structure
 
-Advanced Configuration
-----------------------
+![Timeline Structure](./images/timeline_structure.jpg)
 
-In addition to the hostname, admins can also configure whether the service is enabled or not, the ports of the RPC and the web interfaces, and the number of RPC handler threads.
+#### TimelineDomain
 
-```xml
-<property>
-  <description>Address for the Timeline server to start the RPC server.</description>
-  <name>yarn.timeline-service.address</name>
-  <value>${yarn.timeline-service.hostname}:10200</value>
-</property>
+  A domain is like a namespace for the Timeline server: users can host multiple entities in it, isolating them from others. Timeline server security is defined at this level. A domain primarily stores owner info, read and write ACL information, and created and modified timestamp information. A domain is uniquely identified by its ID.
 
-<property>
-  <description>The http address of the Timeline service web application.</description>
-  <name>yarn.timeline-service.webapp.address</name>
-  <value>${yarn.timeline-service.hostname}:8188</value>
-</property>
+#### TimelineEntity
 
-<property>
-  <description>The https address of the Timeline service web application.</description>
-  <name>yarn.timeline-service.webapp.https.address</name>
-  <value>${yarn.timeline-service.hostname}:8190</value>
-</property>
+  An entity contains the meta information of some conceptual entity and its related events. The entity can be an application, an application attempt, a container or any user-defined object. It contains primary filters, which are used to index the entities in the TimelineStore, so users should carefully choose the information they want to store as primary filters. The remaining data can be stored as other information. An entity is uniquely identified by its EntityId and EntityType.
 
-<property>
-  <description>Handler thread count to serve the client RPC requests.</description>
-  <name>yarn.timeline-service.handler-thread-count</name>
-  <value>10</value>
-</property>
+#### TimelineEvent
 
-<property>
-  <description>Enables cross-origin support (CORS) for web services where
-  cross-origin web response headers are needed. For example, javascript making
-  a web services request to the timeline server.</description>
-  <name>yarn.timeline-service.http-cross-origin.enabled</name>
-  <value>false</value>
-</property>
+  A TimelineEvent contains the information of an event that is related to some conceptual entity of an application. Users are free to define what the event means, such as starting an application or getting allocated a container.
 
-<property>
-  <description>Comma separated list of origins that are allowed for web
-  services needing cross-origin (CORS) support. Wildcards (*) and patterns
-  allowed</description>
-  <name>yarn.timeline-service.http-cross-origin.allowed-origins</name>
-  <value>*</value>
-</property>
+Deployment
+----------
 
-<property>
-  <description>Comma separated list of methods that are allowed for web
-  services needing cross-origin (CORS) support.</description>
-  <name>yarn.timeline-service.http-cross-origin.allowed-methods</name>
-  <value>GET,POST,HEAD</value>
-</property>
+### Configurations
 
-<property>
-  <description>Comma separated list of headers that are allowed for web
-  services needing cross-origin (CORS) support.</description>
-  <name>yarn.timeline-service.http-cross-origin.allowed-headers</name>
-  <value>X-Requested-With,Content-Type,Accept,Origin</value>
-</property>
+#### Basic Configuration
 
-<property>
-  <description>The number of seconds a pre-flighted request can be cached
-  for web services needing cross-origin (CORS) support.</description>
-  <name>yarn.timeline-service.http-cross-origin.max-age</name>
-  <value>1800</value>
-</property>
-```
+| Configuration Property | Description |
+|:---- |:---- |
+| `yarn.timeline-service.enabled` | Indicate to clients whether Timeline service is enabled or not. If enabled, the TimelineClient library used by end-users will post entities and events to the Timeline server. Defaults to false. |
+| `yarn.resourcemanager.system-metrics-publisher.enabled` | The setting that controls whether YARN system metrics are published on the timeline server or not by RM. Defaults to false. |
+| `yarn.timeline-service.generic-application-history.enabled` | Indicate to clients whether to query generic application data from timeline history-service or not. If not enabled then application data is queried only from Resource Manager. Defaults to false. |
 
-Generic-data related Configuration
-----------------------------------
+#### Advanced configuration
 
-Users can specify whether the generic data collection is enabled or not, and also choose the storage-implementation class for the generic data. There are more configurations related to generic data collection, and users can refer to `yarn-default.xml` for all of them.
+| Configuration Property | Description |
+|:---- |:---- |
+| `yarn.timeline-service.ttl-enable` | Enable age off of timeline store data. Defaults to true. |
+| `yarn.timeline-service.ttl-ms` | Time to live for timeline store data in milliseconds. Defaults to 604800000 (7 days). |
+| `yarn.timeline-service.handler-thread-count` | Handler thread count to serve the client RPC requests. Defaults to 10. |
+| `yarn.timeline-service.client.max-retries` | Default maximum number of retries for timeline service client. Defaults to 30. |
+| `yarn.timeline-service.client.retry-interval-ms` | Default retry time interval for timeline service client in milliseconds. Defaults to 1000. |
 
-```xml
-<property>
-  <description>Indicate to ResourceManager as well as clients whether
-  history-service is enabled or not. If enabled, ResourceManager starts
-  recording historical data that Timelien service can consume. Similarly,
-  clients can redirect to the history service when applications
-  finish if this is enabled.</description>
-  <name>yarn.timeline-service.generic-application-history.enabled</name>
-  <value>false</value>
-</property>
+#### Timeline store and state store configuration
 
-<property>
-  <description>Store class name for history store, defaulting to file system
-  store</description>
-  <name>yarn.timeline-service.generic-application-history.store-class</name>
-  <value>org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore</value>
-</property>
-```
+| Configuration Property | Description |
+|:---- |:---- |
+| `yarn.timeline-service.store-class` | Store class name for timeline store. Defaults to org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore. |
+| `yarn.timeline-service.leveldb-timeline-store.path` | Store file name for leveldb timeline store. Defaults to ${hadoop.tmp.dir}/yarn/timeline. |
+| `yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms` | Length of time to wait between deletion cycles of leveldb timeline store in milliseconds. Defaults to 300000. |
+| `yarn.timeline-service.leveldb-timeline-store.read-cache-size` | Size of read cache for uncompressed blocks for leveldb timeline store in bytes. Defaults to 104857600. |
+| `yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size` | Size of cache for recently read entity start times for leveldb timeline store in number of entities. Defaults to 10000. |
+| `yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size` | Size of cache for recently written entity start times for leveldb timeline store in number of entities. Defaults to 10000. |
+| `yarn.timeline-service.recovery.enabled` | Defaults to false. |
+| `yarn.timeline-service.state-store-class` | Store class name for timeline state store. Defaults to org.apache.hadoop.yarn.server.timeline.recovery.LeveldbTimelineStateStore. |
+| `yarn.timeline-service.leveldb-state-store.path` | Store file name for leveldb timeline state store. |
+
+#### Web and RPC Configuration
+
+| Configuration Property | Description |
+|:---- |:---- |
+| `yarn.timeline-service.hostname` | The hostname of the Timeline service web application. Defaults to 0.0.0.0. |
+| `yarn.timeline-service.address` | Address for the Timeline server to start the RPC server. Defaults to ${yarn.timeline-service.hostname}:10200. |
+| `yarn.timeline-service.webapp.address` | The http address of the Timeline service web application. Defaults to ${yarn.timeline-service.hostname}:8188. |
+| `yarn.timeline-service.webapp.https.address` | The https address of the Timeline service web application. Defaults to ${yarn.timeline-service.hostname}:8190. |
+| `yarn.timeline-service.bind-host` | The actual address the server will bind to. If this optional address is set, the RPC and webapp servers will bind to this address and the port specified in yarn.timeline-service.address and yarn.timeline-service.webapp.address, respectively. This is most useful for making the service listen to all interfaces by setting to 0.0.0.0. |
+| `yarn.timeline-service.http-cross-origin.enabled` | Enables cross-origin support (CORS) for web services where cross-origin web response headers are needed. For example, javascript making a web services request to the timeline server. Defaults to false. |
+| `yarn.timeline-service.http-cross-origin.allowed-origins` | Comma separated list of origins that are allowed for web services needing cross-origin (CORS) support. Wildcards `(*)` and patterns allowed. Defaults to `*`. |
+| `yarn.timeline-service.http-cross-origin.allowed-methods` | Comma separated list of methods that are allowed for web services needing cross-origin (CORS) support. Defaults to GET,POST,HEAD. |
+| `yarn.timeline-service.http-cross-origin.allowed-headers` | Comma separated list of headers that are allowed for web services needing cross-origin (CORS) support. Defaults to X-Requested-With,Content-Type,Accept,Origin. |
+| `yarn.timeline-service.http-cross-origin.max-age` | The number of seconds a pre-flighted request can be cached for web services needing cross-origin (CORS) support. Defaults to 1800. |
+
+#### Security Configuration
 
-Per-framework-date related Configuration
-----------------------------------------
+  Security can be enabled by setting `yarn.timeline-service.http-authentication.type` to `kerberos`, after which the following properties can be configured.
 
-Users can specify whether per-framework data service is enabled or not, choose the store implementation for the per-framework data, and tune the retention of the per-framework data. There are more configurations related to per-framework data service, and users can refer to `yarn-default.xml` for all of them.
+| Configuration Property | Description |
+|:---- |:---- |
+| `yarn.timeline-service.http-authentication.type` | Defines authentication used for the timeline server HTTP endpoint. Supported values are: simple / kerberos / #AUTHENTICATION_HANDLER_CLASSNAME#. Defaults to simple. |
+| `yarn.timeline-service.http-authentication.simple.anonymous.allowed` | Indicates if anonymous requests are allowed by the timeline server when using 'simple' authentication. Defaults to true. |
+| `yarn.timeline-service.principal` | The Kerberos principal for the timeline server. |
+| `yarn.timeline-service.keytab` | The Kerberos keytab for the timeline server. Defaults to /etc/krb5.keytab. |
+| `yarn.timeline-service.delegation.key.update-interval` | Defaults to 86400000 (1 day). |
+| `yarn.timeline-service.delegation.token.renew-interval` | Defaults to 86400000 (1 day). |
+| `yarn.timeline-service.delegation.token.max-lifetime` | Defaults to 604800000 (7 days). |
 
-```xml
+#### Enabling the timeline service and the generic history service
+
+  The following is the basic configuration needed to start the Timeline server.
+
+```xml
 <property>
   <description>Indicate to clients whether Timeline service is enabled or not.
   If enabled, the TimelineClient library used by end-users will post entities
@@ -163,69 +141,97 @@ Users can specify whether per-framework data service is enabled or not, choose t
 </property>
 
 <property>
-  <description>Store class name for timeline store.</description>
-  <name>yarn.timeline-service.store-class</name>
-  <value>org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore</value>
-</property>
-
-<property>
-  <description>Enable age off of timeline store data.</description>
-  <name>yarn.timeline-service.ttl-enable</name>
+  <description>The setting that controls whether yarn system metrics is
+  published on the timeline server or not by RM.</description>
+  <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
   <value>true</value>
 </property>
 
 <property>
-  <description>Time to live for timeline store data in milliseconds.</description>
-  <name>yarn.timeline-service.ttl-ms</name>
-  <value>604800000</value>
+  <description>Indicate to clients whether to query generic application
+  data from timeline history-service or not. If not enabled then application
+  data is queried only from Resource Manager.</description>
+  <name>yarn.timeline-service.generic-application-history.enabled</name>
+  <value>true</value>
 </property>
 ```
 
-Running Timeline server
------------------------
+### Running Timeline server
 
-Assuming all the aforementioned configurations are set properly, admins can start the Timeline server/history service with the following command:
+  Assuming all the aforementioned configurations are set properly, admins can start the Timeline server/history service with the following command:
 
-      $ yarn timelineserver
+```
+  $ yarn timelineserver
+```
 
-Or users can start the Timeline server / history service as a daemon:
+  Or users can start the Timeline server / history service as a daemon:
 
-      $ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh start timelineserver
+```
+  $ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh start timelineserver
+```
 
-Accessing generic-data via command-line
----------------------------------------
+### Accessing generic-data via command-line
 
-Users can access applications' generic historic data via the command line as below. Note that the same commands are usable to obtain the corresponding information about running applications.
+  Users can access applications' generic historic data via the command line as below. Note that the same commands are usable to obtain the corresponding information about running applications.
 
 ```
-      $ yarn application -status <Application ID>
-      $ yarn applicationattempt -list <Application ID>
-      $ yarn applicationattempt -status <Application Attempt ID>
-      $ yarn container -list <Application Attempt ID>
-      $ yarn container -status <Container ID>
+  $ yarn application -status <Application ID>
+  $ yarn applicationattempt -list <Application ID>
+  $ yarn applicationattempt -status <Application Attempt ID>
+  $ yarn container -list <Application Attempt ID>
+  $ yarn container -status <Container ID>
 ```
 
-Publishing of per-framework data by applications
+Publishing of application specific data
 ------------------------------------------------
 
-Developers can define what information they want to record for their applications by composing `TimelineEntity` and `TimelineEvent` objects, and put the entities and events to the Timeline server via `TimelineClient`. Following is an example:
-
-```java
-// Create and start the Timeline client
-TimelineClient client = TimelineClient.createTimelineClient();
-client.init(conf);
-client.start();
-
-TimelineEntity entity = null;
-// Compose the entity
-try {
-  TimelinePutResponse response = client.putEntities(entity);
-} catch (IOException e) {
-  // Handle the exception
-} catch (YarnException e) {
-  // Handle the exception
-}
-
-// Stop the Timeline client
-client.stop();
+  Developers can define what information they want to record for their applications by composing `TimelineEntity` and `TimelineEvent` objects, and put the entities and events to the Timeline server via `TimelineClient`. Below is an example:
+
+```java
+  // Create and start the Timeline client
+  TimelineClient client = TimelineClient.createTimelineClient();
+  client.init(conf);
+  client.start();
+
+  try {
+    TimelineDomain myDomain = new TimelineDomain();
+    myDomain.setId("MyDomain");
+    // Compose other Domain info ....
+
+    client.putDomain(myDomain);
+
+    TimelineEntity myEntity = new TimelineEntity();
+    myEntity.setDomainId(myDomain.getId());
+    myEntity.setEntityType("APPLICATION");
+    myEntity.setEntityId("MyApp1");
+    // Compose other entity info
+
+    TimelinePutResponse response = client.putEntities(myEntity);
+
+    // Record a completion event and publish the updated entity
+    TimelineEvent event = new TimelineEvent();
+    event.setEventType("APP_FINISHED");
+    event.setTimestamp(System.currentTimeMillis());
+    event.addEventInfo("Exit Status", "SUCCESS");
+    // Compose other Event info ....
+
+    myEntity.addEvent(event);
+    client.putEntities(myEntity);
+
+  } catch (IOException e) {
+    // Handle the exception
+  } catch (YarnException e) {
+    // Handle the exception
+  }
+
+  // Stop the Timeline client
+  client.stop();
 ```
+
+  **Note**: The following points should be observed when updating an entity.
+
+  * The domain ID should not be modified for an already existing entity.
+
+  * It is advisable to keep the same primary filters for all updates to an entity. If a subsequent update modifies a primary filter, information stored before the update will not be returned when querying with the updated primary filter.
+
+  * When a primary filter value is modified, the new value is appended to the old value.
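
The last point of the note in the patch, that a modified primary filter value is appended rather than replaced, can be pictured with a small stand-alone model. This is only a sketch and uses no Hadoop classes: `PrimaryFilterModel`, `update` and `matches` are hypothetical names, and the map-of-sets layout is an assumption chosen to illustrate the append behaviour. It does not model how the timeline store indexes data written before the update.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for one entity's primary filters; not a Hadoop class. */
public class PrimaryFilterModel {
  // Each filter name maps to the set of every value ever written for it.
  private final Map<String, Set<Object>> filters = new HashMap<>();

  /** Models an update: the new value is kept alongside any earlier values. */
  public void update(String name, Object value) {
    filters.computeIfAbsent(name, k -> new LinkedHashSet<>()).add(value);
  }

  /** Returns true if the given value has been recorded for the given filter name. */
  public boolean matches(String name, Object value) {
    Set<Object> values = filters.get(name);
    return values != null && values.contains(value);
  }

  public static void main(String[] args) {
    PrimaryFilterModel entity = new PrimaryFilterModel();
    entity.update("user", "alice");
    entity.update("user", "bob"); // appended, "alice" is still present
    System.out.println(entity.matches("user", "alice")); // prints "true"
    System.out.println(entity.matches("user", "bob"));   // prints "true"
  }
}
```

In this model both values remain attached to the filter name, which is why keeping the primary filters stable across updates is the simpler choice.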

http://git-wip-us.apache.org/repos/asf/hadoop/blob/d702816e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images/timeline_structure.jpg
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images/timeline_structure.jpg b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images/timeline_structure.jpg
new file mode 100644
index 0000000..dbfce25
Binary files /dev/null and b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images/timeline_structure.jpg differ
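
Several defaults in the TimelineServer.md configuration tables above are given as raw millisecond or byte counts. As a quick sanity check, the sketch below derives them from named units with the JDK's `java.util.concurrent.TimeUnit`. The class and method names (`TimelineDefaults`, `daysToMillis`, and so on) are hypothetical helpers for illustration, not part of Hadoop; reading 104857600 bytes as 100 MiB is plain arithmetic rather than something the patch states.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for checking the numeric defaults quoted in the tables. */
public class TimelineDefaults {
  /** Converts whole days to milliseconds. */
  public static long daysToMillis(long days) {
    return TimeUnit.DAYS.toMillis(days);
  }

  /** Converts whole minutes to milliseconds. */
  public static long minutesToMillis(long minutes) {
    return TimeUnit.MINUTES.toMillis(minutes);
  }

  /** Converts mebibytes (1024 * 1024 bytes) to bytes. */
  public static long mebibytesToBytes(long mib) {
    return mib * 1024L * 1024L;
  }

  public static void main(String[] args) {
    // yarn.timeline-service.ttl-ms and delegation.token.max-lifetime (7 days)
    System.out.println(daysToMillis(7));       // prints 604800000
    // delegation.key.update-interval and delegation.token.renew-interval (1 day)
    System.out.println(daysToMillis(1));       // prints 86400000
    // leveldb-timeline-store.ttl-interval-ms (5 minutes)
    System.out.println(minutesToMillis(5));    // prints 300000
    // leveldb-timeline-store.read-cache-size (100 MiB)
    System.out.println(mebibytesToBytes(100)); // prints 104857600
  }
}
```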

