hadoop-common-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From varunsax...@apache.org
Subject [48/50] [abbrv] hadoop git commit: YARN-6047 Documentation updates for TimelineService v2 (Contributed by Rohith Sharma)
Date Tue, 29 Aug 2017 20:42:33 GMT
YARN-6047 Documentation updates for TimelineService v2 (Contributed by Rohith Sharma)


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/e9f2c265
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/e9f2c265
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/e9f2c265

Branch: refs/heads/YARN-5355
Commit: e9f2c2654031af0efa352c36a662859c6c61ba8c
Parents: 709ed05
Author: Vrushali C <vrushali@apache.org>
Authored: Mon Aug 21 20:29:05 2017 -0700
Committer: Varun Saxena <varunsaxena@apache.org>
Committed: Wed Aug 30 01:18:45 2017 +0530

----------------------------------------------------------------------
 hadoop-project/src/site/markdown/index.md.vm    |   8 +-
 .../src/site/markdown/TimelineServiceV2.md      | 333 ++++++++++++++++++-
 2 files changed, 324 insertions(+), 17 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/e9f2c265/hadoop-project/src/site/markdown/index.md.vm
----------------------------------------------------------------------
diff --git a/hadoop-project/src/site/markdown/index.md.vm b/hadoop-project/src/site/markdown/index.md.vm
index 62e21b2..bb7bda2 100644
--- a/hadoop-project/src/site/markdown/index.md.vm
+++ b/hadoop-project/src/site/markdown/index.md.vm
@@ -55,17 +55,15 @@ documentation.
 YARN Timeline Service v.2
 -------------------
 
-We are introducing an early preview (alpha 1) of a major revision of YARN
+We are introducing an early preview (alpha 2) of a major revision of YARN
 Timeline Service: v.2. YARN Timeline Service v.2 addresses two major
 challenges: improving scalability and reliability of Timeline Service, and
 enhancing usability by introducing flows and aggregation.
 
-YARN Timeline Service v.2 alpha 1 is provided so that users and developers
+YARN Timeline Service v.2 alpha 2 is provided so that users and developers
 can test it and provide feedback and suggestions for making it a ready
 replacement for Timeline Service v.1.x. It should be used only in a test
-capacity. Most importantly, security is not enabled. Do not set up or use
-Timeline Service v.2 until security is implemented if security is a
-critical requirement.
+capacity.
 
 More details are available in the
 [YARN Timeline Service v.2](./hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html)

http://git-wip-us.apache.org/repos/asf/hadoop/blob/e9f2c265/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServiceV2.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServiceV2.md
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServiceV2.md
index 678c406..2de305d 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServiceV2.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServiceV2.md
@@ -73,12 +73,8 @@ The following diagram illustrates the design at a high level.
 
 ### <a name="Current_Status"></a>Current Status and Future Plans
 
-YARN Timeline Service v.2 is currently in alpha ("alpha 1"). It is very much work in progress,
and
-many things can and will change rapidly. Users must enable Timeline Service v.2 only on a
test or
-experimental cluster to test the feature.
-
-Most importantly, **security is not enabled**. Do not set up or use Timeline Service v.2
until
-security is implemented if security is a requirement.
+YARN Timeline Service v.2 is currently in alpha ("alpha 2"). It is a work in progress, and
+many things can and will change rapidly.
 
 A complete end-to-end flow of writes and reads is functional, with Apache HBase as the backend.
 You should be able to start generating data. When enabled, all YARN-generic events are
@@ -95,16 +91,19 @@ resource manager also has its dedicated in-process collector. The reader
is curr
 instance. Currently, it is not possible to write to Timeline Service outside the context
of a YARN
 application (i.e. no off-cluster client).
 
+Starting from alpha2, Timeline Service v.2 supports simple authorization in terms of a
+configurable whitelist of users and groups who can read timeline data. Cluster admins are
+allowed by default to read timeline data.
+
 When YARN Timeline Service v.2 is disabled, one can expect no functional or performance impact
 on any other existing functionality.
 
 The work to make it truly production-ready continues. Some key items include
 
 * More robust storage fault tolerance
-* Security
 * Support for off-cluster clients
-* More complete and integrated web UI
 * Better support for long-running apps
+* Support for ACLs
 * Offline (time-based periodic) aggregation for flows, users, and queues for reporting and
 analysis
 * Timeline collectors as separate instances from node managers
@@ -144,6 +143,27 @@ New configuration parameters that are introduced with v.2 are marked
bold.
 | **`yarn.timeline-service.hbase.coprocessor.app-final-value-retention-milliseconds`** |
The setting that controls how long the final value of a metric of a completed app is retained
before merging into the flow sum. Defaults to `259200000` (3 days). This should be set in
the HBase cluster. |
 | **`yarn.rm.system-metrics-publisher.emit-container-events`** | The setting that controls
whether yarn container metrics is published to the timeline server or not by RM. This configuration
setting is for ATS V2. Defaults to `false`. |
 
+#### Security Configuration
+
+
+Security can be enabled by setting `yarn.timeline-service.http-authentication.type`
+to `kerberos`, after which the following configuration options are available:
+
+
+| Configuration Property | Description |
+|:---- |:---- |
+| `yarn.timeline-service.http-authentication.type` | Defines authentication used for the
timeline server(collector/reader) HTTP endpoint. Supported values are: `simple` / `kerberos`
/ #AUTHENTICATION_HANDLER_CLASSNAME#. Defaults to `simple`. |
+| `yarn.timeline-service.http-authentication.simple.anonymous.allowed` | Indicates if anonymous
requests are allowed by the timeline server when using 'simple' authentication. Defaults to
`true`. |
+| `yarn.timeline-service.http-authentication.kerberos.principal` | The Kerberos principal
to be used for the Timeline Server(Collector/Reader) HTTP endpoint. |
+| `yarn.timeline-service.http-authentication.kerberos.keytab` | The Kerberos keytab to be
used for the Timeline Server(Collector/Reader) HTTP endpoint.. |
+| `yarn.timeline-service.principal` | The Kerberos principal for the timeline reader. NM
principal would be used for timeline collector as it runs as an auxiliary service inside NM.
|
+| `yarn.timeline-service.keytab` | The Kerberos keytab for the timeline reader. NM keytab
would be used for timeline collector as it runs as an auxiliary service inside NM. |
+| `yarn.timeline-service.delegation.key.update-interval` | Defaults to `86400000` (1 day).
|
+| `yarn.timeline-service.delegation.token.renew-interval` | Defaults to `86400000` (1 day).
|
+| `yarn.timeline-service.delegation.token.max-lifetime` | Defaults to `604800000` (7 days).
|
+| `yarn.timeline-service.read.authentication.enabled` | Enables or disables authorization
checks for reading timeline service v2 data. Default is `false` which is disabled. |
+| `yarn.timeline-service.read.allowed.users` | Comma separated list of user, followed by
space, then comma separated list of groups. It will allow this list of users and groups to
read the data and reject everyone else. Default value is set to none. If authorization is
enabled, then this configuration is mandatory.  |
+
 #### Enabling CORS support
 To enable cross-origin support (CORS) for the Timeline Service v.2, please set the following
configuration parameters:
 
@@ -170,7 +190,7 @@ Each step is explained in more detail below.
 
 ##### <a name="Set_up_the_HBase_cluster"> </a>Step 1) Set up the HBase cluster
 The first part is to set up or pick an Apache HBase cluster to use as the storage cluster.
The
-version of Apache HBase that is supported with Timeline Service v.2 is 1.2.4. The 1.0.x versions
+version of Apache HBase that is supported with Timeline Service v.2 is 1.2.6. The 1.0.x versions
 do not work with Timeline Service v.2. Later versions of HBase have not been tested with
 Timeline Service.
 
@@ -333,6 +353,15 @@ To write MapReduce framework data to Timeline Service v.2, enable the
following
 </property>
 ```
 
+### Upgrade from alpha1 to alpha2
+If you are currently running Timeline Service v2 alpha1 version, we recommend the following:
+
+- Clear existing data in tables (truncate tables) since the row key for AppToFlow has changed.
+
+- The coprocessor is now a dynamically loaded table level coprocessor in alpha2. We
+recommend dropping the table, replacing the coprocessor jar on hdfs with the alpha2 one,
+restarting the Region servers and recreating the `flowrun` table.
+
 ### <a name="Publishing_of_application_specific_data"></a> Publishing application
specific data
 
 This section is for YARN application developers that want to integrate with Timeline Service
v.2.
@@ -517,6 +546,8 @@ predicates, an empty list will be returned.
   "daterange=20150711-20150714" returns flows active between these 2 dates.<br/>
   "daterange=20150711-" returns flows active on and after 20150711.<br/>
   "daterange=-20150711" returns flows active on and before 20150711.<br/>
+1. `fromid` -  If specified, retrieve the next set of flows from the given fromid. The set
of entities retrieved is inclusive of specified fromid.
+   fromid should be taken from the value associated with FROM_ID info key in flow entity
response which was sent earlier.
 
 #### Example JSON Response:
 
@@ -562,6 +593,7 @@ predicates, an empty list will be returned.
         "info": {
           "SYSTEM_INFO_CLUSTER": "test-cluster",
           "UID": "test-cluster!sjlee!ds-date",
+          "FROM_ID": "test-cluster!1460419200000!sjlee!ds-date",
           "SYSTEM_INFO_FLOW_NAME": "ds-date",
           "SYSTEM_INFO_DATE": 1460419200000,
           "SYSTEM_INFO_USER": "sjlee"
@@ -613,6 +645,8 @@ predicates, an empty list will be returned.
 1. `fields` - Specifies which fields to retrieve. For querying flow runs, only `ALL` or `METRICS`
are valid fields.
   Other fields will lead to HTTP 400 (Bad Request) response. If not specified, in response,
id, type, createdtime and info fields
   will be returned.
+1. `fromid` -  If specified, retrieve the next set of flow run entities from the given fromid.
The set of entities retrieved is inclusive of specified fromid.
+   fromid should be taken from the value associated with FROM_ID info key in flow entity
response which was sent earlier.
 
 #### Example JSON Response:
 
@@ -625,6 +659,7 @@ predicates, an empty list will be returned.
         "createdtime": 1460420587974,
         "info": {
           "UID": "test-cluster!sjlee!ds-date!1460420587974",
+          "FROM_ID": "test-cluster!sjlee!ds-date!1460420587974",
           "SYSTEM_INFO_FLOW_RUN_ID": 1460420587974,
           "SYSTEM_INFO_FLOW_NAME": "ds-date",
           "SYSTEM_INFO_FLOW_RUN_END_TIME": 1460420595198,
@@ -641,6 +676,7 @@ predicates, an empty list will be returned.
         "createdtime": 1460420305659,
         "info": {
           "UID": "test-cluster!sjlee!ds-date!1460420305659",
+          "FROM_ID": "test-cluster!sjlee!ds-date!1460420305659",
           "SYSTEM_INFO_FLOW_RUN_ID": 1460420305659,
           "SYSTEM_INFO_FLOW_NAME": "ds-date",
           "SYSTEM_INFO_FLOW_RUN_END_TIME": 1460420311966,
@@ -712,6 +748,7 @@ while querying individual flow runs.
       "isrelatedto": {},
       "info": {
         "UID":"yarn-cluster!varun!QuasiMonteCarlo!1465246348599",
+        "FROM_ID":"yarn-cluster!varun!QuasiMonteCarlo!1465246348599",
         "SYSTEM_INFO_FLOW_RUN_END_TIME":1465246378051,
         "SYSTEM_INFO_FLOW_NAME":"QuasiMonteCarlo",
         "SYSTEM_INFO_USER":"varun",
@@ -816,6 +853,10 @@ none of the apps match the predicates, an empty list will be returned.
   or metricstoretrieve is specified. Ignored otherwise. The maximum possible value for metricslimit
can be maximum value of
   Integer. If it is not specified or has a value less than 1, and metrics have to be retrieved,
then metricslimit will be
   considered as 1 i.e. latest single value of metric(s) will be returned.
+1. `metricsTimeStart` - If specified, then metrics for the entity after this timestamp are
returned.
+1. `metricsTimeEnd` - If specified, then metrics for the entity before this timestamp are
returned.
+1. `fromid` -  If specified, retrieve the next set of application entities from the given
fromid. The set of entities retrieved is inclusive of specified fromid.
+   fromid should be taken from the value associated with FROM_ID info key in flow entity
response which was sent earlier.
 
 #### Example JSON Response:
 
@@ -830,6 +871,7 @@ none of the apps match the predicates, an empty list will be returned.
         "configs": { },
         "info": {
           "UID": "yarn-cluster!application_1465246237936_0001"
+          "FROM_ID": "yarn-cluster!varun!QuasiMonteCarlo!1465246348599!application_1465246237936_0001",
         },
         "relatesto": { }
       },
@@ -843,6 +885,7 @@ none of the apps match the predicates, an empty list will be returned.
         "configs": { },
         "info": {
           "UID": "yarn-cluster!application_1464983628730_0005"
+          "FROM_ID": "yarn-cluster!varun!QuasiMonteCarlo!1465246348599!application_1464983628730_0005",
         },
         "relatesto": { }
       }
@@ -943,6 +986,10 @@ match the predicates, an empty list will be returned.
   or metricstoretrieve is specified. Ignored otherwise. The maximum possible value for metricslimit
can be maximum value of
   Integer. If it is not specified or has a value less than 1, and metrics have to be retrieved,
then metricslimit will be
   considered as 1 i.e. latest single value of metric(s) will be returned.
+1. `metricsTimeStart` - If specified, then metrics for the entity after this timestamp are
returned.
+1. `metricsTimeEnd` - If specified, then metrics for the entity before this timestamp are
returned.
+1. `fromid` -  If specified, retrieve the next set of application entities from the given
fromid. The set of entities retrieved is inclusive of specified fromid.
+   fromid should be taken from the value associated with FROM_ID info key in flow entity
response which was sent earlier.
 
 #### Example JSON Response:
 
@@ -955,6 +1002,7 @@ match the predicates, an empty list will be returned.
         "createdtime": 1460419580171,
         "info": {
           "UID": "test-cluster!sjlee!ds-date!1460419580171!application_1460419579913_0002"
+          "FROM_ID": "test-cluster!sjlee!ds-date!1460419580171!application_1460419579913_0002",
         },
         "configs": {},
         "isrelatedto": {},
@@ -1019,6 +1067,8 @@ and app id.
   or metricstoretrieve is specified. Ignored otherwise. The maximum possible value for metricslimit
can be maximum value of
   Integer. If it is not specified or has a value less than 1, and metrics have to be retrieved,
then metricslimit will be
   considered as 1 i.e. latest single value of metric(s) will be returned.
+1. `metricsTimeStart` - If specified, then metrics for the entity after this timestamp are
returned.
+1. `metricsTimeEnd` - If specified, then metrics for the entity before this timestamp are
returned.
 
 #### Example JSON Response:
 
@@ -1043,7 +1093,7 @@ and app id.
 1. If flow context information cannot be retrieved or application for the given app id cannot
be found, HTTP 404 (Not Found) is returned.
 1. For non-recoverable errors while retrieving data, HTTP 500 (Internal Server Error) is
returned.
 
-### <a name="REST_API_LIST_ENTITIES"></a>Query generic entities
+### <a name="REST_API_LIST_ENTITIES_WITH_IN_SCOPE_OF_APPLICATION"></a>Query generic
entities with in the scope of Application
 
 With this API, you can query generic entities identified by cluster ID, application ID and
 per-framework entity type. If the REST endpoint without the cluster name is used, the cluster
@@ -1143,6 +1193,10 @@ If none of the entities match the predicates, an empty list will be
returned.
   or metricstoretrieve is specified. Ignored otherwise. The maximum possible value for metricslimit
can be maximum value of
   Integer. If it is not specified or has a value less than 1, and metrics have to be retrieved,
then metricslimit will be
   considered as 1 i.e. latest single value of metric(s) will be returned.
+1. `metricsTimeStart` - If specified, then metrics for the entity after this timestamp are
returned.
+1. `metricsTimeEnd` - If specified, then metrics for the entity before this timestamp are
returned.
+1. `fromid` -  If specified, retrieve the next set of generic entities from the given fromid.
The set of entities retrieved is inclusive of specified fromid.
+   fromid should be taken from the value associated with FROM_ID info key in flow entity
response which was sent earlier.
 
 #### Example JSON Response:
 
@@ -1157,6 +1211,7 @@ If none of the entities match the predicates, an empty list will be
returned.
         "configs": { },
         "info": {
           "UID": "yarn-cluster!application_1465246237936_0001!YARN_APPLICATION_ATTEMPT!appattempt_1465246237936_0001_000001"
+          "FROM_ID": "yarn-cluster!sjlee!ds-date!1460419580171!application_1465246237936_0001!YARN_APPLICATION_ATTEMPT!0!appattempt_1465246237936_0001_000001"
         },
         "relatesto": { }
       },
@@ -1170,6 +1225,7 @@ If none of the entities match the predicates, an empty list will be
returned.
         "configs": { },
         "info": {
           "UID": "yarn-cluster!application_1465246237936_0001!YARN_APPLICATION_ATTEMPT!appattempt_1465246237936_0001_000002"
+          "FROM_ID": "yarn-cluster!sjlee!ds-date!1460419580171!application_1465246237936_0001!YARN_APPLICATION_ATTEMPT!0!appattempt_1465246237936_0001_000002"
         },
         "relatesto": { }
       }
@@ -1182,7 +1238,142 @@ If none of the entities match the predicates, an empty list will be
returned.
 1. If flow context information cannot be retrieved, HTTP 404 (Not Found) is returned.
 1. For non-recoverable errors while retrieving data, HTTP 500 (Internal Server Error) is
returned.
 
-### <a name="REST_API_LIST_ENTITY"></a>Query generic entity
+### <a name="REST_API_LIST_ENTITIES"></a>Query generic entities.
+
+With this API, you can query generic entities per user identified by cluster ID, doAsUser
and
+entity type. If the REST endpoint without the cluster name is used, the cluster
+specified by the configuration `yarn.resourcemanager.cluster-id` in `yarn-site.xml` is taken.
+If number of matching entities are more than the limit, the most recent
+entities up to the limit will be returned. This endpoint can be used to query generic entity
which
+clients put into the backend. For instance, we can query user entities by specifying entity
type as `TEZ_DAG_ID`.
+If none of the entities match the predicates, an empty list will be returned.
+**Note** : As of today, we can query only those entities which are published with doAsUser
which is different from application owner.
+
+#### HTTP request:
+
+    GET /ws/v2/timeline/clusters/{cluster name}/users/{userid}/entities/{entitytype}
+
+    or
+
+    GET /ws/v2/timeline/users/{userid}/entities/{entitytype}
+
+#### Query Parameters Supported:
+
+1. `limit` - If specified, defines the number of entities to return. The maximum possible
value for limit is maximum value of Long. If it is not specified
+  or has a value less than 0, then limit will be considered as 100.
+1. `createdtimestart` - If specified, then only entities created after this timestamp are
returned.
+1. `createdtimeend` -  If specified, then only entities created before this timestamp are
returned.
+1. `relatesto` - If specified, matched entities must relate to or not relate to given entities
associated with a entity type.
+  relatesto is represented as an expression of the form :<br/>
+  "(&lt;entitytype&gt;:&lt;entityid&gt;:&lt;entityid&gt;...,&lt;entitytype&gt;:&lt;entityid&gt;:&lt;entityid&gt;...)
&lt;op&gt; !(&lt;entitytype&gt;:&lt;entityid&gt;:&lt;entityid&gt;...,&lt;entitytype&gt;:&lt;entityid&gt;:&lt;entityid&gt;...)".<br/>
+  If relatesto expression has entity type - entity id(s) relations specified within enclosing
brackets proceeding "!", this means entities with
+  these relations in its relatesto field, will not be returned. For expressions or subexpressions
without "!", all entities which have the specified
+  relations in its relatesto field, will be returned. "op" is a logical operator and can
be either AND or OR. entity type can be followed by any number
+  of entity id(s). And we can combine any number of ANDs' and ORs' to create complex expressions.
Brackets can be used to club expressions together.<br/>
+  _For example_ : relatesto can be "(((type1:id1:id2:id3,type3:id9) AND !(type2:id7:id8))
OR (type1:id4))".<br/>
+  Please note that URL unsafe characters such as spaces will have to be suitably encoded.
+1. `isrelatedto` - If specified, matched entities must be related to or not related to given
entities associated with a entity type. isrelatedto is
+  represented in the same form as relatesto.
+1. `infofilters` - If specified, matched entities must have exact matches to the given info
key and must be either equal or not equal to
+  given value. The info key is a string but value can be any object. infofilters are represented
as an expression of the form :<br/>
+  "(&lt;key&gt; &lt;compareop&gt; &lt;value&gt;) &lt;op&gt;
(&lt;key&gt; &lt;compareop&gt; &lt;value&gt;)".<br/>
+  Here op can be either of AND or OR. And compareop can be either of "eq", "ne" or "ene".<br/>
+  "eq" means equals, "ne" means not equals and existence of key is not required for a match
and "ene" means not equals but existence of key is
+  required. We can combine any number of ANDs' and ORs' to create complex expressions.  Brackets
can be used to club expressions together.<br/>
+  _For example_ : infofilters can be "(((infokey1 eq value1) AND (infokey2 ne value1)) OR
(infokey1 ene value3))".<br/>
+  Note : If value is an object then value can be given in the form of JSON format without
any space.<br/>
+  _For example_ : infofilters can be (infokey1 eq {"&lt;key&gt;":"&lt;value&gt;","&lt;key&gt;":"&lt;value&gt;"...}).<br/>
+  Please note that URL unsafe characters such as spaces will have to be suitably encoded.
+1. `conffilters` - If specified, matched entities must have exact matches to the given config
name and must be either equal or not equal
+  to the given config value. Both the config name and value must be strings. conffilters
are represented in the same form as infofilters.
+1. `metricfilters` - If specified, matched entities must have exact matches to the given
metric and satisfy the specified relation with the
+  metric value. Metric id must be a string and and metric value must be an integral value.
 metricfilters are represented as an expression of the form :<br/>
+  "(&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;) &lt;op&gt;
(&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;)"<br/>
+  Here op can be either of AND or OR. And compareop can be either of "eq", "ne", "ene", "gt",
"ge", "lt" and "le".<br/>
+  "eq" means equals, "ne" means not equals and existence of metric is not required for a
match, "ene" means not equals but existence of metric is
+  required, "gt" means greater than, "ge" means greater than or equals, "lt" means less than
and "le" means less than or equals. We can combine
+  any number of ANDs' and ORs' to create complex expressions.  Brackets can be used to club
expressions together.<br/>
+  _For example_ : metricfilters can be "(((metric1 eq 50) AND (metric2 gt 40)) OR (metric1
lt 20))".<br/>
+  This in essence is an expression equivalent to "(metric1 == 50 AND metric2 &gt; 40)
OR (metric1 &lt; 20)"<br/>
+  Please note that URL unsafe characters such as spaces will have to be suitably encoded.
+1. `eventfilters` - If specified, matched entities must contain or not contain the given
events depending on the expression. eventfilters is
+  represented as an expression of the form :<br/>
+  "(&lt;eventid&gt;,&lt;eventid&gt;) &lt;op&gt; !(&lt;eventid&gt;,&lt;eventid&gt;,&lt;eventid&gt;)".<br/>
+  Here, "!" means none of the comma-separated list of events within the enclosed brackets
proceeding "!" must exist for a match to occur.
+  If "!" is not specified, the specified events within the enclosed brackets must exist.
op is a logical operator and can be either AND or OR.
+  We can combine any number of ANDs' and ORs' to create complex expressions. Brackets can
be used to club expressions together.<br/>
+  _For example_ : eventfilters can be "(((event1,event2) AND !(event4)) OR (event3,event7,event5))".<br/>
+  Please note that URL unsafe characters such as spaces will have to be suitably encoded.
+1. `metricstoretrieve` - If specified, defines which metrics to retrieve or which ones not
to retrieve and send back in response.
+  metricstoretrieve can be an expression of the form :<br/>
+  (&lt;metricprefix&gt;,&lt;metricprefix&gt;,&lt;metricprefix&gt;,&lt;metricprefix&gt;...)<br/>
+  This specifies a comma separated list of metric id prefixes. Only metrics matching any
of the prefixes will be retrieved. Brackets are optional for
+  the simple expression. Alternatively, expressions can be of the form:<br/>
+  !(&lt;metricprefix&gt;,&lt;metricprefix&gt;,&lt;metricprefix&gt;,&lt;metricprefix&gt;...)<br/>
+  This specifies a comma separated list of metric id prefixes. Only metrics not matching
any of the prefixes will be retrieved.<br/>
+  If metricstoretrieve is specified, metrics will be retrieved irrespective of whether `METRICS`
is specified in fields query param
+  or not. Please note that URL unsafe characters such as spaces will have to be suitably
encoded.
+1. `confstoretrieve` - If specified, defines which configs to retrieve or which ones not
to retrieve and send back in response.
+  confstoretrieve can be an expression of the form :<br/>
+  (&lt;config\_name\_prefix&gt;,&lt;config\_name\_prefix&gt;,&lt;config\_name\_prefix&gt;,&lt;config\_name\_prefix&gt;...)<br/>
+  This specifies a comma separated list of config name prefixes. Only configs matching any
of the prefixes will be retrieved. Brackets are optional for
+  the simple expression. Alternatively, expressions can be of the form:<br/>
+  !(&lt;config\_name\_prefix&gt;,&lt;config\_name\_prefix&gt;,&lt;config\_name\_prefix&gt;,&lt;config\_name\_prefix&gt;...)<br/>
+  This specifies a comma separated list of config name prefixes. Only configs not matching
any of the prefixes will be retrieved.<br/>
+  If confstoretrieve is specified, configs will be retrieved irrespective of whether `CONFIGS`
is specified in fields query param
+  or not. Please note that URL unsafe characters such as spaces will have to be suitably
encoded.
+1. `fields` - Specifies which fields to retrieve. Possible values for fields can be `EVENTS`,
`INFO`, `CONFIGS`, `METRICS`, `RELATES_TO`,
+ `IS_RELATED_TO` and `ALL`. All fields will be retrieved if `ALL` is specified. Multiple
fields can be specified as a comma-separated list.
+  If fields is not specified, in response, entity id, entity type, createdtime and UID in
info field will be returned.
+1. `metricslimit` - If specified, defines the number of metrics to return. Considered only
if fields contains METRICS/ALL
+  or metricstoretrieve is specified. Ignored otherwise. The maximum possible value for metricslimit
can be maximum value of
+  Integer. If it is not specified or has a value less than 1, and metrics have to be retrieved,
then metricslimit will be
+  considered as 1 i.e. latest single value of metric(s) will be returned.
+1. `metricsTimeStart` - If specified, then metrics for the entity after this timestamp are
returned.
+1. `metricsTimeEnd` - If specified, then metrics for the entity before this timestamp are
returned.
+1. `fromid` -  If specified, retrieve the next set of generic entities from the given fromid.
The set of entities retrieved is inclusive of specified fromid.
+   fromid should be taken from the value associated with FROM_ID info key in flow entity
response which was sent earlier.
+
+#### Example JSON Response:
+
+    [
+      {
+        "metrics": [ ],
+        "events": [ ],
+        "type": "TEZ_DAG_ID",
+        "id": "dag_1465246237936_0001_000001",
+        "createdtime": 1465246358873,
+        "isrelatedto": { },
+        "configs": { },
+        "info": {
+          "UID": "yarn-cluster!sjlee!TEZ_DAG_ID!0!dag_1465246237936_0001_000001"
+          "FROM_ID": "sjlee!yarn-cluster!TEZ_DAG_ID!0!dag_1465246237936_0001_000001"
+        },
+        "relatesto": { }
+      },
+      {
+        "metrics": [ ],
+        "events": [ ],
+        "type": "TEZ_DAG_ID",
+        "id": "dag_1465246237936_0001_000002",
+        "createdtime": 1465246359045,
+        "isrelatedto": { },
+        "configs": { },
+        "info": {
+          "UID": "yarn-cluster!sjlee!TEZ_DAG_ID!0!dag_1465246237936_0001_000002!userX"
+          "FROM_ID": "sjlee!yarn-cluster!TEZ_DAG_ID!0!dag_1465246237936_0001_000002!userX"
+        },
+        "relatesto": { }
+      }
+    ]
+
+#### Response Codes
+
+1. If successful, a HTTP 200(OK) response is returned.
+1. If any problem occurs in parsing request, HTTP 400 (Bad Request) is returned.
+1. For non-recoverable errors while retrieving data, HTTP 500 (Internal Server Error) is
returned.
+
+### <a name="REST_API_LIST_ENTITY"></a>Query generic entity with in the scope
of Application
 
 With this API, you can query a specific generic entity identified by cluster ID, application
ID,
 per-framework entity type and entity ID. If the REST endpoint without the cluster name is
used, the
@@ -1236,6 +1427,9 @@ container ID. Similarly, application attempt can be queried by specifying
entity
   or metricstoretrieve is specified. Ignored otherwise. The maximum possible value for metricslimit
can be maximum value of
   Integer. If it is not specified or has a value less than 1, and metrics have to be retrieved,
then metricslimit will be
   considered as 1 i.e. latest single value of metric(s) will be returned.
+1. `metricsTimeStart` - If specified, then metrics for the entity after this timestamp are
returned.
+1. `metricsTimeEnd` - If specified, then metrics for the entity before this timestamp are
returned.
+1. `entityidprefix` Defines the id prefix for the entity to be fetched. If specified, then
entity retrieval will be faster.
 
 #### Example JSON Response:
 
@@ -1248,7 +1442,8 @@ container ID. Similarly, application attempt can be queried by specifying
entity
       "isrelatedto": { },
       "configs": { },
       "info": {
-        "UID": "yarn-cluster!application_1465246237936_0001!YARN_APPLICATION_ATTEMPT!appattempt_1465246237936_0001_000001"
+        "UID": "yarn-cluster!application_1465246237936_0001!YARN_APPLICATION_ATTEMPT!0!appattempt_1465246237936_0001_000001"
+        "FROM_ID": "yarn-cluster!sjlee!ds-date!1460419580171!application_1465246237936_0001!YARN_APPLICATION_ATTEMPT!0!appattempt_1465246237936_0001_000001"
       },
       "relatesto": { }
     }
@@ -1259,3 +1454,117 @@ container ID. Similarly, application attempt can be queried by specifying
entity
 1. If any problem occurs in parsing request, HTTP 400 (Bad Request) is returned.
 1. If flow context information cannot be retrieved or entity for the given entity id cannot
be found, HTTP 404 (Not Found) is returned.
 1. For non-recoverable errors while retrieving data, HTTP 500 (Internal Server Error) is
returned.
+
+### <a name="REST_API_LIST_ENTITIES"></a>Query generic entity.
+
+With this API, you can query generic entity per user identified by cluster ID, doAsUser and
+entity type and entity ID. If the REST endpoint without the cluster name is used, the cluster
+specified by the configuration `yarn.resourcemanager.cluster-id` in `yarn-site.xml` is taken.
+If number of matching entities are more than the limit, the most recent
+entities up to the limit will be returned. This endpoint can be used to query generic entity
which
+clients put into the backend. For instance, we can query user entities by specifying entity
type as `TEZ_DAG_ID`.
+If none of the entities match the predicates, an empty list will be returned.
+**Note** : As of today, we can query only those entities which are published with doAsUser
which is different from application owner.
+
+#### HTTP request:
+
+    GET /ws/v2/timeline/clusters/{cluster name}/users/{userid}/entities/{entitytype}/{entityid}
+
+    or
+
+    GET /ws/v2/timeline/users/{userid}/entities/{entitytype}/{entityid}
+
+#### Query Parameters Supported:
+
+1. `metricstoretrieve` - If specified, defines which metrics to retrieve or which ones not
to retrieve and send back in response.
+  metricstoretrieve can be an expression of the form :<br/>
+  (&lt;metricprefix&gt;,&lt;metricprefix&gt;,&lt;metricprefix&gt;,&lt;metricprefix&gt;...)<br/>
+  This specifies a comma separated list of metric id prefixes. Only metrics matching any
of the prefixes will be retrieved. Brackets are optional for
+  the simple expression. Alternatively, expressions can be of the form:<br/>
+  !(&lt;metricprefix&gt;,&lt;metricprefix&gt;,&lt;metricprefix&gt;,&lt;metricprefix&gt;...)<br/>
+  This specifies a comma separated list of metric id prefixes. Only metrics not matching
any of the prefixes will be retrieved.<br/>
+  If metricstoretrieve is specified, metrics will be retrieved irrespective of whether `METRICS`
is specified in fields query param
+  or not. Please note that URL unsafe characters such as spaces will have to be suitably
encoded.
+1. `confstoretrieve` - If specified, defines which configs to retrieve or which ones not
to retrieve and send back in response.
+  confstoretrieve can be an expression of the form :<br/>
+  (&lt;config\_name\_prefix&gt;,&lt;config\_name\_prefix&gt;,&lt;config\_name\_prefix&gt;,&lt;config\_name\_prefix&gt;...)<br/>
+  This specifies a comma separated list of config name prefixes. Only configs matching any
of the prefixes will be retrieved. Brackets are optional for
+  the simple expression. Alternatively, expressions can be of the form:<br/>
+  !(&lt;config\_name\_prefix&gt;,&lt;config\_name\_prefix&gt;,&lt;config\_name\_prefix&gt;,&lt;config\_name\_prefix&gt;...)<br/>
+  This specifies a comma separated list of config name prefixes. Only configs not matching
any of the prefixes will be retrieved.<br/>
+  If confstoretrieve is specified, configs will be retrieved irrespective of whether `CONFIGS`
is specified in fields query param
+  or not. Please note that URL unsafe characters such as spaces will have to be suitably
encoded.
+1. `fields` - Specifies which fields to retrieve. Possible values for fields can be `EVENTS`,
`INFO`, `CONFIGS`, `METRICS`, `RELATES_TO`,
+ `IS_RELATED_TO` and `ALL`. All fields will be retrieved if `ALL` is specified. Multiple
fields can be specified as a comma-separated list.
+  If fields is not specified, in response, entity id, entity type, createdtime and UID in
info field will be returned.
+1. `metricslimit` - If specified, defines the number of metrics to return. Considered only
if fields contains METRICS/ALL
+  or metricstoretrieve is specified. Ignored otherwise. The maximum possible value for metricslimit
can be maximum value of
+  Integer. If it is not specified or has a value less than 1, and metrics have to be retrieved,
then metricslimit will be
+  considered as 1 i.e. latest single value of metric(s) will be returned.
+1. `metricsTimeStart` - If specified, then metrics for the entity after this timestamp are
returned.
+1. `metricsTimeEnd` - If specified, then metrics for the entity before this timestamp are
returned.
+1. `fromid` -  If specified, retrieve the next set of generic entities from the given fromid.
The set of entities retrieved is inclusive of specified fromid.
+   fromid should be taken from the value associated with FROM_ID info key in flow entity
response which was sent earlier.
+
+#### Example JSON Response:
+
+    [
+      {
+        "metrics": [ ],
+        "events": [ ],
+        "type": "TEZ_DAG_ID",
+        "id": "dag_1465246237936_0001_000001",
+        "createdtime": 1465246358873,
+        "isrelatedto": { },
+        "configs": { },
+        "info": {
+          "UID": "yarn-cluster!sjlee!TEZ_DAG_ID!0!dag_1465246237936_0001_000001!userX"
+          "FROM_ID": "sjlee!yarn-cluster!TEZ_DAG_ID!0!dag_1465246237936_0001_000001!userX"
+        },
+        "relatesto": { }
+      }
+    ]
+
+#### Response Codes
+
+1. If successful, a HTTP 200(OK) response is returned.
+1. If any problem occurs in parsing request, HTTP 400 (Bad Request) is returned.
+1. For non-recoverable errors while retrieving data, HTTP 500 (Internal Server Error) is
returned.
+
+### <a name="REST_API_LIST_ENTITY_TYPES"></a>Query generic entity types
+
+With this API, you can query set of available entity types for a given app id. If the REST
endpoint without the cluster name is used, the cluster specified by the configuration yarn.resourcemanager.cluster-id
in yarn-site.xml is taken.  If userid, flow name and flow run id which are optional query
parameters are not specified, they will be queried based on app id and cluster id from the
flow context information stored in underlying storage implementation.
+
+#### HTTP request:
+
+    GET /ws/v2/timeline/apps/{appid}/entity-types
+
+    or
+
+    GET /ws/v2/timeline/clusters/{clusterid}/apps/{appid}/entity-types
+
+#### Query Parameters Supported:
+
+1. `userid` -  If specified, entity must belong to this user. This query param must be specified
along with flowname and flowrunid query params, otherwise it will be ignored.
+  If userid, flowname and flowrunid are not specified then timeline reader will fetch flow
context information based on cluster and appid while executing the query.
+1. `flowname` - If specified, entity must belong to this flow name. This query param must
be specified along with userid and flowrunid query params, otherwise it will be ignored.
+  If userid, flowname and flowrunid are not specified, we would have to fetch flow context
information based on cluster and appid while executing the query.
+1. `flowrunid` - If specified, entity must belong to this flow run id. This query param must
be specified along with userid and flowname query params, otherwise it will be ignored.
+  If userid, flowname and flowrunid are not specified, we would have to fetch flow context
information based on cluster and appid while executing the query.
+
+#### Example JSON Response:
+
+    {
+      YARN_APPLICATION_ATTEMPT,
+      YARN_CONTAINER,
+      MAPREDUCE_JOB,
+      MAPREDUCE_TASK,
+      MAPREDUCE_TASK_ATTEMPT
+    }
+
+#### Response Codes
+
+1. If successful, a HTTP 200 (OK) response is returned.
+1. If any problem occurs in parsing request, HTTP 400 (Bad Request) is returned.
+1. If flow context information cannot be retrieved or entity for the given entity id cannot
be found, HTTP 404 (Not Found) is returned.
+1. For non-recoverable errors while retrieving data, HTTP 500 (Internal Server Error) is
returned.


---------------------------------------------------------------------
To unsubscribe, e-mail: common-commits-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-commits-help@hadoop.apache.org


Mime
View raw message