From jia...@apache.org
Subject [77/93] [abbrv] hadoop git commit: YARN-7126. Create introductory site documentation for YARN native services. Contributed by Gour Saha
Date Fri, 13 Oct 2017 18:22:23 GMT
YARN-7126. Create introductory site documentation for YARN native services. Contributed by
Gour Saha

Branch: refs/heads/yarn-native-services
Commit: b773c6fba8b10f44beacf8fa5f3a4cd5e663d4f7
Parents: 6308a8e
Author: Jian He <jianhe@apache.org>
Authored: Fri Sep 1 16:19:31 2017 -0700
Committer: Jian He <jianhe@apache.org>
Committed: Fri Oct 13 11:02:21 2017 -0700

 LICENSE.txt                                     |  1 +
 .../native-services/NativeServicesIntro.md      | 96 +++++++++++++++++++-
 2 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/LICENSE.txt b/LICENSE.txt
index 3f50521..46ee108 100644
--- a/LICENSE.txt
+++ b/LICENSE.txt
 The binary distribution of this product bundles these dependencies under the
 following license:
 FindBugs-jsr305 3.0.0
+dnsjava 2.1.7, Copyright (c) 1998-2011, Brian Wellington. All rights reserved.
 (2-clause BSD)
 Redistribution and use in source and binary forms, with or without

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/native-services/NativeServicesIntro.md
index 89fefe9..e6a4e91 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/native-services/NativeServicesIntro.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/native-services/NativeServicesIntro.md
@@ -10,4 +10,98 @@
   See the License for the specific language governing permissions and
   limitations under the License. See accompanying LICENSE file.
\ No newline at end of file
+# Introduction: YARN Native Services
+## Overview
+YARN Native Services provides first class framework support and APIs to host long running
services natively in YARN. In addition to launching services, the new APIs support performing
lifecycle management operations, such as flexing service components up/down, managing lifetime,
upgrading the service to a newer version, and stopping/restarting/deleting the service.
+The native services capabilities are built on the existing low-level resource management
API provided by YARN that can support any type of application. Other application frameworks
like Hadoop MapReduce already expose higher level APIs that users can leverage to run applications
on top of YARN. With the advent of containerization technologies like Docker, providing first
class support and APIs for long running services at the framework level made sense.
+Relying on a framework has the advantage of exposing a simpler usage model to the user by
enabling service configuration and launch through specification (without writing new code),
as well as hiding complex low-level details including state management and fault-tolerance
etc. Users/operators of existing services typically like to avoid modifying an existing service
to be aware of YARN. With first class support capable of running a single Docker image as
well as complex assemblies comprised of multiple Docker images, there is no need for service
owners to be aware of YARN. Developers of new services do not have to worry about YARN internals
and only need to focus on containerization of their service(s).
+## First class support for services
+In order to natively provide first class support for long running services, several new features
and improvements have been made at the framework level.
+### Incorporate Apache Slider into Apache YARN
+Apache Slider, which existed as a separate incubator project, has been merged into YARN to
kick start the first class support. Apache Slider is a universal Application Master (AM) which
had several key features built in: fault tolerance of service containers and AM, work-preserving
AM restarts, service logs management, service management like flex up/down, stop/start, and
rolling upgrade to newer service versions, etc. Of course, a lot more work has been done on top
of what Apache Slider brought in, details of which follow.
+### Native Services API
+A significant effort has gone into simplifying the user facing story for building services.
In the past, bringing a new service to YARN was not a pleasant experience. The APIs of existing
frameworks are either too low-level (native YARN), require writing new code (for frameworks
with programmatic APIs) or require writing a complex spec (for declarative frameworks).
+The new REST APIs are very simple to use. The REST layer acts as a single point of entry
for creation and lifecycle management of YARN services. Services here can range from simple
single-component apps to the most complex, multi-component applications needing special orchestration.
+The plan is to make this a unified REST based entry point for other important features like resource-profile
management ([YARN-3926](https://issues.apache.org/jira/browse/YARN-3926)), package-definitions'
lifecycle-management and service-discovery ([YARN-913](https://issues.apache.org/jira/browse/YARN-913)/[YARN-4757](https://issues.apache.org/jira/browse/YARN-4757)).
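+To make "launch through specification" concrete, here is a minimal sketch of a declarative service description built as JSON. The field names (`name`, `components`, `number_of_containers`, `launch_command`, `resource`) and the example values are assumptions for illustration; this page does not define the spec format, and no endpoint is shown because none is given here.

```python
import json


def build_service_spec(name, component, instances, cpus, memory_mb, command):
    """Return a dict describing a one-component service.

    All field names are assumed for illustration, not taken from this page.
    """
    return {
        "name": name,
        "components": [
            {
                "name": component,
                "number_of_containers": instances,
                "launch_command": command,
                "resource": {"cpus": cpus, "memory": str(memory_mb)},
            }
        ],
    }


# Hypothetical service: two containers that simply sleep.
spec = build_service_spec("sleeper-service", "sleeper", 2, 1, 256, "sleep 900000")
payload = json.dumps(spec, indent=2)
print(payload)
```

+The point of the sketch is that the whole service is data: a client would submit a document like this to the REST layer rather than write an Application Master.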
+### Native Services Discovery
+The new discovery solution exposes the registry information through a more generic and widely
used mechanism: DNS. Service Discovery via DNS uses the well-known DNS interfaces to browse
the network for services. Having the registry information exposed via DNS simplifies the life
of services.
+The previous read mechanisms of YARN Service Registry were limited to a registry specific
(Java) API and a REST interface. In practice, this made it very difficult for wiring up existing
clients and services. For example, dynamic configuration of dependent endpoints of a service
was not easy to implement using the registry-read mechanisms **without** code changes to
existing services. DNS based service discovery solves these problems.
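+As an illustration of why DNS helps, the sketch below looks up a service endpoint with nothing but the standard resolver interface. The naming scheme (`instance.service.user.domain`) and the example names are assumptions, not something this page specifies; the `lookup` parameter exists only so the function can be exercised without a live DNS server.

```python
import socket


def service_hostname(instance, service, user, domain):
    """Build a DNS name for a container (naming scheme is an assumption)."""
    return ".".join([instance, service, user, domain]).lower()


def resolve(hostname, port, lookup=socket.getaddrinfo):
    """Return the sorted, de-duplicated addresses that `hostname` resolves to."""
    infos = lookup(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


# Hypothetical names; an existing client only needs this string in its config.
host = service_hostname("httpd-0", "web", "alice", "example.com")
print(host)
```

+An existing client needs no registry-specific library for this: any code path that already resolves hostnames works unchanged.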
+### Scheduling
+[YARN-6592](https://issues.apache.org/jira/browse/YARN-6592) covers a host of scheduling
features that are useful for short-running applications and services alike. Below are a few
important YARN core features that help schedule services better. Without these, running
services on YARN is a hassle.
+* Affinity (TBD)
+* Anti-affinity (TBD)
+* Gang scheduling (TBD)
+* Malleable container sizes ([YARN-1197](https://issues.apache.org/jira/browse/YARN-1197))
+### Resource Profiles
+YARN has always supported memory as a resource, inheriting it from Hadoop-(1.x)’s MapReduce
platform. Later, support for CPU as a resource ([YARN-2](https://issues.apache.org/jira/browse/YARN-2)/[YARN-3](https://issues.apache.org/jira/browse/YARN-3))
was added. Multiple efforts added support for various other resource-types in YARN, such as
disk ([YARN-2139](https://issues.apache.org/jira/browse/YARN-2139)) and network ([YARN-2140](https://issues.apache.org/jira/browse/YARN-2140)),
specifically benefiting long running services.
+In many systems outside of YARN, users are already accustomed to specifying their desired
‘box’ of requirements where each box comes with a predefined amount of each resource.
Admins would define various available box sizes (small, medium, large, etc.) and users would
pick the ones they desire and everybody is happy. In [YARN-3926](https://issues.apache.org/jira/browse/YARN-3926),
YARN introduces Resource Profiles, which extend the YARN resource model for easier resource-type
management and profiles. This helps in two ways: the system can schedule applications better
and it can perform intelligent over-subscription of resources where applicable.
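+The "box size" idea can be sketched as a lookup from a profile name to a fixed bundle of resources. The profile names come from the paragraph above; the resource keys and amounts are hypothetical values chosen for illustration, not defaults from YARN.

```python
# Admin-defined profiles (amounts are made up for this example).
PROFILES = {
    "small": {"memory_mb": 1024, "vcores": 1},
    "medium": {"memory_mb": 4096, "vcores": 2},
    "large": {"memory_mb": 8192, "vcores": 4},
}


def resolve_profile(name, overrides=None):
    """Expand a profile name into concrete resources, applying optional overrides."""
    if name not in PROFILES:
        raise KeyError(f"unknown resource profile: {name}")
    resources = dict(PROFILES[name])  # copy so the shared table is never mutated
    resources.update(overrides or {})
    return resources


print(resolve_profile("medium"))
print(resolve_profile("small", {"vcores": 2}))
```

+A user names a box instead of tuning each resource type per container, while an override still allows a one-off adjustment.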
+Resource profiles are all the more important for services because:
+* Similar to short running apps, you don’t have to fiddle with varying resource-requirements
for each container type
+* Services usually end up planning for peak usages, leaving a lot of possibility of barren
+### Special handling of preemption and container reservations
+Preemption and reservation of long running containers have different implications from regular
ones. Preemption of resources in YARN today works by killing containers. For long-lived
services this is unacceptable. Also, the scheduler should avoid allocating long running containers
on borrowed resources. [YARN-4724](https://issues.apache.org/jira/browse/YARN-4724) will address
some of this special recognition of service containers.
+### Container auto-restarts
+If a service container dies, expiring the container's allocation and releasing it
is undesirable in many cases. Long running containers may exit for various reasons, or crash
and need to restart, but forcing them to go through the complete scheduling cycle, resource
localization, etc. is both unnecessary and expensive.
+Services can enable app-specific policies that allow NodeManagers to automatically restart
containers. [YARN-3998](https://issues.apache.org/jira/browse/YARN-3998) implements a retry-policy
to let the NM re-launch a service container when it fails.
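+The decision a retry policy has to make can be sketched in a few lines. This is an illustrative model only: the class, its fields and the default of three retries are hypothetical, and it is not the YARN-3998 implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical per-container retry policy."""
    max_retries: int = 3
    # Empty set means "retry on any non-zero exit code".
    retry_exit_codes: frozenset = frozenset()

    def should_relaunch(self, exit_code, attempts_so_far):
        """Return True if the container should be re-launched in place."""
        if exit_code == 0:
            return False  # clean exit, nothing to recover
        if attempts_so_far >= self.max_retries:
            return False  # retry budget exhausted
        return not self.retry_exit_codes or exit_code in self.retry_exit_codes


policy = RetryPolicy(max_retries=2, retry_exit_codes=frozenset({137}))
print(policy.should_relaunch(137, 0))
print(policy.should_relaunch(1, 0))
```

+Re-launching on the same node under such a policy keeps the existing allocation, which is what avoids the scheduling and localization cost described above.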
+### Container allocation re-use for application upgrades
+Auto-restart of containers will support upgrade of service containers without reclaiming
the resources first. During an upgrade, with a multitude of other applications running in the
system, giving up and getting back resources allocated to the service is hard to manage. Node-Labels
help this cause but are not straightforward to use to address the app-specific use-cases.
The umbrella [YARN-4726](https://issues.apache.org/jira/browse/YARN-4726) along with [YARN-5620](https://issues.apache.org/jira/browse/YARN-5620)
and [YARN-4470](https://issues.apache.org/jira/browse/YARN-4470) will take care of this.
+### Dynamic Configurations
+Most production-level services require dynamic configurations to manage and simplify their
lifecycle. Container’s resource size, local/work dirs and log-dirs are the most basic information
services need. Service's endpoint details (host/port), their inter-component dependencies,
health-check endpoints, etc. are all critical to the success of today's real-life services.
+### Resource re-localization for reconfiguration/upgrades
+### Service Registry
+### Service persistent storage and volume support
+### Packaging
+### Container image registry (private, public and hybrid)
+### Container image management and APIs
+### Container image storage
+### Monitoring
+### Metrics
+### Service Logs

