hadoop-yarn-issues mailing list archives

From "Tao Yang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-5683) Support specifying storage type for per-application local dirs
Date Wed, 28 Sep 2016 11:12:20 GMT
Tao Yang created YARN-5683:

             Summary: Support specifying storage type for per-application local dirs
                 Key: YARN-5683
                 URL: https://issues.apache.org/jira/browse/YARN-5683
             Project: Hadoop YARN
          Issue Type: New Feature
          Components: nodemanager
    Affects Versions: 3.0.0-alpha2
            Reporter: Tao Yang
             Fix For: 3.0.0-alpha2

# Introduction
* Applications from various frameworks (Flink, Spark, MapReduce, etc.) that use local storage
(for checkpoints, shuffle data, etc.) may require high I/O performance. On heterogeneous clusters,
it is useful to place the local directories of these applications on high-performance storage media.
* YARN does not distinguish between storage types, so applications cannot selectively
use storage media with different performance characteristics. Making YARN aware of storage
media would allow it to make better decisions about the placement of local/log directories,
with input from applications. An application could choose the desired storage media through
configuration, based on its performance requirements.

# Approach
* NodeManager will distinguish storage types for local directories.
** The yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configurations should allow the
cluster administrator to optionally specify the storage type of each directory. Example:
[SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equivalent to [SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
** StorageType defines the DISK and SSD storage types, with DISK as the default.

** StorageLocation separates the storage type from the directory path and is used by
LocalDirAllocator to determine the type of each local dir; the default storage type is DISK.
** The getLocalPathForWrite method of LocalDirAllocator will prefer a local directory
of the specified storage type, and will fall back to ignoring the storage type if the requirement
cannot be satisfied.
** ContainerLaunch will support container-related local/log directories. Any application
framework can set the environment variables LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE to
specify the desired storage type of its local/log directories.
* Allow frameworks to specify the storage type (taking MapReduce as an example).
** Add new configurations that allow the application administrator to optionally specify the
storage type of local/log directories. (MapReduce adds the configurations mapreduce.job.local-storage-type
and mapreduce.job.log-storage-type.)
** Support container work directories: set the environment variables LOCAL_STORAGE_TYPE
and LOG_STORAGE_TYPE, according to the configurations above, in ContainerLaunchContext and ApplicationSubmissionContext.
(MapReduce should update YARNRunner and TaskAttemptImpl.)
** Add a storage type prefix to the requested path to support other local directories of frameworks
(such as shuffle directories for MapReduce). (MapReduce should update YarnOutputFiles, MROutputFiles
and YarnChild to support output/work directories.)
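
The prefix syntax and the prefer-then-fall-back behaviour described in this section can be sketched roughly as follows. This is a standalone illustration, not the patch: the class StorageTypeSketch and the methods parse, parseDirs and candidates are hypothetical names, and the real LocalDirAllocator also weighs free space and disk health, which is omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class StorageTypeSketch {
    /** Storage types named in the proposal; DISK is the default. */
    enum StorageType { DISK, SSD }

    /** Hypothetical stand-in for the proposed StorageLocation: type plus path. */
    static final class StorageLocation {
        final StorageType type;
        final String path;

        StorageLocation(StorageType type, String path) {
            this.type = type;
            this.path = path;
        }

        /** Parses one entry such as "[SSD]/disk1/nm-local-dir"; no prefix means DISK. */
        static StorageLocation parse(String entry) {
            String s = entry.trim();
            if (s.startsWith("[")) {
                int close = s.indexOf(']');
                if (close < 0) {
                    throw new IllegalArgumentException("Unclosed storage type prefix: " + entry);
                }
                String name = s.substring(1, close).trim().toUpperCase(Locale.ROOT);
                return new StorageLocation(StorageType.valueOf(name), s.substring(close + 1).trim());
            }
            return new StorageLocation(StorageType.DISK, s);
        }
    }

    /** Parses a comma-separated value in the style proposed for yarn.nodemanager.local-dirs. */
    static List<StorageLocation> parseDirs(String value) {
        List<StorageLocation> out = new ArrayList<>();
        for (String entry : value.split(",")) {
            if (!entry.trim().isEmpty()) {
                out.add(StorageLocation.parse(entry));
            }
        }
        return out;
    }

    /** Returns the dirs of the wanted type, or all dirs if none match (the fallback). */
    static List<StorageLocation> candidates(List<StorageLocation> dirs, StorageType wanted) {
        List<StorageLocation> matching = new ArrayList<>();
        for (StorageLocation d : dirs) {
            if (d.type == wanted) {
                matching.add(d);
            }
        }
        return matching.isEmpty() ? dirs : matching;
    }

    public static void main(String[] args) {
        List<StorageLocation> dirs =
            parseDirs("[SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir");
        // Assumption: in the proposal the wanted type would come from LOCAL_STORAGE_TYPE.
        for (StorageLocation d : candidates(dirs, StorageType.SSD)) {
            System.out.println(d.type + " " + d.path);
        }
    }
}
```

Returning the full directory list when nothing matches mirrors the "fall back to ignoring the storage type" rule; a stricter policy (for example, failing the request) would replace that last branch.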

# Further Discussion
* The storage type requirement for local/log directories may not be satisfiable on heterogeneous
clusters. To achieve a global optimum, the scheduler should also be aware of and manage disk
resources. [YARN-2139|https://issues.apache.org/jira/browse/YARN-2139] is close to that but does
not seem to support multiple storage types; perhaps more should be done to make it aware of the
storage type of disk resources?
* Node labels or node constraints can also increase the chance of satisfying a
storage type requirement.
* The fallback strategy still needs consideration. Certain applications might not work well
when the storage type requirement is not satisfied. When no disk of the desired storage type
is available, should the container launch fail, or should the AM handle it?

