From: "Hadoop QA (JIRA)"
To: yarn-issues@hadoop.apache.org
Date: Sat, 17 Nov 2018 03:31:03 +0000 (UTC)
Subject: [jira] [Commented] (YARN-5683) Support specifying storage type for per-application local dirs

    [ https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690339#comment-16690339 ]

Hadoop QA commented on YARN-5683:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-5683 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-5683 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12832871/YARN-5683-3.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/22588/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> Support specifying storage type for per-application local dirs
> --------------------------------------------------------------
>
> Key: YARN-5683
> URL: https://issues.apache.org/jira/browse/YARN-5683
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Affects Versions: 3.0.0-alpha2
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Labels: oct16-hard
> Attachments: YARN-5683-1.patch, YARN-5683-2.patch, YARN-5683-3.patch, flow_diagram_for_MapReduce-2.png, flow_diagram_for_MapReduce.png
>
>
> h3. Introduction
> * Some applications of various frameworks (Flink, Spark, MapReduce, etc.) that use local storage (for checkpoints, shuffle data, etc.) might require high IO performance. On heterogeneous clusters it is useful to place the local directories of these applications on high-performance storage media.
> * YARN does not distinguish between storage types, so applications cannot selectively use storage media with different performance characteristics. Making YARN aware of storage media would allow it to make better decisions about the placement of local directories.
> h3. Approach
> * NodeManager will distinguish storage types for local directories.
> ** The yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configurations should allow the cluster administrator to optionally specify the storage type for each local directory. Example: [SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equivalent to [SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
> ** StorageType defines the DISK and SSD storage types and takes DISK as the default storage type.
> ** StorageLocation separates the storage type from the directory path and is used by LocalDirAllocator to determine the type of each local dir; the default storage type is DISK.
> ** The getLocalPathForWrite method of LocalDirAllocator will prefer a local directory of the specified storage type, and will fall back to ignoring the storage type if the requirement cannot be satisfied.
> ** ContainerLaunch supports container-related local/log directories. Any application framework can set the environment variables LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE to specify the desired storage type of local/log directories, and can set ENSURE_LOCAL_STORAGE_TYPE and ENSURE_LOG_STORAGE_TYPE to refuse to launch the container rather than fall back.
> * Allow various frameworks to specify the storage type (taking MapReduce as an example).
> ** New configurations should allow the application administrator to optionally specify the storage type of local/log directories and the fallback strategy (MapReduce configurations: mapreduce.job.local-storage-type, mapreduce.job.log-storage-type, mapreduce.job.ensure-local-storage-type and mapreduce.job.ensure-log-storage-type).
> ** Support container work directories: set the environment variables, including LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE, on ContainerLaunchContext and ApplicationSubmissionContext according to the configurations above. (MapReduce should update YARNRunner and TaskAttemptImpl.)
> ** Add a storage type prefix to the requested path to support other local directories of frameworks (such as shuffle directories for MapReduce). (MapReduce should update YarnOutputFiles, MROutputFiles and YarnChild to support output/work directories.)
> ** Flow diagram for the MapReduce framework
> !flow_diagram_for_MapReduce-2.png!
> h3. Further Discussion
> * Scheduling: the storage type requirement for local/log directories may not be satisfiable on some nodes of a heterogeneous cluster. To achieve a global optimum, the scheduler should be aware of and manage disk resources.
> ** Approach-1: Based on node attributes (YARN-3409), the scheduler can allocate containers that require SSD to nodes with attribute:ssd=true.
> ** Approach-2: Based on the extended resource model (YARN-3926), scheduling is easy to support by extending the resource model with resources such as vdisk and vssd, but these are hard for applications to measure and hard to isolate on non-CFQ-based disks.
> * The fallback strategy still needs consideration. Certain applications might not work well when the storage type requirement is not satisfied. When no disk of the desired storage type is available, should the container launch fail, or should the AM handle it? We have implemented a fallback strategy that fails the container launch when no disk of the desired storage type is available. Is there a better approach?
> This feature has been used for half a year to meet the needs of some applications on Alibaba search clusters.
> Please feel free to give your suggestions and opinions.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
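
The directory syntax and the prefer-then-fall-back behaviour described in the Approach section can be illustrated with a small sketch. This is not the code from the attached patches (which target Hadoop's Java code base); it is a Python illustration under stated assumptions. The names parse_local_dirs and choose_dir are hypothetical, StorageLocation here is a stand-in for the class of the same name in the description, and picking the first matching directory is a simplification that ignores whatever capacity checks the real allocator performs.

```python
import re
from typing import List, NamedTuple, Optional

DEFAULT_TYPE = "DISK"           # the description names DISK as the default
KNOWN_TYPES = {"DISK", "SSD"}   # the two storage types the description defines


class StorageLocation(NamedTuple):
    """Stand-in for the StorageLocation idea: a storage type plus a path."""
    storage_type: str
    path: str


# Matches an optional "[TYPE]" prefix in front of a directory path.
_PREFIX = re.compile(r"^\[(\w+)\](.+)$")


def parse_local_dirs(value: str) -> List[StorageLocation]:
    """Parse a comma-separated dir list; untagged entries default to DISK."""
    locations = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        match = _PREFIX.match(entry)
        if match:
            storage_type, path = match.group(1).upper(), match.group(2)
            if storage_type not in KNOWN_TYPES:
                raise ValueError(f"unknown storage type: {storage_type}")
        else:
            storage_type, path = DEFAULT_TYPE, entry
        locations.append(StorageLocation(storage_type, path))
    return locations


def choose_dir(locations: List[StorageLocation],
               wanted: Optional[str],
               ensure: bool = False) -> Optional[StorageLocation]:
    """Prefer a dir of the wanted type; fall back to any dir unless ensure is set.

    Returning None with ensure=True models "do not launch the container".
    """
    if wanted is not None:
        for loc in locations:
            if loc.storage_type == wanted:
                return loc
        if ensure:
            return None
    return locations[0] if locations else None
```

With the example value from the description, parse_local_dirs tags /disk1/nm-local-dir as SSD and the other two directories as DISK. Calling choose_dir(dirs, "SSD") then returns the SSD directory, while on a node with only DISK directories the same call returns a DISK directory, or None when ensure=True, which corresponds to the ENSURE_LOCAL_STORAGE_TYPE behaviour the description outlines.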