Date: Fri, 10 Jun 2016 21:17:21 +0000 (UTC)
From: "Sangjin Lee (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store

    [ https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325298#comment-15325298 ]

Sangjin Lee commented on YARN-4958:
-----------------------------------

{quote}
Strictly speaking this should be two
separate JIRAs, but I don't think anyone is that fussy about it. I've seen plenty of patches that touch more than one project. I've submitted several myself that touched common and HDFS.
{quote}

OK, although it'd be ideal to have two JIRAs (YARN to enable support for wildcards in container launch context and MAPREDUCE to take advantage of it), it might be good to move it to MAPREDUCE at least. The majority of the changes are really in MAPREDUCE. What do you think?

> The file localization process should allow for wildcards to reduce the application footprint in the state store
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4958
>                 URL: https://issues.apache.org/jira/browse/YARN-4958
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>         Attachments: YARN-4958.001.patch, YARN-4958.002.patch, YARN-4958.003.patch
>
>
> When using the -libjars option to add classes to the classpath, every library so added is explicitly listed in the {{ContainerLaunchContext}}'s local resources even though they're all uploaded to the same directory in HDFS. When using tools like Crunch without an uber JAR or when trying to take advantage of the shared cache, the number of libraries can be quite large. We've seen many cases where we had to turn down the max number of applications to prevent ZK from running out of heap because of the size of the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the NM allow wildcards in the resource localization paths. Specifically, we propose to allow a path to have a final component (name) set to "*", which is interpreted by the NM as "download the full directory and link to every file in it from the job's working directory."
> This behavior is the same as the current behavior when using -libjars, but avoids explicitly listing every file.
> This JIRA does not attempt to provide more general purpose wildcards, such as "\*.jar" or "file\*", as having multiple entries for a single directory presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache. That work will be left to a future JIRA. Specifically, this JIRA only applies when a full directory is uploaded. Currently the shared cache does not handle directory uploads.
> This JIRA proposes to allow for wildcards both in the internal processing of the -libjars switch and in paths added through the {{Job}} and {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of all file verification and localization. In the final step, the NM will query the localized directory to get a list of the files in "dir" such that each can be linked from the job's working directory. Since $PWD/\* is always included on the classpath, all JAR files in "dir" will be in the classpath.
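
For illustration only, the approach in the issue description (treat "dir/*" as "dir", then link every file of the localized directory into the working directory) could be sketched roughly as below. This is plain Java on java.nio.file, not the NodeManager code and not the attached patches. The class name WildcardLinkSketch and the helpers isWildcard, directoryOf, and linkAll are hypothetical, and linking only regular files (skipping subdirectories) is an assumption the description does not spell out.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; not the NodeManager implementation or the YARN-4958 patch. */
final class WildcardLinkSketch {

    /** True only when the final path component is exactly "*", e.g. "libs/*". */
    static boolean isWildcard(String resourcePath) {
        return resourcePath.endsWith("/*");
    }

    /** Maps "dir/*" to "dir"; any other path is returned unchanged. */
    static String directoryOf(String resourcePath) {
        if (!isWildcard(resourcePath)) {
            return resourcePath;
        }
        String dir = resourcePath.substring(0, resourcePath.length() - 2);
        return dir.isEmpty() ? "/" : dir; // "/*" refers to the root directory
    }

    /**
     * Creates a symbolic link in workDir for each regular file in localizedDir.
     * Returns the linked file names in sorted order.
     * Assumption: subdirectories are skipped.
     */
    static List<String> linkAll(Path localizedDir, Path workDir) throws IOException {
        List<String> linked = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(localizedDir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    Path link = workDir.resolve(entry.getFileName());
                    Files.createSymbolicLink(link, entry.toAbsolutePath());
                    linked.add(entry.getFileName().toString());
                }
            }
        }
        Collections.sort(linked);
        return linked;
    }

    public static void main(String[] args) throws IOException {
        // Temporary directories stand in for the localized "dir" and for $PWD.
        Path localized = Files.createTempDirectory("localized");
        Path workDir = Files.createTempDirectory("workdir");
        Files.createFile(localized.resolve("a.jar"));
        Files.createFile(localized.resolve("b.jar"));

        System.out.println(directoryOf("libs/*"));
        System.out.println(linkAll(localized, workDir));
    }
}
```

The point of the sketch is that a single "dir/*" entry carries the same information as one entry per file, because the file list is recovered from the localized directory at link time rather than stored up front.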