Date: Mon, 26 Jan 2015 20:25:36 +0000 (UTC)
From: "Chris Nauroth (JIRA)"
To: common-issues@hadoop.apache.org
Reply-To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-11509) change parsing sequence in GenericOptionsParser to parse -D parameters first

    [ https://issues.apache.org/jira/browse/HADOOP-11509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292357#comment-14292357 ]

Chris Nauroth commented on HADOOP-11509:
----------------------------------------

Thank you, Xuan and Jian. Just to provide a bit more background on this, Xuan found that streaming jobs using files in Azure Storage were not able to override the setting of {{fs.azure.block.size}} from the command line.
It looks like he found the root cause: {{validateFiles}} checks for the existence of files against a {{FileSystem}} instance, but this {{FileSystem}} instance is obtained before the -D options are handled. That leaves an instance sitting in the {{FileSystem}} cache that was created without the -D options set in the {{Configuration}}. Later, MapReduce job split calculation uses the cached instance, which does not have the override of {{fs.azure.block.size}}.

I agree with the change here, because the expectation is that the command line arguments take precedence. However, I don't think we should move the -D handling all the way to the top of the method. Right now, the handling is such that -D options take precedence over -fs and -jt. The current patch would reverse that. I don't know if anyone depends on that behavior, but we can avoid changing it by doing the -D handling in between the handling of -conf and the handling of -libjars. I'd be +1 for the patch with that change if you test it and it still works for overriding {{fs.azure.block.size}}.

bq. Should the API Path.getFileSystem(Configuration conf) be that the returned file system object always apply the up-to-date conf ?

This is a long-standing weakness of the {{FileSystem}} cache. It has been discussed in other jiras, but I can't find those now. The {{FileSystem}} cache key is composed of scheme, authority, and {{UserGroupInformation}}. However, the {{FileSystem#get}} API is phrased in terms of a whole {{Configuration}}. Various other configuration properties can tune the behavior of a {{FileSystem}}, but if you get a cached instance, then these configuration properties might not be applied. OTOH, it would be too costly to make the whole {{Configuration}} part of the cache key. This is an existing problem, unrelated to the current patch.
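
To make the caching behavior described above concrete, here is a minimal sketch of a cache keyed only by scheme, authority, and user. It is a toy model and not Hadoop's code: {{FsCacheSketch}}, {{ToyFileSystem}}, {{Key}}, the {{toyfs}} scheme, the user name, and the fallback block size of 512 are all hypothetical names and values chosen for illustration. It only shows why a lookup made before the -D override is applied leaves a stale instance in the cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Toy model of a file-system cache keyed by scheme, authority, and user. Not Hadoop code. */
final class FsCacheSketch {

    /** Hypothetical stand-in for a FileSystem: it snapshots the block size at creation time. */
    static final class ToyFileSystem {
        final long blockSize;

        ToyFileSystem(Map<String, String> conf) {
            // 512 is an arbitrary fallback for this sketch, not a real default.
            this.blockSize = Long.parseLong(conf.getOrDefault("fs.azure.block.size", "512"));
        }
    }

    /** Cache key that ignores the configuration entirely. */
    static final class Key {
        final String scheme;
        final String authority;
        final String user;

        Key(String scheme, String authority, String user) {
            this.scheme = scheme;
            this.authority = authority;
            this.user = user;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return scheme.equals(other.scheme)
                && authority.equals(other.authority)
                && user.equals(other.user);
        }

        @Override
        public int hashCode() {
            return Objects.hash(scheme, authority, user);
        }
    }

    private final Map<Key, ToyFileSystem> cache = new HashMap<>();

    /** Returns the cached instance if one exists; conf is only read on a cache miss. */
    ToyFileSystem get(String scheme, String authority, String user, Map<String, String> conf) {
        return cache.computeIfAbsent(new Key(scheme, authority, user), k -> new ToyFileSystem(conf));
    }

    /** Block size seen by a later lookup, depending on when the -D style override is applied. */
    static long blockSizeSeenLater(boolean applyOverrideFirst) {
        FsCacheSketch fsCache = new FsCacheSketch();
        Map<String, String> conf = new HashMap<>();
        if (applyOverrideFirst) {
            conf.put("fs.azure.block.size", "1024");
        }
        // Early lookup, playing the role of the file existence check.
        fsCache.get("toyfs", "container", "alice", conf);
        if (!applyOverrideFirst) {
            conf.put("fs.azure.block.size", "1024");
        }
        // Later lookup, playing the role of split calculation.
        return fsCache.get("toyfs", "container", "alice", conf).blockSize;
    }

    public static void main(String[] args) {
        System.out.println("override applied after first lookup:  " + blockSizeSeenLater(false));
        System.out.println("override applied before first lookup: " + blockSizeSeenLater(true));
    }
}
```

In this sketch the first case reports 512 and the second reports 1024, because the second lookup always hits the cache and never re-reads the configuration. Putting the whole configuration into {{Key}} would avoid the stale instance, at the cost of hashing and comparing every property on each lookup.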

> change parsing sequence in GenericOptionsParser to parse -D parameters first
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-11509
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11509
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: HADOOP-11509.1.patch
>
>
> In GenericOptionsParser, we need to parse -D parameter first. In that case, the user input parameter (through -D) can be set into configuration object earlier and used to process other parameters.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)