Date: Thu, 29 Nov 2012 01:09:58 +0000 (UTC)
From: "Priyo Mustafi (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Message-ID: <1007090168.36994.1354151398113.JavaMail.jiratomcat@arcas>
In-Reply-To: <1229262622.11424.1328014090890.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (MAPREDUCE-3772) MultipleOutputs output lost if baseOutputPath starts with ../


    [ https://issues.apache.org/jira/browse/MAPREDUCE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506107#comment-13506107 ]

Priyo Mustafi commented on MAPREDUCE-3772:
------------------------------------------

MultipleOutputs exposes two methods.
  1) public void write(String namedOutput, K key, V value)
  2) public void write(String namedOutput, K key, V value, String baseOutputPath)

where
  namedOutput    - the named output name
  baseOutputPath - base-output path to write the record to.
                   Note: Framework will generate unique filename for the baseOutputPath

We use the second one, which lets you provide a baseOutputPath where the data needs to be written. I don't see anything in the Javadoc saying that baseOutputPath shouldn't be a fully qualified path, so the Jira is definitely valid. Either the Javadoc or the code needs to be fixed, and I would prefer the latter, as we have developed extensive data pipelines based on this. If it is not fixed, we have to change the absolute paths to sub-directory paths and then, once the job is done, move all those directories out to the expected locations.

Aside from that, if we provide baseOutputPath as "abc/def/xyz", it puts the directory under the main output directory, i.e. you get files like /abc/def/xyz-r-00000. If instead you use baseOutputPath as "/abc/def/xyz", where the path isn't a subdirectory of the main output directory, then the problem is seen.

> MultipleOutputs output lost if baseOutputPath starts with ../
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-3772
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3772
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.203.0, 0.22.0
>         Environment: FreeBSD
>            Reporter: Radim Kolar
>
> Lets say you have output directory set:
> FileOutputFormat.setOutputPath(job, "/tmp/multi1/out");
> and want to place output from MultipleOutputs into /tmp/multi1/extra
> I expect following code to work:
> mos = new MultipleOutputs(context);
> mos.write(new Text("zrr"), value, "../extra/");
> but no Exception is throw and expected output directory /tmp/multi1/extra does not even exists. All data written to this output vanish without trace.
> To make it work fullpath must be used
> mos.write(new Text("zrr"), value, "/tmp/multi1/extra/");
> Output is listed in statistics from MultipleOutputs correctly:
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs
> ../gaja1/=13333 (* everything is lost *)
> /tmp/multi1/out/../ksd34/=13333 (* this using full path works *)
> list1=6667
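
Until this is fixed, a pipeline could check each baseOutputPath before handing it to write(). The following is only a minimal sketch of that idea, assuming POSIX-style paths. It uses java.nio.file rather than Hadoop's own Path class, it is not how MultipleOutputs resolves paths internally, and the class and method names (BaseOutputPathSketch, resolve, staysInside) are hypothetical. It flags any baseOutputPath that resolves outside the job output directory, which covers both the "../extra/" form from the report and absolute paths such as "/abc/def/xyz".

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class BaseOutputPathSketch {

    /**
     * Joins baseOutputPath onto the output directory and collapses "." and "..".
     * An absolute baseOutputPath replaces the output directory entirely.
     * (Hypothetical helper; plain java.nio.file semantics, not Hadoop's.)
     */
    static Path resolve(String outputDir, String baseOutputPath) {
        return Paths.get(outputDir).resolve(baseOutputPath).normalize();
    }

    /** Returns true if the resolved path is still under the output directory. */
    static boolean staysInside(String outputDir, String baseOutputPath) {
        Path root = Paths.get(outputDir).normalize();
        return resolve(outputDir, baseOutputPath).startsWith(root);
    }

    public static void main(String[] args) {
        String out = "/tmp/multi1/out"; // output directory from the report
        String[] candidates = {"abc/def/xyz", "../extra/", "/tmp/multi1/extra/"};
        for (String base : candidates) {
            // Print where each candidate would land and whether it leaves the output dir.
            System.out.println(base + " -> " + resolve(out, base)
                    + " (inside output dir: " + staysInside(out, base) + ")");
        }
    }
}
```

A job driver could reject or rewrite any candidate for which staysInside returns false, and then move the directories to their final locations after the job completes, as described in the comment above.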