hadoop-mapreduce-issues mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3772) MultipleOutputs output lost if baseOutputPath starts with ../
Date Tue, 27 Nov 2012 00:14:59 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504255#comment-13504255 ]

Alejandro Abdelnur commented on MAPREDUCE-3772:
-----------------------------------------------

Priyo,

MultipleOutputs was designed on the assumption that named outputs are written inside the
job's output-dir. This is what allows speculative execution to work. While a task is running,
it writes its output to a task-specific temporary directory within the output directory; when
the task completes, the files in that temporary directory are promoted to the output-dir.
This includes all named outputs. Even with speculative execution ON, there is no conflict or
overwriting, because the two competing speculative tasks use two different temporary
directories and only one of them is promoted.

For this to work, named outputs must be NAMEs, not paths. They then end up next to the
default 'part-####' output files.
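
The path arithmetic behind this can be sketched as follows. This is an illustration only,
not Hadoop code: java.nio paths stand in for Hadoop's Path class, and the class name, the
"_temporary/<attempt>" layout, the attempt id and the "-r-00000" suffix are all assumptions
made for the example. Under those assumptions, a plain name stays inside the attempt's
temporary directory and can be promoted, while a base path starting with "../" resolves to
a location outside it, which is one plausible reading of why such output is never promoted.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: models where a named output lands while a task runs.
// The directory layout and file suffix are assumed, not taken from Hadoop.
final class NamedOutputSketch {

    // Temporary directory a single task attempt writes into (assumed layout).
    static Path workDir(Path outputDir, String attemptId) {
        return outputDir.resolve("_temporary").resolve(attemptId);
    }

    // File a named output is written to, resolved against the attempt's work dir.
    static Path workFile(Path outputDir, String attemptId, String baseOutputPath) {
        return workDir(outputDir, attemptId)
                .resolve(baseOutputPath + "-r-00000")
                .normalize();
    }

    // Only files that stay inside the attempt's work dir can be promoted with it.
    static boolean staysInWorkDir(Path outputDir, String attemptId, String baseOutputPath) {
        return workFile(outputDir, attemptId, baseOutputPath)
                .startsWith(workDir(outputDir, attemptId));
    }

    public static void main(String[] args) {
        Path out = Paths.get("/tmp/multi1/out");
        for (String name : new String[] {"list1", "../extra/"}) {
            System.out.println(name + " -> " + workFile(out, "attempt_0", name)
                    + " (inside work dir: " + staysInWorkDir(out, "attempt_0", name) + ")");
        }
    }
}
```

With the assumed layout, "list1" resolves to a file under .../_temporary/attempt_0/, while
"../extra/" resolves to .../_temporary/extra/-r-00000, which is outside the attempt's own
directory.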

Please verify this is the case, and if so, please close this JIRA as invalid.



                
> MultipleOutputs output lost if baseOutputPath starts with ../
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-3772
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3772
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.203.0, 0.22.0
>         Environment: FreeBSD
>            Reporter: Radim Kolar
>
> Let's say you have the output directory set:
> FileOutputFormat.setOutputPath(job, new Path("/tmp/multi1/out"));
> and want to place output from MultipleOutputs into /tmp/multi1/extra.
> I expect the following code to work:
> mos = new MultipleOutputs<Text, IntWritable>(context);
> mos.write(new Text("zrr"), value, "../extra/");
> but no Exception is thrown and the expected output directory /tmp/multi1/extra does not even
> exist. All data written to this output vanishes without a trace.
> To make it work, the full path must be used:
> mos.write(new Text("zrr"), value, "/tmp/multi1/extra/");
> The output is listed correctly in the statistics from MultipleOutputs:
>         org.apache.hadoop.mapreduce.lib.output.MultipleOutputs
>                 ../gaja1/=13333 (* everything is lost *)
>                 /tmp/multi1/out/../ksd34/=13333 (* this using full path works *)
>                 list1=6667

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
