hadoop-user mailing list archives

From Radim Kolar <...@filez.com>
Subject Re: When speculative execution is true, there is a data loss issue with multpleoutputs
Date Wed, 21 Nov 2012 15:31:54 GMT
On 21.11.2012 16:07, AnilKumar B wrote:
> Thanks Radim.
> Yes, as you said, we are not writing into a sub-directory of the main job.
> I will try making them sub-directories of the output dir.
> But one question: when I turn off speculative execution, it works
> fine with the same multiple output directory structure. May I
> know how exactly it works in this case?
> When we change the speculative execution flag, why exactly is there a
> difference in the output data?
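Because if you are not using MultipleOutputs, you are not writing to the
real file but to a file in a tmp subdirectory whose name is generated from
its task attempt. Attempts do not overwrite each other. When you write
directly to a path outside the job output directory, as you do now, both
speculative attempts of a task target the same real file, and in HDFS you
can have only one writer per file.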
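
To illustrate the per-attempt naming described above, here is a minimal sketch that simulates it on the local filesystem. It does not use any Hadoop API. The helper names (attempt_path, write_attempt, commit_attempt, run_demo), the attempt ids, and the exact directory layout are assumptions made for the example; Hadoop's real output committer differs in detail.

```python
import os
import tempfile


def attempt_path(output_dir, attempt_id, name):
    """Private scratch path for one task attempt (illustrative layout)."""
    return os.path.join(output_dir, "_temporary", attempt_id, name)


def write_attempt(output_dir, attempt_id, name, data):
    """Write one attempt's output into its own scratch file."""
    path = attempt_path(output_dir, attempt_id, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(data)
    return path


def commit_attempt(output_dir, attempt_id, name):
    """Promote the winning attempt's scratch file to its final name."""
    final = os.path.join(output_dir, name)
    os.replace(attempt_path(output_dir, attempt_id, name), final)
    return final


def run_demo():
    out = tempfile.mkdtemp()
    # Two speculative attempts of the same task produce the same logical file.
    a = write_attempt(out, "attempt_0", "part-00000", "from attempt 0\n")
    b = write_attempt(out, "attempt_1", "part-00000", "from attempt 1\n")
    assert a != b  # different paths, so the attempts never collide
    # Only the attempt that is chosen as the winner gets promoted.
    final = commit_attempt(out, "attempt_1", "part-00000")
    with open(final) as f:
        return f.read()


if __name__ == "__main__":
    print(run_demo(), end="")
```

If both attempts instead opened one fixed path outside the output directory, there would be no per-attempt name to keep them apart, which is the situation with the current directory structure.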
