hadoop-mapreduce-user mailing list archives

From Vanja Komadinovic <vanja...@gmail.com>
Subject MultipleOutputs support
Date Mon, 01 Aug 2011 22:21:08 GMT
Hi all,

I'm trying to create M/R tasks that output more than one "type" of data. The ideal solution
would be the MultipleOutputs feature of MapReduce, but in our current production version, CDH3
(0.20.2), this support is broken.

So I tried to simulate MultipleOutputs. In the Reducer's setup I open an HDFS output stream,
write to it during the reduce calls, and close it in the close call. The output streams are
named with the attempt id inside. This works well. Speculative execution is disabled, but
sometimes one of the reduce tasks fails and is retried, so I get two files for the same
reducer over the same data. Is there any way to find out which task attempts were successful,
so I can delete the unneeded data after the job completes successfully? I'm using the new
MapReduce API. Or is there a better way to achieve this?


Komadinovic Vanja
+381 (64) 296 03 43
