hadoop-common-user mailing list archives

From "W.P. McNeill" <bill...@gmail.com>
Subject What is currently the best way to write to multiple output locations in Hadoop?
Date Mon, 12 Mar 2012 20:28:47 GMT
I have an algorithm that runs multiple iterations of a Hadoop job. Each
iteration produces two kinds of output: stuff that is "done" and gets
written out to the side and stuff that is "not-done" and gets fed back into
the next iteration. The reducer makes this distinction. The algorithm
completes when an iteration has no "not-done" output.
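
The control flow described above can be sketched in plain Java, with no Hadoop
APIs involved. Everything in it is an illustrative assumption rather than the
actual job: the class and method names are hypothetical, and the "halve the
value, done once it reaches 1 or less" rule merely stands in for whatever
test the real reducer applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IterativeJobSketch {
    /** The two output channels produced by one iteration. */
    public static final class IterationOutput {
        public final List<Integer> done = new ArrayList<>();
        public final List<Integer> notDone = new ArrayList<>();
    }

    /**
     * Hypothetical stand-in for one Hadoop job. The "reducer" logic here
     * halves each value and routes it to "done" once it is 1 or less;
     * everything else goes to "not-done".
     */
    public static IterationOutput runIteration(List<Integer> input) {
        IterationOutput out = new IterationOutput();
        for (int value : input) {
            int next = value / 2;
            if (next <= 1) {
                out.done.add(next);      // written out to the side
            } else {
                out.notDone.add(next);   // fed back into the next iteration
            }
        }
        return out;
    }

    /** Repeats iterations until one produces no "not-done" output. */
    public static List<Integer> runUntilDone(List<Integer> initial) {
        List<Integer> allDone = new ArrayList<>();
        List<Integer> pending = initial;
        do {
            IterationOutput out = runIteration(pending);
            allDone.addAll(out.done);
            pending = out.notDone;
        } while (!pending.isEmpty());
        return allDone;
    }

    public static void main(String[] args) {
        System.out.println(runUntilDone(Arrays.asList(1, 4, 9)));
    }
}
```

In a real job the two lists would be two separate output locations on the
file system, and the driver would check whether the "not-done" location is
empty before launching the next iteration.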

Basically what I need is two different output channels for my reducer. What
is currently the best way to do this in Hadoop? I know the old API had a
MultipleOutputs class, but I think that's deprecated now. I have been
creating and populating the "done" sequence files directly, but I would
rather have the Hadoop framework do this for me, both to save work and to
avoid name collisions that I haven't anticipated.
