hadoop-common-user mailing list archives

From: Stuart White <stuart.whi...@gmail.com>
Subject: Multiple outputs and getmerge?
Date: Mon, 20 Apr 2009 20:14:49 GMT

I've written a MapReduce job with multiple outputs.  The "normal"
output goes to files named part-XXXXX, and my secondary output records
go to an output I've chosen to name "ExceptionDocuments" (so those
files are named "ExceptionDocuments-m-XXXXX").

I'd like to pull merged copies of these files to my local filesystem:
two separate merged files, one containing the "normal" output and one
containing the ExceptionDocuments output.  But since Hadoop writes
both outputs to the same directory, running "hadoop dfs -getmerge" on
that directory gives me a single file that contains both outputs.

To get around this, I first have to move the files around on HDFS so
that each output sits in its own directory, and then run getmerge on
each directory separately.

Is this the best (or only) way to deal with this?  It would be better
if Hadoop offered the option of writing different outputs to different
output directories, or if getmerge accepted a file-name prefix
selecting which files to merge.
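
For illustration, here is a minimal sketch of the prefix-based merge
asked about above, done on the local side rather than by getmerge
itself.  It assumes the whole job output directory has already been
copied to the local filesystem (for example with "hadoop dfs -get").
The function name merge_by_prefix and the paths in the usage note are
hypothetical, not part of Hadoop.

```python
import shutil
from pathlib import Path


def merge_by_prefix(src_dir, prefix, dest_file):
    """Concatenate the files in src_dir whose names start with prefix.

    Hypothetical helper, not a Hadoop API. Returns the names of the
    merged files in the order they were appended to dest_file.
    """
    src = Path(src_dir)
    dest = Path(dest_file)
    # Sort by name so part-00000, part-00001, ... are appended in order.
    # Skip the destination itself in case it lives in src_dir.
    parts = sorted(
        p for p in src.iterdir()
        if p.is_file()
        and p.name.startswith(prefix)
        and p.resolve() != dest.resolve()
    )
    with open(dest, "wb") as out:
        for part in parts:
            with open(part, "rb") as chunk:
                # Stream the copy so large part files are not read into memory.
                shutil.copyfileobj(chunk, out)
    return [p.name for p in parts]
```

With a local copy in a directory such as "job-output" (a placeholder
name), merge_by_prefix("job-output", "part-", "normal.txt") and
merge_by_prefix("job-output", "ExceptionDocuments-", "exceptions.txt")
would produce the two separate merged files.  The cost of this
approach is that every part file is copied locally before merging.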

Thanks!
