hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Transfer archives (or any file) from Mapper to Reducer?
Date Mon, 21 May 2012 09:02:19 GMT

You could write these archives onto HDFS and have your reducers read
them from a location there, though this method may be a bit ugly. See
http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F
for how to properly write files from tasks onto a DFS, or look at the
MultipleOutputs API class.
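
As a rough illustration of the first option, here is a minimal sketch of the
"write to a per-attempt scratch location, then publish under a per-task name"
pattern. It is not Hadoop code: a local directory stands in for HDFS, and the
names SideFileSketch, publishArchive, listArchives and the "archive-<task>.bin"
naming are all hypothetical. In a real job the FAQ entry above describes the
supported way to do this from within a task.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch: a local directory plays the role of a shared DFS path.
class SideFileSketch {

    // Map side: write the archive into a scratch directory private to this
    // attempt, then move the finished file to its per-task name. A retried or
    // speculative attempt of the same task simply replaces the file.
    static Path publishArchive(Path sharedDir, String taskId, String attemptId,
                               byte[] archive) throws IOException {
        Path scratch = Files.createDirectories(
                sharedDir.resolve("_temporary").resolve(attemptId));
        Path tmp = scratch.resolve("archive-" + taskId + ".bin");
        Files.write(tmp, archive);
        Path dst = sharedDir.resolve("archive-" + taskId + ".bin");
        return Files.move(tmp, dst, StandardCopyOption.REPLACE_EXISTING);
    }

    // Reduce side: list the published archives, skipping the scratch directory.
    static List<Path> listArchives(Path sharedDir) throws IOException {
        List<Path> out = new ArrayList<>();
        try (Stream<Path> entries = Files.list(sharedDir)) {
            entries.filter(Files::isRegularFile).sorted().forEach(out::add);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path shared = Files.createTempDirectory("shared-archives");
        publishArchive(shared, "m_000000", "attempt_0", new byte[] {1, 2, 3});
        publishArchive(shared, "m_000001", "attempt_0", new byte[] {4, 5});
        for (Path p : listArchives(shared)) {
            System.out.println(p.getFileName() + " " + Files.size(p) + " bytes");
        }
    }
}
```

The per-task file name is what lets a single reducer discover every archive
by listing one directory.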

Depending on how large these files are, you can also perhaps ship them
in via the KV pairs themselves. A custom key or sort comparator can
further ensure that they are delivered in the first iteration of the
reducer - if the file is required before regular reduce() ops can
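
To make the ordering idea concrete, here is a minimal sketch in plain Java,
without any Hadoop classes. The Record type, its isArchive flag and the
ARCHIVES_FIRST comparator are hypothetical names used only for illustration.
In an actual job the same comparison would presumably live in a custom key
type or in a comparator registered with Job.setSortComparatorClass, with the
archive bytes carried in the value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of "archives sort ahead of regular records".
class ArchiveFirstSketch {

    // A tagged key plus an opaque payload (archive bytes or a regular value).
    static final class Record {
        final boolean isArchive;
        final String key;
        final byte[] payload;

        Record(boolean isArchive, String key, byte[] payload) {
            this.isArchive = isArchive;
            this.key = key;
            this.payload = payload;
        }
    }

    // Archive records compare lower than regular ones (false sorts before
    // true), then records are ordered by their key within each group.
    static final Comparator<Record> ARCHIVES_FIRST =
            Comparator.comparing((Record r) -> !r.isArchive)
                      .thenComparing(r -> r.key);

    // Stand-in for the sort phase: returns a sorted copy of the map output.
    static List<Record> sortForReducer(List<Record> mapOutput) {
        List<Record> sorted = new ArrayList<>(mapOutput);
        sorted.sort(ARCHIVES_FIRST);
        return sorted;
    }

    public static void main(String[] args) {
        List<Record> mapOutput = new ArrayList<>();
        mapOutput.add(new Record(false, "word-a", new byte[] {1}));
        mapOutput.add(new Record(true, "node2.tar", new byte[] {9, 9}));
        mapOutput.add(new Record(false, "word-b", new byte[] {2}));
        mapOutput.add(new Record(true, "node1.tar", new byte[] {8}));
        for (Record r : sortForReducer(mapOutput)) {
            System.out.println((r.isArchive ? "archive " : "regular ") + r.key);
        }
    }
}
```

With this ordering, a reducer that iterates in sorted order sees every archive
record before the first regular record.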

On Mon, May 21, 2012 at 1:42 PM, biro lehel <lehel.biro@yahoo.com> wrote:
> Dear all,
> In my Mapper, I run a script that processes my set of input text files, creates from
> them some other text files (this is done locally on the FS on my nodes), and as a result,
> each MapTask will produce an archive as a result. My issue is, that I'm looking for a way
> for the Reducer to "take" these archives as some kind of an input. I understood that the communication
> between Mapper-Reducer is done through the means of the key-value pairs in the Context, but
> what I would need is the transferring of these archive files to the respective Reducer (I
> would probably have one single Reducer, so all the files should be transferred/copied there).
> Is this possible? Is there a way to transfer files from Mapper to Reducer? If not, what
> is the best approach in scenarios like mine? Any suggestions would be greatly appreciated.
> Thank you in advance,
> Lehel.

Harsh J
