hadoop-common-user mailing list archives

From Frédéric Bertin <frederic.ber...@anyware-tech.com>
Subject Re: Number of Reduce Outputs
Date Tue, 29 Aug 2006 17:53:39 GMT
I asked myself the same question when I first started with Hadoop, and I
think it's a good candidate for the FAQ ;)

Generally speaking, IMO Hadoop (the MapReduce part) needs some kind of 
JobListener interface that lets users write custom callbacks, called at 
strategic moments of a job's life and executed on a single machine.
Dennis's problem could then be solved with a MergeOutputFilesListener.
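
A minimal sketch of what such a listener might look like, in Java. The
JobListener interface, its jobCompleted method, and this
MergeOutputFilesListener are hypothetical names for illustration only, not
existing Hadoop API. To stay self-contained, the sketch works on a local
directory with java.nio rather than on a Hadoop file system, and it simply
concatenates the part files in name order:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical callback interface: an assumption, not part of Hadoop.
interface JobListener {
    // Intended to be called once, on a single machine, after the job finishes.
    void jobCompleted(Path outputDir) throws IOException;
}

// Hypothetical listener that concatenates part-xxxxx files into one file.
class MergeOutputFilesListener implements JobListener {
    private final String mergedName; // must not itself start with "part-"

    MergeOutputFilesListener(String mergedName) {
        this.mergedName = mergedName;
    }

    @Override
    public void jobCompleted(Path outputDir) throws IOException {
        // Collect the part files before creating the merged file.
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        Collections.sort(parts); // part-00000, part-00001, ...

        // Append each part, in order, to the single merged output file.
        try (OutputStream out =
                 Files.newOutputStream(outputDir.resolve(mergedName))) {
            for (Path p : parts) {
                Files.copy(p, out);
            }
        }
    }
}

// Small local demonstration with two fake reduce outputs.
class JobListenerDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job-output");
        Files.write(dir.resolve("part-00000"), "a\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("part-00001"), "b\n".getBytes(StandardCharsets.UTF_8));

        new MergeOutputFilesListener("merged.txt").jobCompleted(dir);

        byte[] merged = Files.readAllBytes(dir.resolve("merged.txt"));
        System.out.print(new String(merged, StandardCharsets.UTF_8));
    }
}
```

In a real implementation the framework, not user code, would invoke
jobCompleted after the last reduce task, and the listener would read and
write through Hadoop's own file system abstraction.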

This would also allow more complex things, like notifying people of job 
results by mail, although that kind of example may be outside Hadoop's 
scope. However, just publishing the listener interface would help make 
Hadoop more pluggable and let people contribute useful extensions, even 
ones that are not focused on Hadoop's core.

WDYT?

Fred


Doug Cutting wrote:
> To generate a single output file, specify just a single reduce task.  
> If your reducer isn't doing much computation, then it might be faster 
> to do this in the original job, otherwise use a subsequent job.
>
> Doug
>
> Dennis Kubes wrote:
>> This is probably a simple question but when I run my MR job I am 
>> getting 10 splits and therefore 10 output files like part-xxxxx.  Is 
>> there a way to merge those outputs into a single file using the 
>> currently running MR job or do I need to run another MR job to merge 
>> them?
>>
>> Dennis Kubes

