hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Skip Reduce Phase
Date Thu, 07 Feb 2008 17:59:07 GMT

I think that setting the number of reducers to 0 skips most of the overhead
of the later stages (the sort and the copy to the reducers).
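
A toy, in-memory model of what the reducer count changes may help here. It is plain Java, not Hadoop code: `MapOnlySketch`, `MapFn`, `runMapOnly` and `runShuffle` are hypothetical names, and each inner list of `splits` stands for the records one map task reads. With zero reducers, each map task's output is written as-is to its own output file; with reducers, all map output is first grouped and sorted by key. In the old `mapred` API the real setting is `JobConf.setNumReduceTasks(0)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.BiConsumer;

/** Toy model (not Hadoop code) of map-only vs. shuffled execution. */
public class MapOnlySketch {

    /** Hypothetical map function: one input record to zero or more (key, value) pairs. */
    interface MapFn {
        void map(String record, BiConsumer<String, String> out);
    }

    /** Zero reducers: each map task writes its own output "file", in emit order, unsorted. */
    static List<List<String>> runMapOnly(List<List<String>> splits, MapFn fn) {
        List<List<String>> outputFiles = new ArrayList<>();
        for (List<String> split : splits) {
            List<String> file = new ArrayList<>(); // stands for this map task's part file
            for (String record : split) {
                fn.map(record, (k, v) -> file.add(k + "\t" + v));
            }
            outputFiles.add(file);
        }
        return outputFiles;
    }

    /** With reducers: all map output is grouped and sorted by key (what reducers would receive). */
    static TreeMap<String, List<String>> runShuffle(List<List<String>> splits, MapFn fn) {
        TreeMap<String, List<String>> sorted = new TreeMap<>();
        for (List<String> split : splits) {
            for (String record : split) {
                fn.map(record, (k, v) -> sorted.computeIfAbsent(k, key -> new ArrayList<>()).add(v));
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical pre-processing step: lower-case each record, keyed by its first character.
        MapFn preprocess = (record, out) ->
                out.accept(record.substring(0, 1).toLowerCase(), record.toLowerCase());
        List<List<String>> splits = List.of(List.of("Banana", "Apple"), List.of("Cherry"));
        System.out.println(runMapOnly(splits, preprocess));
        System.out.println(runShuffle(splits, preprocess));
    }
}
```

Because map-only output goes straight to the job's output directory on DFS, a second job can simply use that directory as its input path.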

Also, if you REALLY want to lower overhead, you can write a "meta-mapper"
class that hooks together a list of mappers using a purpose-built output
collector.

That will avoid the disk storage overhead between the mapping steps
completely.
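
A plain-Java sketch of the "meta-mapper" idea follows, under stated assumptions: `Collector`, `Mapper` and `MetaMapper` are hypothetical interfaces and classes written for this example (the collector is only shaped like the old `mapred` `OutputCollector.collect(key, value)`), and keys and values are plain strings. The purpose-built collector hands each emitted pair directly to the next mapper in the list, so nothing is buffered or spilled between stages.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a "meta-mapper": several map steps chained in memory. */
public class MetaMapperSketch {

    /** Hypothetical collector, shaped like the old mapred OutputCollector.collect(key, value). */
    interface Collector {
        void collect(String key, String value);
    }

    /** Hypothetical mapper: consume one pair, emit zero or more pairs. */
    interface Mapper {
        void map(String key, String value, Collector out);
    }

    /** Chains mappers so each stage's output is fed straight into the next stage. */
    static final class MetaMapper implements Mapper {
        private final List<Mapper> stages;

        MetaMapper(List<Mapper> stages) {
            this.stages = List.copyOf(stages);
        }

        @Override
        public void map(String key, String value, Collector out) {
            runStage(0, key, value, out);
        }

        // The purpose-built collector for stage i simply calls stage i + 1;
        // after the last stage, pairs go to the real (final) collector.
        private void runStage(int i, String key, String value, Collector finalOut) {
            if (i == stages.size()) {
                finalOut.collect(key, value);
                return;
            }
            stages.get(i).map(key, value, (k, v) -> runStage(i + 1, k, v, finalOut));
        }
    }

    public static void main(String[] args) {
        Mapper trim = (k, v, out) -> out.collect(k, v.trim());
        Mapper dropEmpty = (k, v, out) -> { if (!v.isEmpty()) out.collect(k, v); };
        Mapper upper = (k, v, out) -> out.collect(k, v.toUpperCase());

        List<String> results = new ArrayList<>();
        Mapper chain = new MetaMapper(List.of(trim, dropEmpty, upper));
        for (String line : List.of("  hbase ", "   ", "hadoop")) {
            chain.map("row", line, (k, v) -> results.add(k + "=" + v));
        }
        System.out.println(results); // [row=HBASE, row=HADOOP]
    }
}
```

The whole chain runs inside a single map task, which is what removes the intermediate disk writes; the trade-off is that all stages share that task's memory and its failure/retry unit.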


On 2/7/08 9:54 AM, "David Alves" <dr-alves@criticalsoftware.com> wrote:

> Hi Ted
> 
> But wouldn't that still go through the intermediate phases and do the
> merge sort and copy to the local filesystem (which is the reduce input)?
> 
> Is there a way to provide the direct map output (saved onto DFS) to
> another map task, or does your suggestion already do this, making this
> a moot point?
> 
> David
> 
> On Thu, 2008-02-07 at 09:39 -0800, Ted Dunning wrote:
>> Set numReducers to 0.
>> 
>> 
>> On 2/7/08 9:35 AM, "David Alves" <dr-alves@criticalsoftware.com> wrote:
>> 
>>> Hi All
>>> First of all, since this is my first post, I must say congrats on the
>>> great piece of software (both Hadoop and HBase).
>>> I've been using Hadoop and HBase for a while and I have a question; let
>>> me explain my setup a little:
>>> 
>>> I have an HBase database that holds information that I want to process
>>> in a Map/Reduce job, but it first needs some pre-processing.
>>> 
>>> So I built another Map/Reduce job that uses a specific (filtered)
>>> TableInputFormat and pre-processes the information in the Map phase.
>>> 
>>> As I don't need any of the intermediate phases (like the merge sort) and
>>> I don't need to do anything in the reduce phase, I was wondering if I
>>> could just save the Map phase output and start the second Map/Reduce job
>>> using that as its input (while still saving the splits to DFS for
>>> backtracking/reliability reasons).
>>> 
>>> Is this possible?
>>> 
>>> Thanks in advance, and again, great piece of software.
>>> David Alves
>>> 
>>> 
>>> 
>> 
> 

