hadoop-common-user mailing list archives

From "Jeff Eastman" <jeast...@collab.net>
Subject RE: Best Practice?
Date Mon, 11 Feb 2008 20:24:50 GMT
Hi Owen,

Thanks for the information. I took Ted's advice and refactored my mapper
to use a combiner, which solved my front-end canopy generation problem.
I still have to output the final canopies in the reducer during close(),
since there is no similar combiner mechanism there. I was worried about
this, but now I won't be.


-----Original Message-----
From: Owen O'Malley [mailto:oom@yahoo-inc.com] 
Sent: Monday, February 11, 2008 10:40 AM
To: core-user@hadoop.apache.org
Subject: Re: Best Practice?

On Feb 9, 2008, at 4:21 PM, Jeff Eastman wrote:

> I'm trying to wait until close() to output the cluster centroids to the
> reducer, but the OutputCollector is not available.

You hit on exactly the right solution. Actually, because of Pipes and
Streaming, you have a lot more guarantees than you would expect. In
particular, you can call output.collect while the framework is between
calls to map or reduce, right up until close() finishes.

-- Owen
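
For readers following the thread, the pattern discussed above (keep a reference to the collector passed to reduce(), then emit from close()) might look roughly like the sketch below. It is a minimal illustration and not code from either poster. To stay runnable without Hadoop on the classpath it uses a stand-in `Collector` interface in place of `org.apache.hadoop.mapred.OutputCollector`, and the class name `DeferredOutputReducer` and the per-key mean computation are hypothetical. In a real job the reduce() method would also take a `Reporter` argument.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in for org.apache.hadoop.mapred.OutputCollector (assumption:
// only collect(key, value) is needed for this sketch).
interface Collector<K, V> {
    void collect(K key, V value) throws IOException;
}

// Hypothetical reducer that defers all output until close().
class DeferredOutputReducer {
    // Collector captured from the most recent reduce() call.
    private Collector<String, Double> saved;
    // Results buffered across reduce() calls.
    private final List<String> keys = new ArrayList<>();
    private final List<Double> means = new ArrayList<>();

    public void reduce(String key, Iterator<Double> values,
                       Collector<String, Double> output) {
        saved = output; // remember the collector for use in close()
        double sum = 0.0;
        int count = 0;
        while (values.hasNext()) {
            sum += values.next();
            count++;
        }
        if (count > 0) {
            keys.add(key);
            means.add(sum / count);
        }
    }

    public void close() throws IOException {
        if (saved == null) {
            return; // reduce() was never called, so there is nothing to emit
        }
        for (int i = 0; i < keys.size(); i++) {
            saved.collect(keys.get(i), means.get(i));
        }
    }
}
```

Nothing is written while reduce() runs; the buffered results reach the collector only when close() is called, which relies on the guarantee Owen describes.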
