hadoop-common-user mailing list archives

From Aaron Baff <Aaron.B...@telescope.tv>
Subject Make reducer task exit early
Date Sat, 04 Jun 2011 00:14:23 GMT
Is there a way to make a Reduce task exit early, before it has finished reading all of its
data? Basically I'm doing a group by with a sum, and I only want to return the top 1000 records,
say. So I have a class-level int variable that keeps track of how many records have been written
to the output so far, and as soon as that count reaches the limit, I simply return at the top of
the reduce() function.

Is there any way to optimize it even more to tell the Reduce task, "stop reading data, I don't
need any more data"?

--Aaron
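
For illustration, here is a minimal plain-Java sketch of the idea in the question: put the limit check in the loop that drives reduce(), so that no further groups are pulled once enough records have been written. In Hadoop's newer API, org.apache.hadoop.mapreduce.Reducer has an overridable run(Context) method that loops over context.nextKey(), and the run() method below imitates that loop. The class and member names (EarlyExitReduceSketch, SummingReducer, limit, valuesRead) are hypothetical stand-ins, not Hadoop classes, and the sketch emits the first `limit` groups in key order, as the message describes. It does not model the shuffle.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class EarlyExitReduceSketch {

    /** Hypothetical stand-in for a reducer that sums each group's values. */
    static final class SummingReducer {
        private final int limit;      // maximum number of records to emit
        private int written = 0;      // records emitted so far
        int valuesRead = 0;           // instrumentation: input values consumed
        final List<String> output = new ArrayList<>();

        SummingReducer(int limit) {
            this.limit = limit;
        }

        /** True once the output limit has been reached. */
        boolean done() {
            return written >= limit;
        }

        /** Sums one group's values and emits a single "key<TAB>sum" record. */
        void reduce(String key, Iterable<Integer> values) {
            long sum = 0;
            for (int v : values) {
                sum += v;
                valuesRead++;
            }
            output.add(key + "\t" + sum);
            written++;
        }

        /**
         * Driver loop, analogous in shape to Reducer.run(Context).
         * The limit is checked before asking for the next group, so the
         * remaining groups are never pulled from the iterator.
         */
        void run(Iterator<Map.Entry<String, List<Integer>>> groups) {
            while (!done() && groups.hasNext()) {
                Map.Entry<String, List<Integer>> group = groups.next();
                reduce(group.getKey(), group.getValue());
            }
        }
    }
}
```

Compared with returning at the top of reduce(), checking in the driver loop means the remaining groups are never handed to reduce() at all. Whether this saves much in a real job depends on how much of the input has already been fetched and merged by the time the reduce phase starts.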
