hadoop-common-user mailing list archives

From Karl Wettin <karl.wet...@gmail.com>
Subject aborting reducer
Date Wed, 16 Apr 2008 19:07:07 GMT
I have a job that, given a list of objects, finds the one with the least 
distance to a given test object. All my reducer does is collect the 
first result and ignore the rest.

 > private boolean processed = false;
 > public void reduce(DoubleWritable distance, Iterator<LongWritable> keys,
 >              OutputCollector<DoubleWritable, LongWritable> output,
 >              Reporter reporter)
 >    throws IOException {
 >   if (processed) {
 >     return; // a result was already collected; ignore the rest
 >   }
 >   output.collect(distance, keys.next());
 >   processed = true;
 > }

I'm not sure whether I'm doing something fundamentally wrong in designing 
the mapper and the reducer, or whether I've come up with a new use case, 
but it feels very inefficient to iterate through all those records and 
deserialize them just to ignore the values. I went looking in the code 
base to see whether it was possible to abort the reduction/combination 
iteration, and found that a simple enough solution would be to throw some 
exception (or have reduce return a boolean).
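
To make the "have reduce return a boolean" idea concrete, here is a rough 
standalone sketch in plain Java. It does not use Hadoop, and it is not 
how the framework works today: AbortableReducer, NearestOnly and run are 
hypothetical names made up for illustration, and run only stands in for 
the framework's loop over sorted key groups.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class EarlyStopReduceSketch {

    /** Hypothetical contract: return false to ask the driver to stop iterating. */
    public interface AbortableReducer<K, V> {
        boolean reduce(K key, Iterator<V> values, List<Map.Entry<K, V>> output);
    }

    /** Collects the first value of the first key it sees, then asks to stop. */
    public static final class NearestOnly implements AbortableReducer<Double, Long> {
        @Override
        public boolean reduce(Double distance, Iterator<Long> ids,
                              List<Map.Entry<Double, Long>> output) {
            output.add(new AbstractMap.SimpleEntry<>(distance, ids.next()));
            return false; // nothing after the smallest distance is needed
        }
    }

    /**
     * Stand-in for the framework loop (an assumption, not Hadoop code).
     * Visits key groups in sorted order and returns how many were visited.
     */
    public static <K, V> int run(SortedMap<K, List<V>> sortedGroups,
                                 AbortableReducer<K, V> reducer,
                                 List<Map.Entry<K, V>> output) {
        int visited = 0;
        for (Map.Entry<K, List<V>> group : sortedGroups.entrySet()) {
            visited++;
            boolean keepGoing =
                reducer.reduce(group.getKey(), group.getValue().iterator(), output);
            if (!keepGoing) {
                break; // remaining groups are never iterated
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Made-up sample data: distance -> ids of objects at that distance.
        SortedMap<Double, List<Long>> groups = new TreeMap<>();
        groups.put(2.5, Arrays.asList(7L));
        groups.put(0.5, Arrays.asList(42L, 43L));
        groups.put(1.0, Arrays.asList(9L));

        List<Map.Entry<Double, Long>> out = new ArrayList<>();
        int visited = run(groups, new NearestOnly(), out);
        System.out.println("groups visited: " + visited + ", output: " + out);
    }
}
```

With keys sorted by distance, the first group is the nearest one, so the 
loop can stop after a single call instead of walking every remaining 
record. Whether the real reduce loop could be changed this way without 
breaking existing jobs is exactly the open question.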


     karl
