hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: passing configuration parameter to comparator
Date Mon, 02 Dec 2013 17:04:28 GMT
The comparators are also instantiated through the ReflectionUtils code,
so the configuration is passed to the new object if its class implements
the org.apache.hadoop.conf.Configurable interface or extends the
org.apache.hadoop.conf.Configured class (which implements the interface
for you). Either way, the comparator can then read the job configuration.
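
To illustrate the mechanism, here is a minimal sketch that runs without
Hadoop on the classpath. Conf, Configurable and newInstance below are
simplified stand-ins I made up for org.apache.hadoop.conf.Configuration,
org.apache.hadoop.conf.Configurable and the ReflectionUtils instantiation
path; they are not Hadoop's actual code. PrefixGroupingComparator and the
key "example.group.prefix.length" are likewise hypothetical.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

public class ComparatorConfigSketch {

    /** Simplified stand-in for Hadoop's Configuration (assumption: string values only). */
    static class Conf {
        private final Map<String, String> values = new HashMap<>();

        void set(String key, String value) { values.put(key, value); }

        int getInt(String key, int defaultValue) {
            String v = values.get(key);
            return v == null ? defaultValue : Integer.parseInt(v);
        }
    }

    /** Simplified stand-in for Hadoop's Configurable interface. */
    interface Configurable {
        void setConf(Conf conf);
        Conf getConf();
    }

    /** Stand-in for the framework: instantiate reflectively, then inject the conf if possible. */
    static <T> T newInstance(Class<T> cls, Conf conf) {
        try {
            java.lang.reflect.Constructor<T> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            T instance = ctor.newInstance();
            if (instance instanceof Configurable) {
                ((Configurable) instance).setConf(conf);
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    /** Hypothetical comparator: groups keys by their first N characters, N read from the conf. */
    static class PrefixGroupingComparator implements Comparator<String>, Configurable {
        private Conf conf;
        private int prefixLength = Integer.MAX_VALUE;

        @Override
        public void setConf(Conf conf) {
            // Called once by the framework stand-in right after construction.
            this.conf = conf;
            this.prefixLength = conf.getInt("example.group.prefix.length", Integer.MAX_VALUE);
        }

        @Override
        public Conf getConf() { return conf; }

        @Override
        public int compare(String a, String b) {
            return prefix(a).compareTo(prefix(b));
        }

        private String prefix(String s) {
            return s.substring(0, Math.min(prefixLength, s.length()));
        }
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        conf.set("example.group.prefix.length", "3");
        Comparator<String> cmp = newInstance(PrefixGroupingComparator.class, conf);
        System.out.println(cmp.compare("abc-1", "abc-2")); // 0: same group
        System.out.println(cmp.compare("abc-1", "abd-1")); // negative: different groups
    }
}
```

In a real job the comparator would implement Hadoop's own Configurable
(or extend Configured) and typically extend WritableComparator; the
setConf() hook is where the parameter would be read.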

As to the order, the grouping comparator is used only on the reducer
side (see also [1]) and is therefore set up before the first reduce()
is run. As to the key (sort) comparator for intermediate data (and
combiners), yes, it is initialised before the first map() call.

[1] - A combiner, which runs on the map side, has so far never invoked
the GroupingComparator class, hence the statement that a mapper may
never invoke it. However,
https://issues.apache.org/jira/browse/MAPREDUCE-3310 may alter this
current behaviour (if a combiner is explicitly involved).

On Mon, Dec 2, 2013 at 7:28 PM, Sergey Gerasimov
<gerasimov@mlab.cs.msu.su> wrote:
> Hello,
> What is the best way to pass job configuration parameter to class like
> GroupingComparator which is instantiated by hadoop. I know there is setup
> method in map class and probably I can initialize some static variable in
> setup and use it in GroupingComparator, not sure that is correct (not sure
> there is guarantee that GroupingComparator will be instantiated after first
> call of map on this node) But what is preferred pattern for the case? Maybe
> there is some unified way to access job config from anywhere?
> Thanks!
> Sergey.

Harsh J
