hadoop-common-dev mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-3702) add support for chaining Maps in a single Map and after a Reduce [M*/RM*]
Date Mon, 07 Jul 2008 08:01:34 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610905#action_12610905 ]

tucu00 edited comment on HADOOP-3702 at 7/7/08 12:59 AM:
---------------------------------------------------------------------

This could be done with ChainMapper and ChainReducer classes that manage the chain of Maps
and override the {{OutputCollector}} to implement the chaining.

The Maps and the Reduce that are part of the chain are unaware that they are executed in a
chain; they receive records via the {{map}} and {{reduce}} methods and emit their output via
the {{OutputCollector}}.
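
A minimal, self-contained sketch of the collector-override idea follows. It does not use the Hadoop
classes: {{SimpleMapper}}, {{Collector}}, {{ChainCollector}} and {{chain}} are hypothetical stand-ins
invented for illustration, and every mapper is assumed to share the same key and value types, which
the real classes would not need to require.

```java
import java.util.ArrayList;
import java.util.List;

public class ChainSketch {

    // Hypothetical, simplified stand-in for an OutputCollector.
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // Hypothetical, simplified stand-in for a Mapper.
    interface SimpleMapper<K, V> {
        void map(K key, V value, Collector<K, V> output);
    }

    // Collector that, instead of writing a record out, feeds it to the next mapper.
    static final class ChainCollector<K, V> implements Collector<K, V> {
        private final SimpleMapper<K, V> next;
        private final Collector<K, V> downstream;

        ChainCollector(SimpleMapper<K, V> next, Collector<K, V> downstream) {
            this.next = next;
            this.downstream = downstream;
        }

        @Override
        public void collect(K key, V value) {
            next.map(key, value, downstream);
        }
    }

    // Wraps the mappers from last to first; the result is the entry point of the chain.
    static <K, V> Collector<K, V> chain(List<SimpleMapper<K, V>> mappers,
                                        Collector<K, V> finalOutput) {
        Collector<K, V> current = finalOutput;
        for (int i = mappers.size() - 1; i >= 0; i--) {
            current = new ChainCollector<>(mappers.get(i), current);
        }
        return current;
    }

    // Runs two records through a two-mapper chain and returns what reaches the final output.
    static List<String> runDemo() {
        List<String> out = new ArrayList<>();
        List<SimpleMapper<String, String>> mappers = new ArrayList<>();
        mappers.add((k, v, c) -> c.collect(k, v.toUpperCase())); // first map
        mappers.add((k, v, c) -> c.collect(k, v + "!"));         // second map
        Collector<String, String> head = chain(mappers, (k, v) -> out.add(k + "=" + v));
        head.collect("k1", "hello");
        head.collect("k2", "world");
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Neither mapper in the sketch knows about the other; each one only sees the collector it is handed.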

The API would look something like:

{code:java}

public class ChainMapper implements Mapper {

  public static void addMapper(JobConf job, Class<? extends Mapper> klass,
                               Properties mapperConf);
  ...
}

public class ChainReducer implements Reducer {

  public static void setReducer(JobConf job, Class<? extends Reducer> klass,
                                Properties reducerConf);

  public static void addMapper(JobConf job, Class<? extends Mapper> klass,
                               Properties mapperConf);
  ...
}

{code}

The {{Properties}} configuration passed for a {{Mapper}} or {{Reducer}} when it is set into
the chain is injected into a copy of the job's configuration. This allows maps to be
configured as usual without being aware that they are in a chain.
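
One way to picture the injection step, as a sketch only: it uses plain {{java.util.Properties}} in
place of {{JobConf}}, and the helper name {{confFor}} is hypothetical. It assumes that the
per-element properties override job-level values with the same key, which the proposal does not spell out.

```java
import java.util.Properties;

public class ChainConfSketch {

    // Hypothetical helper: builds the configuration seen by one element of the chain.
    // The job-level properties are copied, so the job's own configuration is never modified.
    static Properties confFor(Properties jobConf, Properties elementConf) {
        Properties copy = new Properties();
        copy.putAll(jobConf);
        if (elementConf != null) {
            // Assumption: element-specific values win over job-level values.
            copy.putAll(elementConf);
        }
        return copy;
    }

    public static void main(String[] args) {
        Properties jobConf = new Properties();
        jobConf.setProperty("a", "job-level");
        jobConf.setProperty("b", "B");

        Properties mapAConf = new Properties();
        mapAConf.setProperty("a", "A");

        Properties forAMap = confFor(jobConf, mapAConf); // sees a=A, b=B
        Properties forBMap = confFor(jobConf, null);     // sees a=job-level, b=B

        System.out.println(forAMap.getProperty("a") + " " + forBMap.getProperty("a"));
    }
}
```

Because each element gets its own copy, a property set for one Map does not leak into the next one in the chain.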

Example of creating and submitting a chain job:

{code:java}

JobConf conf = new JobConf();

// chaining maps in the Map phase

Properties mapAConf = new Properties();
mapAConf.setProperty("a", "A");
ChainMapper.addMapper(conf, AMap.class, mapAConf);

ChainMapper.addMapper(conf, BMap.class, null);

// setting the reducer

Properties reduceConf = new Properties();
ChainReducer.setReducer(conf, XReduce.class, reduceConf);

// chaining maps in the Reduce phase

ChainReducer.addMapper(conf, CMap.class, null);

ChainReducer.addMapper(conf, DMap.class, null);

...

FileInputFormat.setInputPaths(conf, inDir);
FileOutputFormat.setOutputPath(conf, outDir);

JobClient jc = new JobClient(conf);
RunningJob job = jc.submitJob(conf);

{code}

> add support for chaining Maps in a single Map and after a Reduce [M*/RM*]
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-3702
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3702
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>            Priority: Minor
>
> On the same input, we usually need to run multiple Maps one after the other without a
> Reduce. We also have to run multiple Maps after the Reduce.
> If all pre-Reduce Maps are chained together and run as a single Map, a significant amount
> of disk I/O will be avoided.
> Similarly, all post-Reduce Maps can be chained together and run in the Reduce phase after
> the Reduce.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

