hadoop-mapreduce-dev mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-1956) allow reducer to initialize lazily
Date Wed, 21 Jul 2010 17:39:52 GMT
allow reducer to initialize lazily

                 Key: MAPREDUCE-1956
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1956
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: tasktracker
    Affects Versions: 0.20.2
            Reporter: Ted Yu

From http://www.scribd.com/doc/23046928/Hadoop-Performance-Tuning:
"In M/R job Reducers are initialized with Mappers at the job initialization, but the reduce
method is called in reduce phase when all the maps had been finished. So in large jobs where
Reducer loads data (>100 MB for business logic) in-memory on initialization, the performance
can be increased by lazily initializing Reducers i.e. loading data in reduce method controlled
by an initialize flag variable which assures that it is loaded only once. By lazily initializing
Reducers which require memory (for business logic) on initialization, number of maps can be

Introducing a parameter for this purpose would allow more people to utilize the above pattern.
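
For reference, the pattern described in the quoted text (loading the data inside the reduce method, guarded by an initialize flag so that it is loaded only once) might look roughly like the sketch below. It is a plain-Java stand-in and does not extend the real Hadoop Reducer class, so that it compiles without Hadoop on the classpath. The class name, the lookup table and the loadCount counter are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a reducer. This is NOT the Hadoop Reducer API;
// all names here are assumptions made for the sake of the example.
class LazyInitReducerSketch {
    // The "initialize flag" from the quoted text.
    private boolean initialized = false;

    // Hypothetical stand-in for the large in-memory business data.
    private Map<String, Integer> lookup;

    // Counts how often the data was loaded (demonstration only).
    private int loadCount = 0;

    // Simulates the expensive load. In a real job this might read a
    // large side file; here it just fills a small map.
    private void loadLookup() {
        lookup = new HashMap<>();
        lookup.put("a", 10);
        lookup.put("b", 20);
        loadCount++;
    }

    // Mirrors the shape of reduce(key, values): sums the values and
    // multiplies by a per-key weight from the lazily loaded table.
    int reduce(String key, List<Integer> values) {
        if (!initialized) {
            // First call only: nothing is held in memory before this point.
            loadLookup();
            initialized = true;
        }
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum * lookup.getOrDefault(key, 1);
    }

    int getLoadCount() {
        return loadCount;
    }
}
```

Constructing the object allocates nothing large. The table is built on the first reduce call and reused on every later call, which is the behaviour the requested parameter would make available without hand-written flag logic.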

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
