hadoop-mapreduce-issues mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-841) Protect Job Tracker against memory exhaustion due to very large InputSplit or JobConf objects
Date Sat, 08 Aug 2009 09:11:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740858#action_12740858 ]

Hong Tang commented on MAPREDUCE-841:

For JobConf, it is harder to determine which subset of properties the JobTracker actually uses.
I scanned through JobTracker.java; here is the list so far:
- "hadoop.job.ugi": user/group info.
- "job.end.retry.attempts" / "job.end.retry.interval": for job end notification
- "mapred.job.name": job name
- "hadoop.job.history.user.location" / "mapred.output.dir": for job history log file location.
- "fs.default.name" / "fs.*.impl" / "fs.automatic.close": file-system settings, also used to
place the job history log in the location the user specified.
- "user.name": user name
- various memory-related settings.
- "mapred.map.tasks" / "mapred.reduce.tasks": user-requested number of map/reduce tasks

Two observations: (1) the JT needs only a handful of properties, so it would be better not to
load the complete JobConf object for each job; (2) these properties are quite diverse, and keeping
such a list in sync with the JobTracker code is a hard maintenance problem.
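To make observation (1) concrete, here is a minimal Java sketch of extracting only the properties listed above, assuming the job configuration is available as a plain Map<String, String> rather than a Hadoop JobConf. The class JobConfSubset and its extract method are hypothetical, and the memory-related settings are omitted because their names are not given here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper (not a Hadoop API): keeps only the JobConf
// properties the JobTracker was found to read, dropping everything else.
class JobConfSubset {
    // Exact keys from the list above; memory-related knobs are omitted.
    static final Set<String> KEYS = Set.of(
        "hadoop.job.ugi", "job.end.retry.attempts", "job.end.retry.interval",
        "mapred.job.name", "hadoop.job.history.user.location", "mapred.output.dir",
        "fs.default.name", "fs.automatic.close", "user.name",
        "mapred.map.tasks", "mapred.reduce.tasks");

    // Matches the "fs.*.impl" family, e.g. fs.hdfs.impl or fs.file.impl.
    static final Pattern FS_IMPL = Pattern.compile("fs\\.[^.]+\\.impl");

    static Map<String, String> extract(Map<String, String> fullConf) {
        Map<String, String> subset = new HashMap<>();
        for (Map.Entry<String, String> e : fullConf.entrySet()) {
            String k = e.getKey();
            if (KEYS.contains(k) || FS_IMPL.matcher(k).matches()) {
                subset.put(k, e.getValue());
            }
        }
        return subset;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapred.job.name", "wordcount");
        conf.put("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        // A large user-defined property that the JobTracker never reads.
        conf.put("user.defined.blob", "x".repeat(1000));
        System.out.println(extract(conf).size()); // prints 2
    }
}
```

The hard-coded KEYS set is exactly what observation (2) warns about: any JobTracker change that starts reading a new property would also have to update this list.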

> Protect Job Tracker against memory exhaustion due to very large InputSplit or JobConf
> ---------------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-841
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-841
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.20.1
>            Reporter: Hong Tang
>             Fix For: 0.21.0
> JobTracker only needs to examine a subset of the information contained in InputSplit or JobConf
objects, but it currently loads the complete user-defined InputSplit and JobConf objects
into memory. This design leaves the JobTracker susceptible to memory exhaustion, particularly
when bugs in user code produce very large input splits or job conf objects (e.g. PIG-901).
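One generic way to bound this kind of damage, not necessarily the fix adopted for this issue, is to check a record's declared size before allocating memory for it. Below is a minimal Java sketch, assuming splits arrive as length-prefixed byte records; BoundedRecordReader, readBounded, frame and the 1024-byte limit are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical guard: refuse to materialize a length-prefixed record
// (e.g. a serialized split) whose declared size exceeds a fixed limit.
class BoundedRecordReader {
    static byte[] readBounded(DataInputStream in, int maxBytes) throws IOException {
        int len = in.readInt(); // declared size of the record that follows
        if (len < 0 || len > maxBytes) {
            // Fail before allocating, so one oversized record cannot exhaust the heap.
            throw new IOException("record of " + len + " bytes exceeds limit " + maxBytes);
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return buf;
    }

    // Writes one length-prefixed record, used by the demo below.
    static byte[] frame(byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] small = frame(new byte[100]);
        byte[] big = frame(new byte[10_000]);
        DataInputStream okIn = new DataInputStream(new ByteArrayInputStream(small));
        System.out.println(readBounded(okIn, 1024).length); // prints 100
        try {
            readBounded(new DataInputStream(new ByteArrayInputStream(big)), 1024);
        } catch (IOException e) {
            // prints "rejected: record of 10000 bytes exceeds limit 1024"
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A size check like this only caps the worst case; the JobTracker would still hold fields it never reads unless it also extracts the subset of information it needs.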

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
