hadoop-mapreduce-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars
Date Thu, 10 Sep 2009 06:30:57 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753455#action_12753455 ]

Todd Lipcon commented on MAPREDUCE-967:

Currently, TaskTracker.localizeJob completely unjars job.jar into jobCacheDir. TaskRunner then
appends <jobCacheDir>/classes:<jobCacheDir>/lib/*:<jobCacheDir>/ to the
task classpath. Instead, I propose that we unpack only the classes/ and lib/ portions of job.jar
and add <jobCacheDir>/job.jar to the task classpath in lieu of <jobCacheDir>/.
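
A minimal sketch of the selective unpacking described above, written against the plain java.util.jar API rather than Hadoop's own utilities. The class name SelectiveUnjar, the method unjarSelected, and its prefix-list parameter are hypothetical names chosen for illustration; this is not the attached patch.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class SelectiveUnjar {

    /**
     * Hypothetical helper: extracts only the entries whose names start with
     * one of the given prefixes (for example "classes/" and "lib/") and
     * returns the number of files written. All other entries stay in the jar.
     */
    public static int unjarSelected(File jar, File destDir, List<String> prefixes)
            throws IOException {
        Path root = destDir.toPath().toAbsolutePath().normalize();
        int extracted = 0;
        try (JarFile jarFile = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                // Skip directory entries and anything outside the wanted prefixes.
                if (entry.isDirectory() || !matches(name, prefixes)) {
                    continue;
                }
                // Refuse entries that would resolve outside the destination.
                Path target = root.resolve(name).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes destination: " + name);
                }
                Files.createDirectories(target.getParent());
                try (InputStream in = jarFile.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted++;
            }
        }
        return extracted;
    }

    private static boolean matches(String name, List<String> prefixes) {
        for (String prefix : prefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Called with the prefixes classes/ and lib/, this leaves every other entry inside job.jar, which is why the proposal puts <jobCacheDir>/job.jar itself on the task classpath.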

While we're at it, I'm not sure I see the purpose of the "classes/" directory - this is not
standard JAR layout by any means, and seems unnecessary. But that issue is orthogonal to this one.

Attaching a preliminary patch against branch-20, though this should go into trunk and probably
not the branch. I just want to test this on a real workload first.

> TaskTracker does not need to fully unjar job jars
> -------------------------------------------------
>                 Key: MAPREDUCE-967
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
> In practice we have seen some users submitting job jars that consist of 10,000+ classes.
Unpacking these jars into mapred.local.dir and then cleaning up after them has a significant
cost (both in wall clock and in unnecessary heavy disk utilization). This cost can be easily

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
