hadoop-mapreduce-issues mailing list archives

From "Vinod K V (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars
Date Mon, 02 Nov 2009 11:56:59 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772500#action_12772500 ]

Vinod K V commented on MAPREDUCE-967:

bq. One note about this JIRA - it will need some fix for Streaming as well. The common way
that people ship scripts for streaming is using the "-file foo.py" argument. This just includes
foo.py in the job jar and assumes it will be unpacked on the other side. With this patch,
it won't unpack those and breaks the -file argument's primary use case.

I've just looked up the documentation and, though it is not very explicit, files passed with
{{-file}} are packaged into the job.jar (so the option is meant for small files), whereas
{{-files}} and {{-archives}} can be used for large files. So, going by that, I am +1 for the
2nd approach that you've outlined. If we want to be sure, we can make this distinction
explicit in the Forrest docs.
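
To illustrate the concern quoted above, here is a minimal Python sketch. It is not taken from the patch and is not how the TaskTracker is implemented. It only shows how the choice of which jar entries to unpack decides whether a script shipped inside the job jar is available to the task. The entry names and both patterns ({{LIB_ONLY}}, {{SKIP_CLASSES}}) are hypothetical, chosen for illustration.

```python
import io
import re
import zipfile


def build_job_jar():
    """Build an in-memory stand-in for a job jar (a jar is a zip archive)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        jar.writestr("com/example/Mapper.class", b"\xca\xfe\xba\xbe")
        jar.writestr("lib/dep.jar", b"dependency bytes")
        # A script bundled into the jar, as described for "-file foo.py".
        jar.writestr("foo.py", "print('hello')\n")
    buf.seek(0)
    return buf


def names_to_unpack(jar_file, pattern):
    """Return the entry names that a selective unjar would extract."""
    with zipfile.ZipFile(jar_file) as jar:
        return [name for name in jar.namelist() if pattern.match(name)]


# Hypothetical policies, for illustration only.
LIB_ONLY = re.compile(r"lib/.*")                 # unpack bundled libraries only
SKIP_CLASSES = re.compile(r"(?!.*\.class$).*")   # unpack everything except .class files

if __name__ == "__main__":
    print(names_to_unpack(build_job_jar(), LIB_ONLY))
    print(names_to_unpack(build_job_jar(), SKIP_CLASSES))
```

Under the first pattern {{foo.py}} is never extracted, which is the breakage described in the quoted note. Under the second it is extracted while the class files are still skipped.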

Will quickly look at your patch.

> TaskTracker does not need to fully unjar job jars
> -------------------------------------------------
>                 Key: MAPREDUCE-967
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, mapreduce-967.txt
> In practice we have seen some users submitting job jars that consist of 10,000+ classes.
> Unpacking these jars into mapred.local.dir and then cleaning up after them has a significant
> cost (both in wall clock and in unnecessary heavy disk utilization). This cost can be easily

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
