hadoop-common-dev mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2622) Fix -file option in Streaming to use Distributed Cache
Date Tue, 11 Mar 2008 11:14:46 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577391#action_12577391 ]

Amareshwari Sriramadasu commented on HADOOP-2622:
-------------------------------------------------

To address the issue itself, i.e. to make -file use the DistributedCache, we can do the
following:

1. Leave the streaming jar as job.jar.
2. Create a jar file from the files/directories given with the -file option.
3. Copy the jar file created in step 2 to the DFS at a job-specific location,
    say submitJobDir/_jobFiles (${mapred.system.dir}/jobid/_jobFiles).
4. Add the jar file to the DistributedCache using addArchiveToClassPath.
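The four steps above could be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code in patch-2622.txt: the class `JobFilesJar` and its methods are invented names, only step 2 (building the jar with plain `java.util.jar`) actually runs, and steps 3 and 4 appear only as comments because they need Hadoop. The `mapred.system.dir` value and job id used in `main` are made-up placeholders.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper sketching the proposal; names are illustrative only.
class JobFilesJar {

    // Step 3's job-specific location: ${mapred.system.dir}/jobid/_jobFiles
    static String jobFilesDir(String mapredSystemDir, String jobId) {
        return mapredSystemDir + "/" + jobId + "/_jobFiles";
    }

    // Step 2: pack every -file argument (a file, or all regular files under a
    // directory) into one jar. Entry names are relative to each argument's
    // parent directory, so a directory "lib" yields entries like "lib/util.py".
    // Two arguments producing the same entry name make putNextEntry throw.
    static List<String> createJar(List<Path> inputs, Path jarFile) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarOutputStream jar = new JarOutputStream(Files.newOutputStream(jarFile))) {
            for (Path input : inputs) {
                Path abs = input.toAbsolutePath();
                Path base = abs.getParent();
                List<Path> files;
                try (Stream<Path> walk = Files.walk(abs)) {
                    files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
                }
                for (Path file : files) {
                    // Jar entry names always use '/' as the separator.
                    String name = base.relativize(file).toString().replace('\\', '/');
                    jar.putNextEntry(new JarEntry(name));
                    Files.copy(file, jar);
                    jar.closeEntry();
                    names.add(name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder -file arguments created in a temp directory.
        Path work = Files.createTempDirectory("streaming-files");
        Path script = Files.write(work.resolve("mapper.py"), "print('hi')\n".getBytes());
        Path lib = Files.createDirectories(work.resolve("lib"));
        Files.write(lib.resolve("util.py"), "X = 1\n".getBytes());

        Path jar = work.resolve("jobFiles.jar");
        System.out.println(createJar(List.of(script, lib), jar)); // [mapper.py, lib/util.py]
        System.out.println(jobFilesDir("/system", "job_0001"));   // /system/job_0001/_jobFiles

        // Step 3 (needs Hadoop, not run here): copy the jar to that DFS
        // directory, e.g. with FileSystem.copyFromLocalFile(src, dst).
        // Step 4 (needs Hadoop, not run here): put it on the task classpath
        // with DistributedCache.addArchiveToClassPath(dfsJarPath, conf).
    }
}
```

Naming entries relative to each argument's parent keeps a directory passed with -file intact inside the archive, which is presumably what a streaming script expects to find once the jar is unpacked on the task side.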

Thoughts?

> Fix -file option in Streaming to use Distributed Cache
> ------------------------------------------------------
>
>                 Key: HADOOP-2622
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2622
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.17.0
>
>         Attachments: patch-2622.txt
>
>
> The -file option works by putting the script into the job's jar file by unjar-ing, copying and then jar-ing it again.
> We should rework the -file option to use the DistributedCache and the symlink option it provides.
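The symlink option referred to in the description relies, assuming the usual DistributedCache convention, on the fragment of each cache URI (the part after `#`): once `DistributedCache.createSymlink(conf)` is set, that fragment names a link in the task's working directory. The sketch below only demonstrates the naming convention with `java.net.URI`; `CacheSymlinkName`, its methods and the DFS path are hypothetical, and a real job would pass a full DFS URI to `DistributedCache.addCacheFile`.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration of the "#linkName" convention for cache URIs.
class CacheSymlinkName {

    // Attach the file's base name as the fragment, so the task could refer
    // to the script by that name in its working directory.
    static URI withLinkName(String dfsPath) throws URISyntaxException {
        String linkName = dfsPath.substring(dfsPath.lastIndexOf('/') + 1);
        return new URI(null, null, dfsPath, null, linkName);
    }

    // The link name carried by a cache URI, or null if it has no fragment.
    static String linkName(URI cacheUri) {
        return cacheUri.getFragment();
    }

    public static void main(String[] args) throws URISyntaxException {
        URI uri = withLinkName("/system/job_0001/_jobFiles/mapper.py");
        System.out.println(uri);           // /system/job_0001/_jobFiles/mapper.py#mapper.py
        System.out.println(linkName(uri)); // mapper.py
    }
}
```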

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

