hadoop-common-dev mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2622) Fix -file option in Streaming to use Distributed Cache
Date Tue, 11 Mar 2008 09:10:46 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-2622:

    Attachment: patch-2622.txt

bq. if users have their own inputFormat, they would have to jar it with streaming jar and
use the custom jar because setInputFormat is done at client side. So, passing that via -file
does not work. It would be really helpful if this is also addressed.

Adding the InputFormat class (with its package directory hierarchy, if any) using -file works with the current code,
provided the jar is added to the classpath.
For example, if the input format is a.b.c.MyInputFormat and the directory hierarchy is dir/a/b/c/MyInputFormat.class,
then the input format can be added to the jar using the following command:
bin/hadoop jar build/contrib/streaming/hadoop-0.17.0-dev-streaming.jar -mapper my.pl -input t.txt -output output -file my.pl -file dir/ -inputformat a.b.c.MyInputFormat
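
The directory passed to -file has to mirror the package of the class, so it can help to check the layout before submitting the job. The following is a minimal sketch and not part of Streaming: the class ClassLayoutCheck and its method expectedClassFile are hypothetical helpers. It maps a fully qualified class name to the .class path expected under the directory given to -file, and reports whether that file exists.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop Streaming.
public class ClassLayoutCheck {

    /** Returns the path where the class file for className is expected under root. */
    static Path expectedClassFile(String root, String className) {
        // a.b.c.MyInputFormat -> a/b/c/MyInputFormat.class, resolved under root
        return Paths.get(root, className.replace('.', '/') + ".class");
    }

    public static void main(String[] args) {
        // Defaults follow the example in this message.
        String root = args.length > 0 ? args[0] : "dir";
        String className = args.length > 1 ? args[1] : "a.b.c.MyInputFormat";

        Path classFile = expectedClassFile(root, className);
        System.out.println(classFile + (Files.exists(classFile) ? " found" : " missing"));
    }
}
```

Running it with the arguments "dir" and "a.b.c.MyInputFormat" checks the same layout as the command above.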

Here is a patch that adds the jar file to the classpath.
I tested it by adding an InputFormat, and it worked fine.
Lohit, can you apply this patch and check whether -file works for adding an InputFormat?

> Fix -file option in Streaming to use Distributed Cache
> ------------------------------------------------------
>                 Key: HADOOP-2622
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2622
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.17.0
>         Attachments: patch-2622.txt
> The -file option works by putting the script into the job's jar file by unjar-ing, copying
> and then jar-ing it again.
> We should rework the -file option to use the DistributedCache and the symlink option
> it provides.
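
To make the cost of the repacking approach described in the issue concrete, here is a rough sketch of the idea using only java.util.jar. It is an illustration under stated assumptions, not Streaming's actual code: the class RepackJar and its method repack are hypothetical, and the sketch streams entries from the old jar into a new one rather than extracting them to disk first. Every existing entry is rewritten just to add one file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical illustration of "unjar, copy, jar again"; not Hadoop code.
public class RepackJar {

    /** Copies every entry of srcJar into destJar, then adds extraFile at the jar root. */
    static void repack(Path srcJar, Path extraFile, Path destJar) throws IOException {
        try (JarFile in = new JarFile(srcJar.toFile());
             OutputStream fileOut = Files.newOutputStream(destJar);
             JarOutputStream out = new JarOutputStream(fileOut)) {

            // Rewrite all existing entries (the "unjar" and "jar again" work).
            Enumeration<JarEntry> entries = in.entries();
            byte[] buffer = new byte[8192];
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                out.putNextEntry(new JarEntry(entry.getName()));
                try (InputStream data = in.getInputStream(entry)) {
                    int n;
                    while ((n = data.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
                out.closeEntry();
            }

            // Add the extra file (the "copy" step). This throws a ZipException
            // if the jar already has an entry with the same name.
            out.putNextEntry(new JarEntry(extraFile.getFileName().toString()));
            Files.copy(extraFile, out);
            out.closeEntry();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: RepackJar <src.jar> <file-to-add> <dest.jar>");
            return;
        }
        repack(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
    }
}
```

The work grows with the size of the whole jar, not with the size of the added file.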

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
