hadoop-common-user mailing list archives

From Amareshwari Sriramadasu <amar...@yahoo-inc.com>
Subject Re: streaming question
Date Tue, 16 Sep 2008 09:14:11 GMT
Looks like you have to wait for HADOOP-3570 and use -libjars for the same.

Christian Ulrik Søttrup wrote:
> OK, I've tried what you suggested and all sorts of combinations, with no 
> luck.
> Then I went through the source of the Streaming lib. It looks like it 
> checks for the existence
> of the combiner while it is building the jobconf, i.e. before the job 
> is sent to the nodes.
> It calls Class.forName() on the combiner in goodClassOrNull() from 
> StreamUtil.java,
> called from setJobconf() in StreamJob.java.
> Does anybody have an idea how I can use a custom combiner? Would I have to 
> package it into the streaming jar?
> cheers,
> Christian
> Dennis Kubes wrote:
>> If testlink is a package, it should be:
>> hadoop jar streaming/hadoop-0.17.0-streaming.jar -input store 
>> -output cout -mapper MyProg -combiner testlink.combiner -reducer 
>> testlink.reduce -file /home/hadoop/MyProg -cacheFile 
>> /shared/part-00000#in.cl -cacheArchive /related/MyJar.jar#testlink
>> If it is not a package, remove the testlink part.
>> Dennis
>> Christian Ulrik Søttrup wrote:
>>> Ok, so I added the JAR to the cacheArchive option and my command 
>>> looks like this:
>>> hadoop jar streaming/hadoop-0.17.0-streaming.jar  -input /store/ 
>>> -output /cout/ -mapper MyProg -combiner testlink/combiner.class 
>>> -reducer testlink/reduce.class -file /home/hadoop/MyProg -cacheFile 
>>> /shared/part-00000#in.cl -cacheArchive /related/MyJar.jar#testlink
>>> Now it fails because it cannot find the combiner. The cacheArchive 
>>> option creates a symlink in the local running directory, correct? 
>>> Just like the cacheFile option? If not, how can I then specify which 
>>> class to use?
>>> cheers,
>>> Christian
>>> Amareshwari Sriramadasu wrote:
>>>> Dennis Kubes wrote:
>>>>> If I understand what you are asking, you can use the -cacheArchive 
>>>>> option with the path to the jar to include the jar file in the 
>>>>> classpath of your streaming job.
>>>>> Dennis
>>>> You can also use the -cacheArchive option to include the jar file and 
>>>> symlink the unjarred directory from the cwd by providing the URI as 
>>>> hdfs://<path>#link. You have to provide the -reducer and -combiner 
>>>> options as appropriate paths in the unjarred directory.
>>>> Thanks
>>>> Amareshwari
>>>>> Christian Søttrup wrote:
>>>>>> Hi all,
>>>>>> I have an application that I used to run with the "hadoop jar" 
>>>>>> command.
>>>>>> I have now written an optimized version of the mapper in C.
>>>>>> I have run this using the streaming library and everything looks 
>>>>>> OK (using num.reducers=0).
>>>>>> Now I want to use this mapper together with the combiner and 
>>>>>> reducer from my old .jar file.
>>>>>> How do I do this? How can I distribute the jar and run the 
>>>>>> reducer and combiner from it, 
>>>>>> while also running the C program as the mapper in streaming mode?
>>>>>> cheers,
>>>>>> Christian
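
The submit-time check Christian describes above (Class.forName() on the combiner name while the jobconf is being built) can be illustrated with a small standalone sketch. This is not the Hadoop source: the class name CombinerLookupSketch and the helper classOrNull() are hypothetical, and the real goodClassOrNull() in StreamUtil.java may differ in detail. The sketch only shows why a class that lives in a jar shipped via -cacheArchive is probably not visible at this point: the lookup runs in the client JVM, against the client's classpath, before anything is distributed to the nodes.

```java
public class CombinerLookupSketch {

    // Hypothetical helper modelled on the behaviour described in the thread:
    // return the Class if the name resolves on the local classpath, else null.
    static Class<?> classOrNull(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            // Not on the classpath of the JVM that is building the job.
            return null;
        }
    }

    public static void main(String[] args) {
        // A class that is always on the client classpath resolves.
        System.out.println("java.lang.String found: "
                + (classOrNull("java.lang.String") != null));

        // "testlink.combiner" is the name used in the thread; unless its jar
        // is on the local classpath when this runs, the lookup returns null.
        System.out.println("testlink.combiner found: "
                + (classOrNull("testlink.combiner") != null));
    }
}
```

Under that assumption, the lookup succeeds only if the jar containing the combiner is already on the classpath of the JVM that submits the job, which is consistent with the suggestion at the top of the thread to wait for -libjars support.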
