hadoop-common-user mailing list archives

From Mori Bellamy <mbell...@apple.com>
Subject Re: How to chain multiple hadoop jobs?
Date Mon, 14 Jul 2008 22:56:32 GMT
Weird. I use Eclipse, but that's never happened to me. When you set
up your JobConfs, for example:
JobConf conf2 = new JobConf(getConf(), MyClass.class);
is your "MyClass" in the same package as your driver program? Also, do
you run from Eclipse or from the command line? (I've never tried to
launch a Hadoop task from Eclipse.) If you run from the command line:

hadoop jar MyMRTaskWrapper.jar myEntryClass option1 option2...

and all of the requisite resources are in MyMRTaskWrapper.jar, I don't
see what the problem would be. If this is the way you run a Hadoop
task, are you sure that all of the resources are getting compiled into
the same jar? When you export a jar from Eclipse, it won't pack up
external resources by default. (Look into add-ons like FatJAR for that.)
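
One way to test the "is everything in the same jar?" question is to look inside the exported jar for the classes the job needs. The following is a minimal sketch using only the JDK's java.util.jar API; the class name JarContents, the method containsClass, and the example class com.example.MyMapper are made up for illustration and are not part of Hadoop.

```java
import java.io.IOException;
import java.util.jar.JarFile;

public class JarContents {

    /**
     * Returns true if the jar at jarPath has an entry for the given
     * fully qualified class name (e.g. "com.example.MyMapper").
     */
    public static boolean containsClass(String jarPath, String className)
            throws IOException {
        // Class files are stored under their package path inside the jar.
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage:
        //   java JarContents MyMRTaskWrapper.jar com.example.MyMapper
        if (args.length != 2) {
            System.err.println("usage: JarContents <jar> <class name>");
            return;
        }
        boolean found = containsClass(args[0], args[1]);
        System.out.println(args[1] + (found ? " is in " : " is MISSING from ") + args[0]);
    }
}
```

Listing the archive with "jar tf MyMRTaskWrapper.jar" gives the same information by hand. Either way, a mapper, reducer, or helper class missing from the listing would point to the export step and not to the job-chaining code.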


On Jul 14, 2008, at 2:25 PM, Sean Arietta wrote:

>
> Well that's what I need to do also... but Hadoop complains to me when I
> attempt to do that. Are you using Eclipse by any chance to develop? The
> error I'm getting seems to be stemming from the fact that Hadoop thinks I am
> uploading a new jar for EVERY execution of JobClient.runJob() so it fails
> indicating the job jar file doesn't exist. Did you have to turn something
> on/off to get it to ignore that or are you using a different IDE? Thanks!
>
> Cheers,
> Sean
>
>
> Mori Bellamy wrote:
>>
>> hey sean,
>>
>> i later learned that the method i originally posted (configuring
>> different JobConfs and then running them, blocking style, with
>> JobClient.runJob(conf)) was sufficient for my needs. the reason it was
>> failing before was somehow my fault and the bugs somehow got fixed x_X.
>>
>> Lukas gave me a helpful reply pointing me to TestJobControl.java (in
>> the hadoop source directory). it seems like this would be helpful if
>> your job dependencies are complex. but for me, i just need to do one
>> job after another (and every job only depends on the one right before
>> it), so the code i originally posted works fine.
>>
>> On Jul 14, 2008, at 1:38 PM, Sean Arietta wrote:
>>
>>>
>>> Could you please provide some small code snippets elaborating on how you
>>> implemented that? I have a similar need as the author of this thread and I
>>> would appreciate any help. Thanks!
>>>
>>> Cheers,
>>> Sean
>>>
>>>
>>> Joman Chu-2 wrote:
>>>>
>>>> Hi, I use ToolRunner.run() for multiple MapReduce jobs. It seems to work
>>>> well. I've run sequences involving hundreds of MapReduce jobs in a for
>>>> loop and it hasn't died on me yet.
>>>>
>>>> On Wed, July 9, 2008 4:28 pm, Mori Bellamy said:
>>>>> Hey all, I'm trying to chain multiple mapreduce jobs together to
>>>>> accomplish a complex task. I believe that the way to do it is as
>>>>> follows:
>>>>>
>>>>> JobConf conf = new JobConf(getConf(), MyClass.class);
>>>>> // configure job: set mappers, reducers, etc.
>>>>> SequenceFileOutputFormat.setOutputPath(conf, myPath1);
>>>>> JobClient.runJob(conf);
>>>>>
>>>>> // new job
>>>>> JobConf conf2 = new JobConf(getConf(), MyClass.class);
>>>>> SequenceFileInputFormat.setInputPaths(conf2, myPath1);
>>>>> // more configuration...
>>>>> JobClient.runJob(conf2);
>>>>>
>>>>> Is this the canonical way to chain jobs? I'm having some trouble with this
>>>>> method -- for especially long jobs, the latter MR tasks sometimes do not
>>>>> start up.
>>>>>
>>>>>
>>>>
>>>>
>>>> -- 
>>>> Joman Chu
>>>> AIM: ARcanUSNUMquam
>>>> IRC: irc.liquid-silver.net
>>>>
>>>>
>>>>
>>>
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/How-to-chain-multiple-hadoop-jobs--tp18370089p18452309.html
>>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>>
>>
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/How-to-chain-multiple-hadoop-jobs--tp18370089p18453200.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
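
The pattern this thread settles on (run one job, block until it finishes, then hand its output path to the next job) can be sketched without any Hadoop dependency. In the sketch below, the Step interface, the runAll method, and the "work/step-N" path layout are hypothetical names chosen for illustration. In a real driver, each Step body would build its own JobConf and call JobClient.runJob(conf), which blocks until the job completes and throws an IOException if the job fails.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SequentialChain {

    /**
     * Hypothetical stand-in for one MapReduce job. A Hadoop implementation
     * would configure a JobConf with these paths and call JobClient.runJob.
     */
    public interface Step {
        void run(String inputPath, String outputPath) throws IOException;
    }

    /**
     * Runs the steps in order. Step i reads the directory that step i-1
     * wrote. An exception from any step aborts the rest of the chain.
     * Returns the output path of the last step (or firstInput if no steps).
     */
    public static String runAll(List<Step> steps, String firstInput, String workDir)
            throws IOException {
        String input = firstInput;
        for (int i = 0; i < steps.size(); i++) {
            String output = workDir + "/step-" + i;
            steps.get(i).run(input, output); // blocks until this step is done
            input = output;                  // next step consumes this output
        }
        return input;
    }

    public static void main(String[] args) throws IOException {
        List<Step> steps = new ArrayList<>();
        // Two placeholder steps that only report what they would process.
        steps.add((in, out) -> System.out.println("job 1: " + in + " -> " + out));
        steps.add((in, out) -> System.out.println("job 2: " + in + " -> " + out));
        System.out.println("final output: " + runAll(steps, "input", "work"));
    }
}
```

Because each step depends only on the one before it, a plain loop is enough here. For jobs with more complex dependencies, the JobControl approach mentioned earlier in the thread is likely the better fit.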

