crunch-user mailing list archives

From Kristoffer Sjögren <sto...@gmail.com>
Subject Re: Running crunch on remote jobtracker
Date Fri, 21 Feb 2014 07:39:25 GMT
Hi Chao

Yes, that was exactly what I was missing.

1) Set the Hadoop configuration property mapred.job.tracker to the remote
JobTracker address on port 8021.
2) Use the DistributedCache to upload jar dependencies, like so:
DistributedCache.addFileToClassPath(new
Path("/tmp/crunch-core-0.8.0-cdh4.3.0.jar"), hadoopConf);
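
For anyone finding this thread later, here is a minimal sketch of those two
steps together. This assumes the CDH4-era MR1 API (where
DistributedCache.addFileToClassPath is not yet deprecated); the JobTracker
host name is a placeholder, and the jar must already exist at the given HDFS
path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class RemoteJobConfig {

    public static Configuration remoteConf() throws Exception {
        Configuration conf = new Configuration();

        // 1) Point MR1 at the remote JobTracker instead of the local runner.
        //    Host name is a placeholder for your cluster's JobTracker.
        conf.set("mapred.job.tracker", "jobtracker.example.com:8021");

        // 2) Ship jar dependencies to the cluster via the DistributedCache.
        //    The path refers to a jar already uploaded to HDFS.
        DistributedCache.addFileToClassPath(
                new Path("/tmp/crunch-core-0.8.0-cdh4.3.0.jar"), conf);

        return conf;
    }
}
```

A Configuration built this way can then be passed to ToolRunner.run or to a
Crunch MRPipeline constructor, so jobs launched from the IDE go to the remote
cluster.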

Thanks,
-Kristoffer



On Fri, Feb 21, 2014 at 4:23 AM, Chao Shi <stepinto@live.com> wrote:

> Hi Kristoffer,
>
> As far as I can tell, you have to package classes into jar before
> submitting a job.
>
> "hadoop jar" is the simplest approach to submitting jobs, but there are
> other approaches. MR uses the mapred.job.tracker property to determine
> whether to run a job remotely or locally. The "hadoop jar" command sets it
> to the configured job tracker address automatically, so the job is
> submitted to the remote cluster.
>
>
> 2014-02-20 5:01 GMT+08:00 Kristoffer Sjögren <stoffe@gmail.com>:
>
> Hi
>>
>> I'm running the crunch wordcount example using ToolRunner.run (from
>> IntelliJ) and data is read from HDFS, but the actual job runs locally
>> instead of on the remote cluster.
>>
>> Do I need to use the hadoop jar command with a pre-packaged jar? Or is
>> there another way to kick off a remote job?
>>
>> Cheers,
>> -Kristoffer
>>
>
>
