hadoop-mapreduce-user mailing list archives

From: Panshul Whisper <ouchwhis...@gmail.com>
Subject: Re: Submitting MapReduce job from remote server using JobClient
Date: Sun, 27 Jan 2013 11:53:34 GMT
Hello Amit,

I tried the same scenario, submitting MapReduce jobs from a system outside
the Hadoop cluster, and I used Spring Hadoop to do it. It worked
wonderfully. Spring has made a lot of things easier...
you can try it. Here is a reference on how to do it:

http://www.petrikainulainen.net/programming/apache-hadoop/creating-hadoop-mapreduce-job-with-spring-data-apache-hadoop/
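Roughly, it boils down to something like this (a minimal sketch; the
context file name and bean id below are just examples, the post above has
the full setup):

    import org.apache.hadoop.mapreduce.Job;
    import org.springframework.context.support.ClassPathXmlApplicationContext;

    public class JobSubmitter {
        public static void main(String[] args) throws Exception {
            // Load the Spring context that declares the Hadoop configuration
            // (<hdp:configuration> pointing at the cluster) and the job.
            ClassPathXmlApplicationContext ctx =
                    new ClassPathXmlApplicationContext("applicationContext.xml");
            // <hdp:job id="mrJob" .../> exposes a plain mapreduce Job bean.
            Job job = (Job) ctx.getBean("mrJob");
            job.waitForCompletion(true); // submit to the cluster, block until done
            ctx.close();
        }
    }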

hope this helps,
Regards,



On Sun, Jan 27, 2013 at 12:43 PM, Amit Sela <amits@infolinks.com> wrote:

> Yes I do.
> I checked that by printing out Configuration.toString() and I see only the
> files I add as resources.
> Moreover, in my test environment, the test Analytics server is also a data
> node (or maybe that could cause more trouble?).
> Anyway, I still get
> org.apache.hadoop.mapred.JobClient - Running job: job_local_0001
> And I don't know what's wrong here: I create a new Configuration(false) to
> avoid default settings, I set the resources manually (addResource), and I
> validate it. Anything I'm forgetting?
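>
> For reference, my setup looks roughly like this (the paths here are just
> examples):
>
>     // org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.Path
>     Configuration conf = new Configuration(false); // skip built-in defaults
>     conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
>     conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
>     conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));
>     // sanity check: should print the cluster JobTracker, not "local"
>     System.out.println(conf.get("mapred.job.tracker"));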
>
>
> On Thu, Jan 24, 2013 at 9:49 PM, <bejoy.hadoop@gmail.com> wrote:
>
>> Hi Amit,
>>
>> Apart from the Hadoop jars, do you have the same config files
>> ($HADOOP_HOME/conf) that are on the cluster on your analytics server as
>> well?
>>
>> If you have the default config files on the analytics server, then your
>> MR job will run locally and not on the cluster.
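>>
>> A quick check (assuming the old mapred API, which your logs show): if
>> mapred.job.tracker resolves to "local", JobClient falls back to the
>> LocalJobRunner, which is exactly the job_local_0001 you are seeing.
>>
>>     // org.apache.hadoop.conf.Configuration
>>     System.out.println(conf.get("mapred.job.tracker", "local"));
>>     System.out.println(conf.get("fs.default.name", "file:///"));
>>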
>> Regards
>> Bejoy KS
>>
>> Sent from remote device, Please excuse typos
>> ------------------------------
>> From: Amit Sela <amits@infolinks.com>
>> Date: Thu, 24 Jan 2013 18:15:49 +0200
>> To: <user@hadoop.apache.org>
>> Reply-To: user@hadoop.apache.org
>> Subject: Re: Submitting MapReduce job from remote server using JobClient
>>
>> Hi Harsh,
>> I'm using the Job.waitForCompletion() method to run the job, but I can't
>> see it in the webapp and it doesn't seem to finish...
>> I get:
>> org.apache.hadoop.mapred.JobClient - Running job: job_local_0001
>> INFO  org.apache.hadoop.util.ProcessTree - setsid exited with exit code 0
>> 2013-01-24 08:10:12.521 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7db1be6
>> 2013-01-24 08:10:12.536 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
>> 2013-01-24 08:10:12.573 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
>> 2013-01-24 08:10:12.573 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
>> 2013-01-24 08:10:12.599 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
>> 2013-01-24 08:10:12.608 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
>> 2013-01-24 08:10:13.348 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1] INFO  org.apache.hadoop.mapred.JobClient - map 0% reduce 0%
>> 2013-01-24 08:10:15.509 [Thread-51] INFO  org.apache.hadoop.mapred.LocalJobRunner -
>> 2013-01-24 08:10:15.510 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000000_0' done.
>> 2013-01-24 08:10:15.511 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6b02b23d
>> 2013-01-24 08:10:15.512 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
>> 2013-01-24 08:10:15.549 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
>> 2013-01-24 08:10:15.550 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
>> 2013-01-24 08:10:15.557 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
>> 2013-01-24 08:10:15.560 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
>> 2013-01-24 08:10:16.358 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1] INFO  org.apache.hadoop.mapred.JobClient - map 100% reduce 0%
>>
>> And after that, instead of going to the reduce phase, I keep getting map
>> attempts like:
>>
>> INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
>> 2013-01-24 08:10:21.563 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
>> 2013-01-24 08:10:21.563 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
>> 2013-01-24 08:10:21.570 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
>> 2013-01-24 08:10:21.573 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
>> 2013-01-24 08:10:24.529 [Thread-51] INFO  org.apache.hadoop.mapred.LocalJobRunner -
>> 2013-01-24 08:10:24.529 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000003_0' done.
>> 2013-01-24 08:10:24.530 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@42e87d99
>> Any clues?
>> Thanks for the help.
>>
>> On Thu, Jan 24, 2013 at 5:12 PM, Harsh J <harsh@cloudera.com> wrote:
>>
>>> The Job class itself has blocking and non-blocking submitters, similar to
>>> the JobClient.runJob method you discovered. See
>>>
>>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#submit()
>>> and its following method waitForCompletion(). These seem to be what
>>> you're looking for.
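>>>
>>> A minimal sketch with the new API (driver/mapper/reducer names below are
>>> placeholders):
>>>
>>>     // org.apache.hadoop.mapreduce.Job
>>>     Job job = new Job(conf, "remote-submission-test");
>>>     job.setJarByClass(MyDriver.class);
>>>     job.setMapperClass(MyMapper.class);
>>>     job.setReducerClass(MyReducer.class);
>>>     job.submit();  // non-blocking submit
>>>     // or, to block until completion and print progress:
>>>     // boolean ok = job.waitForCompletion(true);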
>>>
>>> On Thu, Jan 24, 2013 at 5:43 PM, Amit Sela <amits@infolinks.com> wrote:
>>> > Hi all,
>>> >
>>> > I want to run a MapReduce job using the Hadoop Java API from my
>>> > analytics server. It is not the master or even a data node, but it has
>>> > the same Hadoop installation as all the nodes in the cluster.
>>> > I tried using JobClient.runJob(), but it accepts a JobConf as argument,
>>> > and with JobConf it is only possible to set the old-API (mapred) Mapper
>>> > classes, while I use the new (mapreduce) API...
>>> > I tried using JobControl and ControlledJob, but it seems to try to run
>>> > the job locally. The map phase just keeps attempting...
>>> > Has anyone tried this before?
>>> > I'm just looking for a way to submit MapReduce jobs from Java code and
>>> > be able to monitor them.
>>> >
>>> > Thanks,
>>> >
>>> > Amit.
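>>>
>>> On the monitoring part: after submit(), the same Job handle can also be
>>> polled, e.g. job.mapProgress(), job.reduceProgress(), job.isComplete()
>>> and job.isSuccessful().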
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>


-- 
Regards,
Ouch Whisper
010101010101
