hadoop-common-user mailing list archives

From Vinod KV <vino...@yahoo-inc.com>
Subject Re: Problems with HOD and HDFS
Date Tue, 15 Jun 2010 08:10:02 GMT
On Tuesday 15 June 2010 04:19 AM, David Milne wrote:
> [2010-06-15 10:07:52,470] DEBUG/10 torque:147 - pbsdsh command:
> /opt/torque-2.4.5/bin/pbsdsh
> /home/dmilne/hadoop/hadoop-0.20.1/contrib/hod/bin/hodring
> --hodring.tarball-retry-initial-time 1.0
> --hodring.cmd-retry-initial-time 2.0 --hodring.cmd-retry-interval 2.0
> --hodring.service-id 34350.symphony.cs.waikato.ac.nz
> --hodring.temp-dir /scratch/local/dmilne/hod --hodring.http-port-range
> 8000-9000 --hodring.userid dmilne --hodring.java-home /opt/jdk1.6.0_20
> --hodring.svcrgy-addr symphony.cs.waikato.ac.nz:36372
> --hodring.download-addr h:t --hodring.tarball-retry-interval 3.0
> --hodring.log-dir /scratch/local/dmilne/hod/log
> --hodring.mapred-system-dir-root /mapredsystem
> --hodring.xrs-port-range 32768-65536 --hodring.debug 4
> --hodring.ringmaster-xrs-addr cn71:33771 --hodring.register
> [2010-06-15 10:07:52,475] DEBUG/10 ringMaster:929 - Returned from runWorkers.
>
> //chorus (many times)
>    

Did you mean that the pbsdsh command itself was printed many times above?
That should not happen.

I previously thought the hodrings could not start the namenode, but it looks
like the hodrings themselves failed to start up. You can do two things:
  - Check the qstat output, log into the slave nodes where your job was
supposed to start, and look at the hodring logs there.
  - Run the above hodring command yourself directly on those slave
nodes for your job and see if it fails with some error.
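
A rough sketch of the first step, in case it helps. It assumes Torque reports the job's nodes in the exec_host field of `qstat -f` as host/cpu pairs joined by '+' (for example cn71/0+cn72/0). The helper name exec_hosts is hypothetical. The job id and log directory in the commented usage are taken from the hodring command quoted above, so adjust them if yours differ.

```shell
# Hypothetical helper: print each distinct host name found in a Torque
# exec_host value, one per line.
# Assumption: the value has the form "host/cpu+host/cpu+...".
exec_hosts() {
    printf '%s\n' "$1" | tr '+' '\n' | sed 's|/.*$||' | sort -u
}

# Usage sketch (commented out; needs a live Torque job and ssh access).
# Long exec_host values may wrap in `qstat -f` output, in which case the
# sed line below would only capture the first part.
#
# JOB=34350.symphony.cs.waikato.ac.nz
# hosts=$(qstat -f "$JOB" | sed -n 's/^ *exec_host = //p')
# for h in $(exec_hosts "$hosts"); do
#     echo "== $h =="
#     ssh "$h" ls -lt /scratch/local/dmilne/hod/log
# done
```

If a node's log directory is empty, that would suggest hodring never got as far as writing a log on that node, and running the hodring command by hand there is the next thing to try.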

+Vinod
