hadoop-common-user mailing list archives

From PeterAtReunion <pet...@mylife.com>
Subject Re: Tasktracker appearing from "nowhere"
Date Fri, 28 May 2010 17:54:00 GMT
Hemanth -

Thanks for the insight on the use of the slaves file.

In my case there is no Hadoop running on machine m351. In fact, no Java-based programs
are running on it at all.
The machine was in the cluster for a short time (it was mistakenly in the slaves file
during a start-all.sh invocation),
but since then it has been completely purged from everywhere except the racks.txt file.

When it was *not* in the racks.txt file, no MapReduce jobs would start. Instead I got endless
error loops of:

java.io.IOException: java.lang.NullPointerException
2010-05-27 00:00:00,339 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker 'tracker_m351.ra.wink.com:localhost/'
2010-05-27 00:00:00,413 WARN org.apache.hadoop.net.ScriptBasedMapping: Script /usr/local/bin/wk_rack.sh returned 0 values when 1 were expected.
2010-05-27 00:00:00,413 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 54311, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@6991c610, true, true, true, -1) from error: java.io.IOEx

The script is our own, used via ScriptBasedMapping to map a host DNS name => rack NetworkTopology name.
It is being called with the m351 host name, but I can't figure out why or from where.
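
For context on the "returned 0 values when 1 were expected" warning: a topology script is passed host names or IPs as arguments and is expected to print one rack path per argument. Below is a minimal sketch of such a script, not the actual wk_rack.sh (which is not shown in this thread). It assumes a hypothetical racks.txt format of one "host rack" pair per line, and it falls back to /default-rack for unknown hosts so the output count always matches the input count. Whether that fallback would clear the NullPointerException above is not confirmed here.

```python
#!/usr/bin/env python
"""Sketch of a rack-mapping script: prints one rack per host argument."""
import sys

# Conventional Hadoop fallback rack name for hosts with no known rack.
DEFAULT_RACK = "/default-rack"


def load_racks(lines):
    """Parse 'host rack' lines (a hypothetical racks.txt format) into a dict."""
    racks = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines, malformed lines, and '#' comments.
        if len(fields) >= 2 and not fields[0].startswith("#"):
            racks[fields[0]] = fields[1]
    return racks


def resolve(hosts, racks):
    """Return exactly one rack per host, using DEFAULT_RACK for unknown hosts."""
    return [racks.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # The file name and location are assumptions for illustration.
    try:
        with open("racks.txt") as f:
            table = load_racks(f)
    except IOError:
        table = {}
    print(" ".join(resolve(sys.argv[1:], table)))
```

With a fallback like this, a host that is missing from racks.txt (such as m351) still gets a rack path back instead of an empty result.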

Any insights on what remembers topology between shutdowns and restarts?
(A restart here consists of bin/stop-all.sh, a confirmation that all Java programs are stopped
on all hosts on our network, and then bin/start-all.sh on the master NameNode, which seems
to just walk the slaves file.)


On 05/28/10 02:51, Hemanth Yamijala wrote:
> Peter,
>> I'm getting the following errors:
>> WARN org.apache.hadoop.mapred.JobTracker: Serious problem, cannot find record of 'previous' heartbeat for 'tracker_m351.ra.wink.com:localhost/'; reinitializing the tasktracker
>> INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201005271529_0004_r_000042_1' to tip task_201005271529_0004_r_000042, for tracker 'tracker_m351.ra.wink.com:localhost/'
>> INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201005271529_0004_m_000112_0' from 'tracker_m351.ra.wink.com:localhost/'
>> despite not having m351 in any of the config files except racks.txt.
>> If I take it out of there I can't start any jobs at all.
>> Question is - what would make a machine be contacted as a tasktracker when it is
>> not in the slave or *.xml files?
> If m351 has Hadoop and a mapred-site.xml or hadoop-site.xml pointing
> to the right JobTracker, it would register itself as a TaskTracker
> when Hadoop starts on it. The slave file is used primarily to start
> the daemons from a central place and is not a way to specify which
> nodes must join the Hadoop cluster.
> Thanks
> hemanth
