hadoop-common-user mailing list archives

From PeterAtReunion <pet...@mylife.com>
Subject Re: Tasktracker appearing from "nowhere" - [SOLVED]
Date Sat, 29 May 2010 02:21:25 GMT
The problem was some errant tasktrackers still running on hosts I thought were down.
When I stopped *all* tasktrackers, a fresh restart seemed to run cleanly.

Thanks again to Hemanth for giving me the clue that the slave file is advisory only.
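
For anyone hitting the same thing, here is a minimal sketch of how stray trackers could be spotted: take the tracker names that show up in the JobTracker log (the 'tracker_<host>:localhost/' form quoted below) and compare the hosts against the slaves file. The function names are hypothetical, the host m101.ra.wink.com is made up for illustration, and treating '#' as a comment marker in the slaves file is an assumption.

```python
def tracker_host(tracker_name):
    """Extract the host from a name like 'tracker_m351.ra.wink.com:localhost/'."""
    name = tracker_name.strip().strip("'")
    if name.startswith("tracker_"):
        name = name[len("tracker_"):]
    # Everything before the first ':' is the host part of the tracker name.
    return name.split(":", 1)[0]


def unexpected_trackers(tracker_names, slaves_lines):
    """Return hosts that registered as TaskTrackers but are not in the slaves file."""
    slaves = set()
    for line in slaves_lines:
        # Assumption: '#' starts a comment; blank lines are ignored.
        host = line.split("#", 1)[0].strip()
        if host:
            slaves.add(host)
    registered = {tracker_host(name) for name in tracker_names}
    return sorted(registered - slaves)


if __name__ == "__main__":
    # Tracker names as they appear in the JobTracker log (m101 is hypothetical).
    trackers = [
        "tracker_m351.ra.wink.com:localhost/",
        "tracker_m101.ra.wink.com:localhost/",
    ]
    slaves = ["m101.ra.wink.com", "# m351 was removed", ""]
    print(unexpected_trackers(trackers, slaves))
```

Any host this reports is running a tasktracker that start-all.sh and stop-all.sh never touch, so it has to be stopped by hand on that machine.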


On 05/28/10 10:54, PeterAtReunion wrote:
> Hemanth -
> Thanks for the insight on the use of slave file.
> In my case there is no Hadoop running on the machine m351. In fact, no Java-based programs are running on it at all.
> The machine was in the cluster (mistakenly in the slave file during a start-all.sh invocation) for a short time,
> but since then has been completely purged from everywhere except the racks.txt file.
> When it is *not* in the racks.txt file, no mapreduce jobs will start. Instead I get endless error loops of:
> java.io.IOException: java.lang.NullPointerException
> 2010-05-27 00:00:00,339 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker 'tracker_m351.ra.wink.com:localhost/'
> 2010-05-27 00:00:00,413 WARN org.apache.hadoop.net.ScriptBasedMapping: Script /usr/local/bin/wk_rack.sh returned 0 values when 1 were expected.
> 2010-05-27 00:00:00,413 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 54311, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@6991c610, true, true, true, -1) from error: java.io.IOEx
> This is our own ScriptBasedMapping for generating the hostDNS => rack NetworkTopology name mapping.
> This script is called with the m351 host name but I can't figure out why or where from.
> Any insights on who remembers topology between shutdown/restarts?
> (consisting of bin/stop-all.sh and a confirmation that all java programs are stopped
> on all hosts on our network, followed by bin/start-all.sh on the master NameNode that seems to just walk the slaves file.)
> ;;peter
> On 05/28/10 02:51, Hemanth Yamijala wrote:
>> Peter,
>>> I'm getting the following errors:
>>> WARN org.apache.hadoop.mapred.JobTracker: Serious problem, cannot find record of 'previous' heartbeat for 'tracker_m351.ra.wink.com:localhost/'; reinitializing the tasktracker
>>> INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201005271529_0004_r_000042_1' to tip task_201005271529_0004_r_000042, for tracker 'tracker_m351.ra.wink.com:localhost/'
>>> INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201005271529_0004_m_000112_0' from 'tracker_m351.ra.wink.com:localhost/'
>>> despite not having m351 in any of the config files except racks.txt.
>>> If I take it out of there I can't start any jobs at all.
>>> Question is - what would make a machine be contacted as a tasktracker when it is not in the slave or *.xml files?
>> If m351 has Hadoop and a mapred-site.xml or hadoop-site.xml pointing
>> to the right JobTracker, it would register itself as a TaskTracker
>> when Hadoop starts on it. The slave file is used primarily to start
>> the daemons from a central place and is not a way to specify which
>> nodes must join the Hadoop cluster.
>> Thanks
>> hemanth
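
The ScriptBasedMapping warning quoted earlier in the thread ("returned 0 values when 1 were expected") is what a topology script produces when it prints nothing for a host it does not know. Hadoop passes host names or IPs as arguments and expects one rack per argument on standard output. Below is a minimal sketch of a script that always answers, falling back to /default-rack for unknown hosts. The format of racks.txt is not shown in the thread, so the "host rack" one-pair-per-line layout, the '#' comments, the host m100.ra.wink.com and the rack name /rack1 are all assumptions, and nothing here is the actual wk_rack.sh.

```python
import sys

# Rack assigned to hosts missing from the mapping file.
DEFAULT_RACK = "/default-rack"


def load_racks(lines):
    """Parse assumed 'host rack' pairs, ignoring blank lines and # comments."""
    racks = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            racks[fields[0]] = fields[1]
    return racks


def resolve(hosts, racks, default=DEFAULT_RACK):
    """Return exactly one rack per requested host, in the same order."""
    return [racks.get(host, default) for host in hosts]


def main(argv, racks_path="racks.txt"):
    try:
        with open(racks_path) as handle:
            racks = load_racks(handle)
    except OSError:
        # No readable mapping file: every host gets the default rack.
        racks = {}
    # One whitespace-separated rack per argument.
    print(" ".join(resolve(argv, racks)))


if __name__ == "__main__":
    main(sys.argv[1:])
```

The trade-off is that a default answer keeps heartbeats from failing on an unlisted host, but it also hides the fact that an unexpected machine has joined, so logging unknown hosts somewhere is probably worth adding.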
