hadoop-common-dev mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4016) TaskTrackers never (re)connect back to the JobTracker if the JobTracker node/machine is changed
Date Mon, 10 Nov 2008 12:12:46 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646237#action_12646237 ]

Steve Loughran commented on HADOOP-4016:
----------------------------------------

Assuming Amar changed the DNS entry for the job tracker, that alone won't be enough:

- The JVM caches hostnames forever unless you tell it otherwise.

"Otherwise" means setting the JVM security properties
networkaddress.cache.ttl and networkaddress.cache.negative.ttl.

http://java.sun.com/javase/6/docs/api/java/net/InetAddress.html

- That's regardless of any caching done in process. The task tracker reads in "mapred.job.tracker"
from the configuration on startup only.
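
The JVM-side setting mentioned above could look roughly like this. It is only a sketch: DnsCachePolicy is a made-up helper, not a Hadoop class, and the TTL values are arbitrary. The two property names are the ones documented in the InetAddress Javadoc; they are security properties, so they go through java.security.Security rather than -D system properties, and they generally need to be set before the JVM does its first lookup.

```java
import java.security.Security;

// Hypothetical helper for illustration only; not part of Hadoop.
final class DnsCachePolicy {
    private DnsCachePolicy() {}

    /**
     * Sets the JVM-wide DNS cache lifetimes, in seconds.
     * A positive TTL of 0 means "never cache"; -1 means "cache forever".
     * Assumption: this is called early, before any hostname is resolved.
     */
    static void apply(int positiveTtlSeconds, int negativeTtlSeconds) {
        // How long successful lookups are cached.
        Security.setProperty("networkaddress.cache.ttl",
                Integer.toString(positiveTtlSeconds));
        // How long failed lookups are cached.
        Security.setProperty("networkaddress.cache.negative.ttl",
                Integer.toString(negativeTtlSeconds));
    }
}
```

Calling DnsCachePolicy.apply(30, 10) early in startup would ask the JVM to re-query DNS after at most 30 seconds. The same properties can also be set in the JRE's java.security file.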

To do failover of the job tracker you'd need to change the JVM to not cache addresses forever,
which will have other consequences, good and bad, and then change TaskTracker to
redo the hostname lookup when the job tracker heartbeat has failed.
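
As a rough illustration of the "redo the lookup" half, here is a sketch of an address holder that keeps the hostname instead of a resolved address and resolves it again on demand. TrackerAddress is a hypothetical class, not TaskTracker code, and I'm assuming the failed-heartbeat path would call refresh(). Note that the fresh lookup is still answered from the JVM cache until the TTL expires, so it only helps together with the cache settings.

```java
import java.net.InetSocketAddress;

// Hypothetical holder for illustration; not the actual TaskTracker code.
final class TrackerAddress {
    private final String host;
    private final int port;
    private InetSocketAddress current;

    TrackerAddress(String host, int port) {
        this.host = host;
        this.port = port;
        // This constructor attempts to resolve the hostname immediately.
        this.current = new InetSocketAddress(host, port);
    }

    InetSocketAddress current() {
        return current;
    }

    /**
     * Intended to be called after a failed heartbeat. Building a new
     * InetSocketAddress triggers a new lookup; the result replaces the
     * old address only if the name actually resolved.
     */
    InetSocketAddress refresh() {
        InetSocketAddress fresh = new InetSocketAddress(host, port);
        if (!fresh.isUnresolved()) {
            current = fresh;
        }
        return current;
    }
}
```

Keeping the old address when resolution fails is a design choice made here for simplicity; a real change might prefer to surface the failure.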

This will be a fun test to automate. You could do it in-VM by starting a second job tracker
on a different port of localhost, stopping the original tracker, and then checking that the tasktracker
failed its heartbeat, reread the config and picked up the new (host,port) setting. This would
not test DNS caching, but would show that the tasktracker was rereading its configuration. DNS
caching tests are hard outside of a VMware/Xen cluster.
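
The shape of that in-VM test can be sketched with plain sockets standing in for the two job trackers. Everything here is an assumption for illustration: FailoverSketch, its methods, and the use of a Map as the "configuration" are made up, and a successful TCP connect stands in for a heartbeat. No Hadoop classes are involved.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the test; plain sockets play the job trackers.
final class FailoverSketch {
    static final String KEY = "mapred.job.tracker";

    private FailoverSketch() {}

    /** "Heartbeat" = can we open a TCP connection to host:port? */
    static boolean heartbeat(String hostPort) {
        int sep = hostPort.lastIndexOf(':');
        String host = hostPort.substring(0, sep);
        int port = Integer.parseInt(hostPort.substring(sep + 1));
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 1000);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** On failure, reread the setting and retry; null if both attempts fail. */
    static String heartbeatWithReread(String cached, Map<String, String> conf) {
        if (heartbeat(cached)) {
            return cached;
        }
        String reread = conf.get(KEY);
        return (reread != null && heartbeat(reread)) ? reread : null;
    }

    /** Runs the scenario; true if the client ends up on the new tracker. */
    static boolean run() {
        Map<String, String> conf = new HashMap<>();
        try (ServerSocket first = new ServerSocket(0);
             ServerSocket second = new ServerSocket(0)) {
            String oldAddr = "127.0.0.1:" + first.getLocalPort();
            String newAddr = "127.0.0.1:" + second.getLocalPort();
            conf.put(KEY, oldAddr);
            String cached = conf.get(KEY);   // value read "on startup only"
            if (!heartbeat(cached)) {
                return false;                // original tracker should answer
            }
            first.close();                   // stop the original tracker
            conf.put(KEY, newAddr);          // point the config at the new one
            return !heartbeat(cached)
                    && newAddr.equals(heartbeatWithReread(cached, conf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Like the test described above, this exercises only the reread-on-failure path; both "trackers" live on 127.0.0.1, so DNS caching never comes into play.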



> TaskTrackers never (re)connect back to the JobTracker if the JobTracker node/machine is changed
> -----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4016
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4016
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>
> I tried the following 
> 1) Started a hadoop cluster.
> 2) Killed the JT
> 3) Selected a new node for starting JT. 
> 4) Changed the entry on the tasktracker to reflect the new (old) hostname to (new) ip mapping. Checked if the tracker node correctly resolves the hostname to the new ip.
> 5) Start the JT on the new node
> The tasktracker fails to connect to the new jobtracker. It seems that the hostname resolution remains stale and is never updated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

