hadoop-common-dev mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4016) TaskTrackers never (re)connect back to the JobTracker if the JobTracker node/machine is changed
Date Tue, 11 Nov 2008 11:36:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646529#action_12646529 ]

Steve Loughran commented on HADOOP-4016:
----------------------------------------

Dhruba, I think you've got same-IP and same-hostname mixed up. Same IP is easy; only the
routers care.

Same hostname, different IP means you have to wait for DNS entries to propagate. In some
places it works, in some places it doesn't.

Changing the configuration works on top of changing the hostname: it doesn't prevent you
from doing that, and it also lets you try other things, such as bringing up a new JT on a
different port on the same box, or on a new machine. But you do need a way to push
configuration changes out to all the task trackers, which classic XML-driven configurations
don't have. I can do it, but my code for that relies on subclassing JobConf and using
SmartFrog. HADOOP-3582 is a to-do list of better configuration options; you'd need a way to
push out the change.

1. The extra cost of another lookup of the configuration file is minimal if you are doing
retries anyway.
2. We should set the DNS TTL anyway too, but check with Allen what settings he likes before
bringing the network down.
3. Configuration file changes are easier to test, so we can see that the file gets checked.
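
To make the three points concrete, here is a minimal sketch of what re-reading the address
on each retry could look like. It is not the TaskTracker code: the class JobTrackerLocator
and its methods are hypothetical, a plain properties file stands in for the real XML
configuration, and the mapred.job.tracker key is assumed to hold a host:port pair. The only
JVM-specific part is the networkaddress.cache.ttl security property, which caps how long
successful DNS lookups are cached.

```java
import java.io.IOException;
import java.io.Reader;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.Security;
import java.util.Properties;

/** Hypothetical helper; a sketch, not part of Hadoop. */
public class JobTrackerLocator {
    // Assumption: the address is stored as host:port under this key.
    static final String ADDRESS_KEY = "mapred.job.tracker";

    /**
     * Cap the JVM's cache of successful DNS lookups (point 2).
     * Best called at startup, before the first lookup is made.
     */
    static void limitDnsCache(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    /** Re-read the file on every call so an edited host:port is picked up (points 1 and 3). */
    static InetSocketAddress readAddress(Path configFile) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(configFile, StandardCharsets.UTF_8)) {
            props.load(in);
        }
        String value = props.getProperty(ADDRESS_KEY);
        if (value == null) {
            throw new IOException("missing " + ADDRESS_KEY + " in " + configFile);
        }
        value = value.trim();
        int colon = value.lastIndexOf(':');
        if (colon <= 0 || colon == value.length() - 1) {
            throw new IOException("expected host:port but got: " + value);
        }
        String host = value.substring(0, colon);
        int port = Integer.parseInt(value.substring(colon + 1));
        // Left unresolved on purpose: the DNS lookup happens per connection attempt.
        return InetSocketAddress.createUnresolved(host, port);
    }

    /**
     * Build a new address object for one connection attempt. The constructor
     * performs the hostname lookup, so a changed DNS record can be seen once
     * the JVM's cached entry has expired.
     */
    static InetSocketAddress resolve(InetSocketAddress unresolved) {
        return new InetSocketAddress(unresolved.getHostString(), unresolved.getPort());
    }
}
```

A retry loop would call readAddress() and then resolve() before each attempt instead of
holding on to one resolved address for the life of the process. Testing point 3 then amounts
to editing the file and checking that the next readAddress() call returns the new value.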

> TaskTrackers never (re)connect back to the JobTracker if the JobTracker node/machine is changed
> -----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4016
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4016
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>
> I tried the following 
> 1) Started a hadoop cluster.
> 2) Killed the JT
> 3) Selected a new node for starting JT. 
> 4) Changed the entry on the tasktracker to reflect the new (old) hostname to (new) IP mapping. Checked that the tracker node correctly resolves the hostname to the new IP.
> 5) Start the JT on the new node
> The tasktracker fails to connect to the new jobtracker. It seems that the hostname resolution remains stale and is never updated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

