Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hadoop-dev@lucene.apache.org
Message-ID: <5346049.1193767610968.JavaMail.jira@brutus>
Date: Tue, 30 Oct 2007 11:06:50 -0700 (PDT)
From: "Allen Wittenauer (JIRA)" <jira@apache.org>
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-1985) Abstract node to switch mapping
 into a topology service class used by namenode and jobtracker
In-Reply-To: <29588904.1191350390926.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HADOOP-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538872 ] 

Allen Wittenauer commented on HADOOP-1985:
------------------------------------------

Just a few notes.

From an ops perspective, it is important that this mapping be easily pluggable.  The ability to have Hadoop call some sort of executable (not necessarily a script) means we can do fancy things with /etc/netmasks, LDAP lookups, and so on.  Ideally, every sort of mapping would have its own callout rather than having one big one.  KISS is important here.  [Remember, most admins--myself included--are not hardcore Java people.]

FWIW, most implementations of autofs include similar functionality called executable maps where the key is passed to an exec and the exec returns the location of the mount.  So the practice has at least a little bit of traction.  [In fact, auto.net aka /net on Linux uses this method.]
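
To make the executable-callout idea concrete, here is a minimal sketch of what such an exec might look like, in the spirit of an autofs executable map: the key (here a host IP) comes in as an argument and the location goes out on stdout. Everything in it is an assumption for illustration only: the subnet table, the rack path strings, the "/default-rack" fallback, and the one-answer-per-line output format are hypothetical, not an interface defined by this issue.

```python
#!/usr/bin/env python3
"""Hypothetical topology callout: takes host IPs as arguments, prints one rack path per line."""
import ipaddress
import sys

# Assumed subnet-to-rack table. A real site might build this from
# /etc/netmasks or an LDAP lookup instead of hard-coding it.
SUBNET_TO_RACK = {
    "10.1.1.0/24": "/dc1/rack1",
    "10.1.2.0/24": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"  # assumed fallback for unknown hosts


def rack_for(address: str) -> str:
    """Return the rack path for an IP address, or DEFAULT_RACK if nothing matches."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Not a parseable IP address; fall back rather than fail.
        return DEFAULT_RACK
    for subnet, rack in SUBNET_TO_RACK.items():
        if ip in ipaddress.ip_network(subnet):
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # One answer per argument, in the same order as the arguments.
    for arg in sys.argv[1:]:
        print(rack_for(arg))
```

Since the contract is just "arguments in, lines out", an admin could swap this for a shell script or a compiled binary without touching any Java.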

Additionally, I think moving this functionality to the namenode makes it significantly easier to manage as a grid scales up.  There is also the question of whether the namenode should 'trust' the datanode to report the proper location.  I understand that the datanode and namenode have to trust each other at some point during node bringup, but I think it makes a lot of sense to let the namenode be in charge of data locality.

Hopefully this was helpful.

> Abstract node to switch mapping into a topology service class used by namenode and jobtracker
> ---------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1985
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1985
>             Project: Hadoop
>          Issue Type: New Feature
>            Reporter: eric baldeschwieler
>            Assignee: Devaraj Das
>
> In order to implement switch locality in MapReduce, we need to have switch location in both the namenode and job tracker.  Currently the namenode asks the data nodes for this info and they run a local script to answer this question.  In our environment and others that I know of there is no reason to push this to each node.  It is easier to maintain a centralized script that maps node DNS names to switch strings.
> I propose that we build a new class that caches known DNS name to switch mappings and invokes a loadable class or a configurable system call to resolve unknown DNS name to switch mappings.  We can then add this to the namenode to support the current block to switch mapping needs and simplify the data nodes.  We can also add this same callout to the job tracker and then implement rack locality logic there without needing to change the filesystem API or the split planning API.
> Not only is this the least intrusive path to building rack-local MR that I can identify, it is also compatible with future infrastructures that may derive topology on the fly, etc.
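
The caching class proposed in the issue description could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name `CachedSwitchMapping`, the function `run_callout`, and the callout path `/usr/local/bin/topology-callout` are all hypothetical, and the resolver is passed in as a plain callable to stand in for "a loadable class or a configurable system call".

```python
import subprocess
from typing import Callable, Dict


def run_callout(hostname: str, command: str = "/usr/local/bin/topology-callout") -> str:
    """Ask an external executable (hypothetical path) for the switch of one host."""
    result = subprocess.run(
        [command, hostname], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


class CachedSwitchMapping:
    """Caches hostname -> switch answers and calls the resolver only on a miss."""

    def __init__(self, resolver: Callable[[str], str] = run_callout) -> None:
        self._resolver = resolver
        self._cache: Dict[str, str] = {}

    def resolve(self, hostname: str) -> str:
        # Known hosts are answered from the cache; unknown hosts trigger
        # exactly one call to the resolver, whose answer is then remembered.
        if hostname not in self._cache:
            self._cache[hostname] = self._resolver(hostname)
        return self._cache[hostname]
```

Both the namenode and the job tracker could hold their own instance of such a class and share the same resolver, so the mapping logic is maintained in one central place rather than on every data node.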
