hadoop-common-dev mailing list archives

From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2185) Server ports: to roll or not to roll.
Date Sat, 10 Nov 2007 20:42:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541586 ]

Tsz Wo (Nicholas), SZE commented on HADOOP-2185:
------------------------------------------------

Not sure the following is related to this issue:

The static method DataNode.createSocketAddr(String target) is used everywhere.  It would be better
to move it to org.apache.hadoop.net.NetUtils.

Similarly, StatusHttpServer is in the org.apache.hadoop.mapred package, which is the wrong place.
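
For illustration, a host:port parsing helper of the kind discussed above could look like the sketch below. This is not the actual DataNode.createSocketAddr code; the class name SocketAddrSketch is hypothetical, and the sketch deliberately returns an unresolved address so that no DNS lookup happens while parsing.

```java
import java.net.InetSocketAddress;

// Hypothetical stand-in for a shared utility class such as NetUtils.
final class SocketAddrSketch {
    /** Parses a "host:port" string into an unresolved InetSocketAddress. */
    static InetSocketAddress createSocketAddr(String target) {
        int colon = target.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a host:port pair: " + target);
        }
        String host = target.substring(0, colon);
        // Integer.parseInt throws NumberFormatException on a malformed port.
        int port = Integer.parseInt(target.substring(colon + 1));
        return InetSocketAddress.createUnresolved(host, port);
    }
}
```

Keeping such a helper in a neutral package means that name-node, data-node and map-reduce code can all share it without depending on the DataNode class.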

> Server ports: to roll or not to roll.
> -------------------------------------
>
>                 Key: HADOOP-2185
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2185
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf, dfs, mapred
>    Affects Versions: 0.15.0
>            Reporter: Konstantin Shvachko
>             Fix For: 0.16.0
>
>
> I looked at the issues related to port rolling. My impression is that port rolling is required only for the unit tests to run.
> Even the name-node port should roll there, which it currently does not, so that 2 clusters can be started for testing, say, distcp.
> For real clusters, on the contrary, port rolling is not desired and sometimes even prohibited.
> So we should have a way to ban port rolling. My proposal is to
> # use ephemeral port 0 if port rolling is desired
> # if a specific port is specified then port rolling should not happen at all, meaning that a server is either able or not able to start on that particular port.
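
The two rules above can be sketched with plain java.net sockets. This is only an illustration of the proposed behavior, not Hadoop's server code; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper illustrating "start or fail" binding without port rolling.
final class PortBindingSketch {
    /**
     * Binds to the configured port exactly once. Port 0 lets the OS pick any
     * free port; any other value either binds or fails with an IOException.
     * No neighboring port is ever tried.
     */
    static ServerSocket bind(String host, int configuredPort) throws IOException {
        ServerSocket socket = new ServerSocket();
        try {
            socket.bind(new InetSocketAddress(host, configuredPort));
        } catch (IOException e) {
            socket.close(); // release the unbound socket before reporting failure
            throw e;
        }
        return socket;
    }
}
```

With port 0 the caller can read the port actually chosen from getLocalPort(), which is what a unit test running several clusters on one machine would need.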
> The desired port is specified via configuration parameters.
> - Name-node: fs.default.name = host:port
> - Data-node: dfs.datanode.port
> - Job-tracker: mapred.job.tracker = host:port
> - Task-tracker: mapred.task.tracker.report.bindAddress = host
>   The task-tracker currently has no option to specify a port; it always uses the ephemeral port 0. I therefore propose to add one.
> - Secondary node does not need a port to listen on.
> For info servers we have two sets of config variables, *.info.bindAddress and *.info.port,
> except for the task tracker, which calls them *.http.bindAddress and *.http.port instead of "info".
> With respect to the info servers I propose to completely eliminate the port parameters and use the form
> *.info.bindAddress = host:port
> Info servers should do the same thing, namely start or fail on the specified port if it is not 0,
> and start on any free port if it is ephemeral.
> For the task-tracker I would rename tasktracker.http.bindAddress to mapred.task.tracker.info.bindAddress
> For the data-node, the info address dfs.datanode.info.bindAddress should be included in the default config.
> Is there a reason why it is not there?
> This is the summary of proposed changes:
> || Server || current name = value || proposed name = value ||
> | NameNode | fs.default.name = host:port | same |
> | | dfs.info.bindAddress = host | dfs.info.bindAddress = host:port |
> | DataNode | dfs.datanode.port = port | same |
> | | dfs.datanode.info.bindAddress = host | dfs.datanode.info.bindAddress = host:port |
> | | dfs.datanode.info.port = port | eliminate |
> | JobTracker | mapred.job.tracker = host:port | same |
> | | mapred.task.tracker.info.bindAddress = host | mapred.task.tracker.info.bindAddress = host:port |
> | | mapred.task.tracker.info.port = port | eliminate |
> | TaskTracker | mapred.task.tracker.report.bindAddress = host | mapred.task.tracker.report.bindAddress = host:port |
> | | tasktracker.http.bindAddress = host | mapred.task.tracker.info.bindAddress = host:port |
> | | tasktracker.http.port = port | eliminate |
> | SecondaryNameNode | dfs.secondary.info.bindAddress = host | dfs.secondary.info.bindAddress = host:port |
> | | dfs.secondary.info.port = port | eliminate |
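
As a rough illustration of what the table implies for an existing configuration, the sketch below folds a separate port key into its bindAddress key. It uses java.util.Properties as a stand-in for Hadoop's configuration object, the class and method names are hypothetical, and treating a missing port as 0 (ephemeral) is an assumption made for this example only.

```java
import java.util.Properties;

// Hypothetical migration helper; Properties stands in for the real config class.
final class InfoAddressMigrationSketch {
    /** Rewrites addressKey from "host" to "host:port" and drops portKey. */
    static void merge(Properties conf, String addressKey, String portKey) {
        String host = conf.getProperty(addressKey);
        if (host == null || host.contains(":")) {
            return; // nothing to merge, or already in host:port form
        }
        // Assumption: an absent port key means "use an ephemeral port".
        String port = conf.getProperty(portKey, "0");
        conf.setProperty(addressKey, host + ":" + port);
        conf.remove(portKey);
    }
}
```

For example, calling merge(conf, "dfs.datanode.info.bindAddress", "dfs.datanode.info.port") on a configuration holding a host and a port would leave a single host:port value under the bindAddress key.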
> Do we also want to set some uniform naming convention for the configuration variables?
> Having hdfs instead of dfs, or info instead of http, or systematically using either datanode
> or data.node would make that look better in my opinion.
> So these are all +*api*+ changes. I would +*really*+ like some feedback on this, especially from
> people who deal with configuration issues in practice.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

