hadoop-hdfs-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9411) HDFS NodeLabel support
Date Thu, 23 Jun 2016 22:06:16 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347292#comment-15347292 ]

Kai Zheng commented on HDFS-9411:

Thanks [~vinayrpet] so much for addressing my questions! A few more after reading the new

1. >>Change in Label expression set on directories will be inherited for new files.
Does not reflect on already existing files.
This sounds like how storage policies behave. What happens on rename?

2. >>Current scope for label expression should be STRICT. i.e. If node doesn’t satisfy
the expression, will not be chosen. If no node satisfies write should fail
Is there any way to specify whether a label or label expression is STRICT or not (OPTIONAL)?

3. >>Labels can be removed only when there are no nodes associated with it. So to remove
a node, admin can reset/change labels on nodes first, then can remove the labels from NameNode.

A minor point: I think you meant, "So to remove a label, admin can ..."

4. >>Label for each node should start with an alpha-numeric character...
This sounds good. It would be good to define such a label spec on the common side so HDFS
and YARN can share it consistently.

5. >>NodeLabel->DataNode mapping will be done by DfsAdmin.
I'm not sure how it's done in YARN. Maybe a property file on the datanode could let the admin
list the labels there? Some labels, like arch and OS, could be detected or discovered
automatically while the datanode is starting. I'm thinking about how to make labels easy to
configure and use.
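
To make the STRICT versus OPTIONAL question in point 2 concrete, here is a minimal sketch of the two behaviors. It is not taken from the HDFS-9411 design doc or any Hadoop API: the class, the `Mode` enum, and the single-label match (instead of a full label expression) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names exist in Hadoop.
public class LabelPlacementSketch {
    enum Mode { STRICT, OPTIONAL }

    /** Returns the datanodes eligible for a write that asks for requiredLabel. */
    static List<String> chooseNodes(Map<String, Set<String>> labelsByNode,
                                    String requiredLabel, Mode mode) {
        List<String> matching = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : labelsByNode.entrySet()) {
            if (entry.getValue().contains(requiredLabel)) {
                matching.add(entry.getKey());
            }
        }
        if (!matching.isEmpty()) {
            return matching;
        }
        if (mode == Mode.STRICT) {
            // STRICT: no node satisfies the label, so the write fails.
            throw new IllegalStateException("no node satisfies label: " + requiredLabel);
        }
        // OPTIONAL (assumed semantics): fall back to every known node.
        return new ArrayList<>(labelsByNode.keySet());
    }
}
```

With STRICT, a request for a label that no datanode carries throws; with the assumed OPTIONAL mode, the same request falls back to all datanodes, so the label acts as a preference rather than a constraint.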

From the HDFS perspective this sounds pretty good. My overall suggestion would be to define
and build the basic node label support on the common side, because: 1) generic node labels
aren't essentially specific to HDFS, though some labels are; 2) the support could be shared by
both HDFS and YARN in future, which may save admins some work: for example, through some
common means an admin could specify all the labels for a node at one time, for both YARN and
HDFS; 3) logic and behavior would be consistent. Roughly, a job for a tenant should be
scheduled to the datanodes where the input data reside, for locality; 4) it would open a
broader discussion involving the YARN folks. I understand it's not easy to split, but it
would be good to think about it. Thanks.
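
As a rough illustration of the property-file idea in point 5, the sketch below merges admin-listed labels with labels discovered at startup. Everything here is an assumption: the `node.labels` key, the class name, and the `arch_`/`os_` naming are hypothetical and not part of Hadoop or YARN; only the standard `os.arch` and `os.name` JVM system properties are real.

```java
import java.util.Locale;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; not an existing Hadoop class.
public class NodeLabelDiscoverySketch {
    // Hypothetical property key, not an existing Hadoop setting.
    static final String LABELS_KEY = "node.labels";

    /** Combines comma-separated admin labels with auto-detected arch and OS labels. */
    static Set<String> collectLabels(Properties adminProps) {
        Set<String> labels = new TreeSet<>();
        String listed = adminProps.getProperty(LABELS_KEY, "");
        for (String raw : listed.split(",")) {
            String label = raw.trim();
            if (!label.isEmpty()) {
                labels.add(label);
            }
        }
        // Auto-detected labels from standard JVM system properties.
        labels.add("arch_" + normalize(System.getProperty("os.arch")));
        labels.add("os_" + normalize(System.getProperty("os.name")));
        return labels;
    }

    // Lower-cases the value and replaces whitespace runs with underscores.
    private static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT).replaceAll("\\s+", "_");
    }
}
```

If such a helper lived on the common side, both HDFS and YARN daemons could read the same file, so the admin would list a node's labels only once.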

> HDFS NodeLabel support
> ----------------------
>                 Key: HDFS-9411
>                 URL: https://issues.apache.org/jira/browse/HDFS-9411
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>         Attachments: HDFSNodeLabels-20-06-2016.pdf, HDFS_ZoneLabels-16112015.pdf
> HDFS currently stores data blocks on different datanodes chosen by BlockPlacement Policy.
These datanodes are random within the scope(local-rack/different-rack/nodegroup) of network
> In a multi-tenant scenario (a tenant can be a user or a service), blocks of any tenant can
be on any datanode.
>  Depending on the applications of different tenants, a datanode might sometimes get busy,
slowing down another tenant's application. It would be better if admins had a provision
to logically divide the cluster among multiple tenants.
> NodeLabels give users more options to specify constraints for selecting specific nodes
with specific requirements.
> High level design doc to follow soon.

