hadoop-hdfs-issues mailing list archives

From "Anu Engineer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode
Date Fri, 08 Jan 2016 22:34:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090085#comment-15090085

Anu Engineer commented on HDFS-1312:

Hi [~andrew.wang],

Thanks for your comments. Here are my thoughts on these issues.

bq. I don't follow this line of reasoning; don't concerns about using a new feature apply
to a hypothetical HDFS-1312 implementation too?

I think it comes down to risk. Let us look at the worst-case scenarios possible with HDFS-1804
and HDFS-1312. HDFS-1804 is a cluster-wide change: it is always on, and every write goes
through it. HDFS-1804 can therefore have a cluster-wide impact, including an impact on the
various workloads running on the cluster.

With HDFS-1312, however, the worst case is that we take a single node off-line, since it is
an external tool that operates off-line on one node at a time. Another important difference is
that it is not always on; it runs and then goes away. So the amount of risk to the cluster,
especially from an administrator's point of view, is quite different between the two approaches.

bq. Why do we lose this? Can't the DN dump this somewhere?

We can, but then we would need to add RPCs to the datanode to pull out that data and display
the change on the node. In the current approach it is something we write to the local disk;
we then compute the diff later against the sources, so no datanode operation is needed.
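To make the local-disk approach concrete, here is a minimal sketch of how such an offline diff could work: the tool saves a per-volume usage snapshot to local disk before it starts, and a later report step diffs a fresh snapshot against that saved source, with no datanode RPC involved. The JSON layout and field names here are illustrative assumptions, not the actual HDFS-1312 snapshot format.

```python
# Hypothetical sketch of the offline snapshot diff: compare two per-volume
# usage snapshots written to local disk. Field names ("volumes", "path",
# "used") are assumptions for illustration only.
import json

def volume_snapshot_diff(before_json: str, after_json: str) -> dict:
    """Return the per-volume change in used bytes between two snapshots."""
    before = {v["path"]: v["used"] for v in json.loads(before_json)["volumes"]}
    after = {v["path"]: v["used"] for v in json.loads(after_json)["volumes"]}
    return {path: after[path] - before[path] for path in before}

before = json.dumps({"volumes": [
    {"path": "/data/disk1", "used": 900},
    {"path": "/data/disk2", "used": 100},
]})
after = json.dumps({"volumes": [
    {"path": "/data/disk1", "used": 500},
    {"path": "/data/disk2", "used": 500},
]})

print(volume_snapshot_diff(before, after))
# {'/data/disk1': -400, '/data/disk2': 400}
```

Because both snapshots live on the node's local disk, the diff can be computed entirely offline, which is the property being argued for above.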

bq. This is an interesting point I was not aware of. Is the goal here to do inter-DN moving?

No, the goal is *intra-DN*. I was referring to {noformat}hdfs mover{noformat}, not to
{noformat}hdfs balancer{noformat}.

bq. If it's only for intra-DN moving, then it could still live in the DN.

Completely agree; all block-moving code will live in the DN.

bq. This is also why I brought up HDFS-8538. If HDFS-1804 is the default volume choosing policy,
we won't see imbalance outside of hotswap.

Agreed, and it is a goal we should work towards. From the comments in HDFS-8538, it looks
like we might have to make some minor tweaks before we can commit it. I can look at it
after HDFS-1312.
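For readers following the HDFS-1804 thread of this discussion: the idea behind an available-space volume-choosing policy is to bias new block placement toward volumes with more free space, so imbalance never builds up in the first place. The sketch below illustrates that idea only; it is not the actual `AvailableSpaceVolumeChoosingPolicy` logic from the Hadoop source, and the volume paths are made up.

```python
# Illustrative sketch (not Hadoop's implementation): choose a volume for a
# new block with probability proportional to its free space, so emptier
# disks fill faster and the node converges toward balance.
import random

def choose_volume(free_bytes: dict, rng=random) -> str:
    """Pick a volume, weighted by its free bytes."""
    total = sum(free_bytes.values())
    pick = rng.uniform(0, total)
    for vol, free in free_bytes.items():
        pick -= free
        if pick <= 0:
            return vol
    return vol  # floating-point edge case: fall back to the last volume

# disk2 has 9x the free space of disk1, so it should get roughly 9x the blocks
volumes = {"/data/disk1": 100 * 2**30, "/data/disk2": 900 * 2**30}
counts = {v: 0 for v in volumes}
rng = random.Random(42)
for _ in range(10_000):
    counts[choose_volume(volumes, rng)] += 1
print(counts)
```

This is why, as noted above, with such a policy as the default, imbalance would mostly appear only after events like hotswap, where an empty disk arrives all at once.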

bq. The point I was trying to make is that HDFS-1804 addresses the imbalance issues besides
hotswap, so we eliminate the alerts in the first place. Hotswap is an operation explicitly
undertaken by the admin, so the admin will know to also run the intra-DN balancer.

Since we have both made this point several times, I am going to agree with what you are saying.
Even if we assume that hotswap (or a normal swap) is the only use case for disk balancing, in
a large cluster many disks will have failed. So when a cluster gets a number of disks replaced,
the current interface makes life easier for admins: they can replace a bunch of disks on
various machines and then ask the system to find and fix those nodes. I just think the interface
we are building makes admins' lives easier while taking nothing away from the use cases you
describe.
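As a sketch of how "find those nodes" could work: score each datanode by how far its individual volumes deviate from the node's overall used ratio, and flag nodes whose score exceeds a threshold. The metric and threshold below are illustrative assumptions, not the exact formulas from the attached design documents.

```python
# Hypothetical node-imbalance score: sum of each volume's deviation from the
# node-wide used ratio. A freshly replaced (empty) disk pushes the score up.
# The 0.10 threshold is an arbitrary illustrative choice.

def node_density(volumes: list) -> float:
    """volumes: list of (used, capacity) pairs for one datanode."""
    node_ratio = sum(u for u, _ in volumes) / sum(c for _, c in volumes)
    return sum(abs(u / c - node_ratio) for u, c in volumes)

def needs_balancing(volumes, threshold=0.10):
    return node_density(volumes) > threshold

balanced = [(80, 100), (78, 100), (82, 100)]   # all disks near the node ratio
skewed = [(90, 100), (90, 100), (0, 100)]      # new disk just swapped in

print(needs_balancing(balanced), needs_balancing(skewed))
# False True
```

With a score like this, an admin who has swapped disks across many machines can ask the cluster for the worst offenders instead of tracking each replacement by hand, which is the workflow argued for above.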

bq. This is an aspirational goal, but when debugging a prod cluster we almost certainly also
want to see the DN log too

Right now we have actually met that aspirational goal: we capture a snapshot of the node,
which allows us both to debug and to simulate off-line what the disk balancer is doing.

bq. Would it help to have a phone call about this? We have a lot of points flying around,
might be easier to settle this via a higher-bandwidth medium.

I think that is an excellent idea; I would love to chat with you in person. I will set up a
meeting and post the meeting info in this JIRA.

I really appreciate your input and the thoughtful discussion we are having; I hope to speak
with you in person soon.

> Re-balance disks within a Datanode
> ----------------------------------
>                 Key: HDFS-1312
>                 URL: https://issues.apache.org/jira/browse/HDFS-1312
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode
>            Reporter: Travis Crawford
>            Assignee: Anu Engineer
>         Attachments: Architecture_and_testplan.pdf, disk-balancer-proposal.pdf
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations where certain
disks are full while others are significantly less used. Users at many different sites have
experienced this issue, and HDFS administrators are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes & later re-adding them
> There's a tradeoff between making use of all available spindles, and filling disks at
the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. In write-heavy
environments this will still make use of all spindles, equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are added/replaced
in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is not needed.

This message was sent by Atlassian JIRA
