hadoop-hdfs-issues mailing list archives

From "Lei (Eddy) Xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7541) Upgrade Domains in HDFS
Date Sat, 18 Jul 2015 00:18:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632159#comment-14632159 ]

Lei (Eddy) Xu commented on HDFS-7541:

Hi, [~mingma]

Thanks a lot for working on this very useful feature. I have had similar thoughts for a while.

The design is very reasonable. It preserves the perf/availability characteristics of the current
block placement policy while accurately controlling the availability of replicas in the event
of shutting down multiple DNs.

A few small questions:

* How about calling it {{Availability Domain}}, which is similar to AWS's availability zones?
I think that this concept can be used in a broader way.
* Is this {{upgrade domain}} on each DN a soft state or a hard state? Would choosing soft/hard
state have some implications for admins? For example, can admins re-purpose DN machines and
move them around?
* What do you anticipate as a good strategy for choosing upgrade domains ({{UDs}})? For instance,
suppose we have 40 machines / rack and 100 racks. Should we choose 40 different {{UDs}},
with each rack having one of each, or 10 {{UDs}}, with each rack having 4 of each? Or should
50 racks have 20 {{UDs}} and the remaining racks have the other {{UDs}}? What are the pros/cons
of having 3-5 UDs vs 40 UDs?
* Regarding the performance impact, could you share the approximate scale: the number of
racks, the number of different {{UDs}}, and the number of concurrent writes?
* In {{design v2.pdf}}, would you mind rephrasing the description of the "Replica delete operation"?
It is a little bit difficult to understand.
* The last one may not be relevant: would this design work well with erasure coding (HDFS-7285)?
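
To make the {{UD}} sizing question above concrete, here is a minimal sketch of the arithmetic for the 40 machines / rack, 100 racks example. It is only an illustration under stated assumptions, not part of the proposed design: it assumes DNs are spread evenly across upgrade domains, that every upgrade domain spans every rack, and that one upgrade domain is restarted at a time. The names {{UpgradeDomainLayout}} and {{describe_layout}} are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpgradeDomainLayout:
    """Summary of one hypothetical upgrade domain (UD) layout."""
    num_domains: int                  # also the number of restart steps
    nodes_per_domain: int             # DNs taken offline in one step
    nodes_per_domain_per_rack: float  # DNs of one UD inside each rack
    fraction_offline: float           # share of the cluster down per step


def describe_layout(racks: int, machines_per_rack: int,
                    num_domains: int) -> UpgradeDomainLayout:
    # Assumption: DNs are split evenly across UDs and each UD spans all racks.
    total = racks * machines_per_rack
    if num_domains <= 0 or total % num_domains != 0:
        raise ValueError("num_domains must evenly divide the number of DNs")
    nodes_per_domain = total // num_domains
    return UpgradeDomainLayout(
        num_domains=num_domains,
        nodes_per_domain=nodes_per_domain,
        nodes_per_domain_per_rack=machines_per_rack / num_domains,
        fraction_offline=nodes_per_domain / total,
    )


if __name__ == "__main__":
    # Compare a few UD counts for 100 racks of 40 machines each.
    for uds in (5, 10, 40):
        print(describe_layout(racks=100, machines_per_rack=40, num_domains=uds))
```

Under these assumptions, 40 {{UDs}} take 100 DNs (2.5% of the cluster) offline per step over 40 steps, while 10 {{UDs}} take 400 DNs (10%) offline per step over 10 steps.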

Looking forward to hearing more.

> Upgrade Domains in HDFS
> -----------------------
>                 Key: HDFS-7541
>                 URL: https://issues.apache.org/jira/browse/HDFS-7541
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>         Attachments: HDFS-7541-2.patch, HDFS-7541.patch, SupportforfastHDFSdatanoderollingupgrade.pdf,
> Current HDFS DN rolling upgrade step requires sequential DN restart to minimize the impact
on data availability and read/write operations. The side effect is longer upgrade duration
for large clusters. This might be acceptable for DN JVM quick restart to update hadoop code/configuration.
However, for OS upgrade that requires machine reboot, the overall upgrade duration will be
too long if we continue to do sequential DN rolling restart.

This message was sent by Atlassian JIRA