hadoop-hdfs-issues mailing list archives

From "Lei (Eddy) Xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7541) Upgrade Domains in HDFS
Date Sat, 18 Jul 2015 00:18:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632159#comment-14632159 ]

Lei (Eddy) Xu commented on HDFS-7541:
-------------------------------------

Hi, [~mingma]

Thanks a lot for working on this very useful feature. I have had similar thoughts for a while.


The design is very reasonable. It preserves the perf/availability characteristics of the current
block placement policy while accurately controlling the availability of replicas when multiple
DNs are shut down.

A few small questions:

* How about calling it {{Availability Domain}}, which is similar to AWS's availability zones?
I think that this concept can be used in a broader way.
* Is this {{upgrade domain}} on each DN a soft state or a hard state? Would choosing soft/hard
state have some implications for admins? For example, can admins re-purpose DN machines and
move them around?
* What do you anticipate as a good strategy for choosing upgrade domains ({{UDs}})? For instance,
suppose we have 40 machines per rack and 100 racks. Should we choose 40 different {{UDs}}, with
each rack having one node in each, or 10 {{UDs}}, with each rack having 4 nodes in each? Or
should 50 racks have 20 {{UDs}} and the remaining racks have the other {{UDs}}? What are the
pros/cons of having 3-5 UDs vs. 40 UDs?
* Regarding the performance impact, could you share the approximate scale of the # of
racks, # of different {{UDs}}, and # of concurrent writes?
* In {{design v2.pdf}}, would you mind rephrasing the process of "Replica delete operation"?
It is a little bit difficult to understand.
* The last one may not be relevant: would this design work well with erasure coding (HDFS-7285)?
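
To make the arithmetic behind the UD-count question concrete, here is a rough sketch. It is not taken from the design doc or the patch: the round-robin assignment rule and the function names {{assign_upgrade_domains}} and {{summarize}} are hypothetical, and it assumes that a rolling upgrade restarts one whole UD at a time.

```python
from collections import Counter


def assign_upgrade_domains(num_racks, nodes_per_rack, num_uds):
    """Hypothetical round-robin rule: slot i in every rack gets UD (i mod num_uds)."""
    return {
        (rack, slot): slot % num_uds
        for rack in range(num_racks)
        for slot in range(nodes_per_rack)
    }


def summarize(assignment):
    """Count how many nodes share a UD, cluster-wide and within a single rack."""
    per_ud = Counter(assignment.values())
    per_rack_ud = Counter((rack, ud) for (rack, _slot), ud in assignment.items())
    return {
        "num_uds": len(per_ud),
        # Nodes taken down together if one UD is restarted at a time (assumption).
        "nodes_per_ud": max(per_ud.values()),
        # Nodes of a single rack that go down together in that case.
        "nodes_per_ud_per_rack": max(per_rack_ud.values()),
    }


# The cluster shape from the question: 100 racks with 40 machines each.
for num_uds in (4, 10, 40):
    print(summarize(assign_upgrade_domains(100, 40, num_uds)))
```

Under these assumptions, 40 UDs means 40 restart rounds of 100 nodes each (one per rack), while 10 UDs means 10 rounds of 400 nodes each (four per rack). Fewer UDs shorten the upgrade but take more nodes offline per round.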

Looking forward to hearing more.

> Upgrade Domains in HDFS
> -----------------------
>
>                 Key: HDFS-7541
>                 URL: https://issues.apache.org/jira/browse/HDFS-7541
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>         Attachments: HDFS-7541-2.patch, HDFS-7541.patch, SupportforfastHDFSdatanoderollingupgrade.pdf,
>                      UpgradeDomains_design_v2.pdf
>
>
> Current HDFS DN rolling upgrade step requires sequential DN restart to minimize the impact
> on data availability and read/write operations. The side effect is longer upgrade duration
> for large clusters. This might be acceptable for DN JVM quick restart to update hadoop code/configuration.
> However, for OS upgrade that requires machine reboot, the overall upgrade duration will be
> too long if we continue to do sequential DN rolling restart.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
