hadoop-hdfs-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-6261) Document for enabling node group layer in HDFS
Date Thu, 28 May 2015 00:51:17 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562108#comment-14562108 ]

Allen Wittenauer edited comment on HDFS-6261 at 5/28/15 12:51 AM:
------------------------------------------------------------------

Awesome!  Doc patches are great!  

Now for the review. ;) This isn't comprehensive, but here's a first pass at least.

A common problem is a missing 'the' in front of 'following'.  I pointed it out in a few places,
but there are more. Articles in English are tricky. :(  'Following' is particularly tricky
because, without 'the' or 'a' in front of it, 'following' reads as a verb (e.g., "The dog was
following the boy" vs. "Hadoop has a following" or "See the following list of cool places
to eat"). But hey, at least we don't have grammatical genders like many European languages! :)


{code}
However, for
+other cases, like: Hadoop nodes running on virtualized platform, we have
+additional "hypervisor" layer, and its characteristics include:
{code}

I don't know how to parse this phrasing.  It feels awkward.  I'd probably rewrite as:

However, for some cases, this is insufficient.  Take, for example, Hadoop nodes running on a
virtualized platform where there is an additional hypervisor layer.  It has the following
characteristics:

{code}
+-   The communication price between VMs within the same hypervisor is lower
+than across hypervisor (physical host) which will have higher throughput,
+lower latency, and not generating physical network traffic.
{code}

Same sort of problem.  I'd probably rephrase a bit:

"The communication price between multiple VMs running on one physical host is lower than the
communication price between processes on multiple physical hosts.  In addition to the multiple
VMs having higher throughput and lower latency between themselves, they do not generate any
network traffic on the wire."

{code}
transparent for Hadoop, so
{code}

'for' should be 'to'.  Also, end the sentence after 'Hadoop' and start a new sentence with 'So'.

{code}
like following:
{code}

like the following:

{code}
layer, following polices
+in hdfs are refined:
{code}

'the following', and 'HDFS' should be capitalized.

{code}
+-   Replica placement policy
{code}

I have a feeling bullet points in front of all the items listed under this section may render
better.  I need to play with it though. 

{code}
of writer,
{code}

of the writer

{code}
on other
+    node
{code}

on another node

{code}
if node of writer
{code}

if the node of the writer

{code}
The remaining replicas are placed randomly across rack and node group to
+    meet minimum restriction.
{code}

I'm confused by this since there are missing articles and/or plurals here.  Does this mean
randomly across the remaining racks or randomly across all racks including the writer's rack?


{code}
At node level
{code}

At the node level

{code}
At block level
{code}

At the block level

{code}
Reliability: By never placing more than one replicas on the same node
+group(physical host), in case of node group failure, only one replica is
+lost at maximum.
{code}

Awkward phrasing.  I'd probably rewrite as:

"Reliability: By never placing more than one replica in the same node
group (aka physical host), at most one replica is lost in case of a node group failure."

{code}
rather than remote node
{code}

rather than a remote node

{code}
+3-layer topology tends to support different failure and locality topologies
+which is primarily driven from the perspective of virtualization, however,
+it is also possible to use the feature support other scenarios, such as
+those relating to failures of power supplies, arbitrary sets of physical
+servers, or collections of servers from same hardware purchase cycle.
{code}

This paragraph feels like it should be up closer to the top of these changes. 



> Document for enabling node group layer in HDFS
> ----------------------------------------------
>
>                 Key: HDFS-6261
>                 URL: https://issues.apache.org/jira/browse/HDFS-6261
>             Project: Hadoop HDFS
>          Issue Type: Task
>          Components: documentation
>            Reporter: Wenwu Peng
>            Assignee: Binglin Chang
>              Labels: documentation
>         Attachments: 2-layer-topology.png, 3-layer-topology.png, 3layer-topology.png,
4layer-topology.png, HDFS-6261.004.patch, HDFS-6261.005.patch, HDFS-6261.006.patch, HDFS-6261.007.patch,
HDFS-6261.v1.patch, HDFS-6261.v1.patch, HDFS-6261.v2.patch, HDFS-6261.v3.patch
>
>
> Most of patches from Umbrella JIRA HADOOP-8468  have committed, However there is no site
to introduce NodeGroup-aware (Hadoop Virtualization Extensions) features or explain how to configure them,
so we need to document them.
> 1.  Document NodeGroup-aware features in http://hadoop.apache.org/docs/current 
> 2.  Document NodeGroup-aware properties in core-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
