hadoop-hdfs-issues mailing list archives

From "Daniel Templeton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica
Date Wed, 11 Jul 2018 04:49:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539561#comment-16539561 ]

Daniel Templeton commented on HDFS-13448:

I did an in-depth review of the latest patch.  My comments:

* In {{FSDirWriteFileOp}}, you did some cleanup: {code}    Set<Node> excludedNodesSet =
        (excludedNodes == null) ? new HashSet<>()
            : new HashSet<>(Arrays.asList(excludedNodes));{code} I like that you made
the code consistent, but I have a personal vendetta against ternary operators.  I find they
hurt readability in most cases.  I don't expect you to do anything about my hang-ups, but
I can't not mention it.
* Your assert message ends up being a bit run-on when there's an error: {quote}Source node
was assigned a value though null was expected becuase it was flagged to ignore source node
localitly expected null, but was:<org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp$1@12028586>{quote}
 Also, there are a couple spelling errors.
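
For illustration only, and not part of the patch: a minimal sketch of how the same initialization could read without the ternary. The class and method names ({{ExcludedNodesExample}}, {{toExcludedSet}}) are made up for this example, and a generic type parameter stands in for {{Node}} so that the sketch compiles without Hadoop on the classpath.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ExcludedNodesExample {

  // Hypothetical helper: builds the excluded-node set with an if statement
  // instead of a ternary. T stands in for the Node type used in the patch.
  static <T> Set<T> toExcludedSet(T[] excludedNodes) {
    Set<T> excludedNodesSet = new HashSet<>();
    if (excludedNodes != null) {
      // Duplicates in the array collapse, exactly as with new HashSet<>(Arrays.asList(...)).
      excludedNodesSet.addAll(Arrays.asList(excludedNodes));
    }
    return excludedNodesSet;
  }

  public static void main(String[] args) {
    Set<String> none = toExcludedSet(null);
    Set<String> some = toExcludedSet(new String[] {"dn1", "dn2", "dn1"});
    System.out.println("null input -> " + none.size() + " entries");
    System.out.println("three names, one repeated -> " + some.size() + " entries");
  }
}
```

Both forms produce an empty, mutable set for a null array, so which one to use is purely a question of readability.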

Otherwise, looks good.

> HDFS Block Placement - Ignore Locality for First Block Replica
> --------------------------------------------------------------
>                 Key: HDFS-13448
>                 URL: https://issues.apache.org/jira/browse/HDFS-13448
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: block placement, hdfs-client
>    Affects Versions: 2.9.0, 3.0.1
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Minor
>         Attachments: HDFS-13448.10.patch, HDFS-13448.11.patch, HDFS-13448.12.patch, HDFS-13448.13.patch,
>                      HDFS-13448.6.patch, HDFS-13448.7.patch, HDFS-13448.8.patch
> According to the HDFS block placement rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement request
> to not put a block replica on the local datanode _where 'local' means the same host as the
> client is being run on._
> {quote}
>   /**
>    * Advise that a block replica NOT be written to the local DataNode where
>    * 'local' means the same host as the client is being run on.
>    *
>    * @see CreateFlag#NO_LOCAL_WRITE
>    */
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that the first
> block replica be placed on a random DataNode in the cluster.  The subsequent block replicas
> should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block replica is
> not placed on the local node, but it is still placed on the local rack.  This comes
> into play when you have, for example, a Flume agent that is loading data into HDFS.
> If the Flume agent is running on a DataNode, then by default, the DataNode local to the
> Flume agent will always get the first block replica. This leads to uneven block placement,
> with the local node always filling up faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the Flume agent
> is running, or {{NO_LOCAL_WRITE}} is enabled by Flume, then the default block placement
> policy will still prefer the local rack.  This remedies the situation only insofar as the
> first block replica is now always placed on some DataNode on the local rack.
> This new flag would allow a single Flume agent to distribute the blocks randomly and evenly
> over the entire cluster instead of hot-spotting the local node or the local rack.
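
To make the three behaviours described above concrete, here is a toy sketch. It is not the actual block placement code. It assumes the writer runs on a DataNode and that the writer's rack holds at least one other DataNode. {{Mode.IGNORE_LOCALITY}} is a placeholder name for the proposed flag, and {{FirstReplicaSketch}}, {{DataNode}}, {{buildCluster}} and {{chooseFirstReplica}} are hypothetical helpers invented for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class FirstReplicaSketch {

  // Placeholder modes: default placement, the existing NO_LOCAL_WRITE hint,
  // and a stand-in for the proposed "ignore locality" flag.
  enum Mode { DEFAULT, NO_LOCAL_WRITE, IGNORE_LOCALITY }

  // Toy DataNode: a host name plus the rack it sits on.
  static final class DataNode {
    final String host;
    final String rack;
    DataNode(String host, String rack) { this.host = host; this.rack = rack; }
  }

  static List<DataNode> buildCluster(int racks, int nodesPerRack) {
    List<DataNode> cluster = new ArrayList<>();
    for (int r = 0; r < racks; r++) {
      for (int n = 0; n < nodesPerRack; n++) {
        cluster.add(new DataNode("dn" + r + "-" + n, "/rack" + r));
      }
    }
    return cluster;
  }

  // Picks the target for the FIRST replica only; later replicas are out of scope.
  static DataNode chooseFirstReplica(List<DataNode> cluster, DataNode writer,
                                     Mode mode, Random rnd) {
    switch (mode) {
      case DEFAULT:
        // Writer is on a DataNode, so the local machine gets the first replica.
        return writer;
      case NO_LOCAL_WRITE: {
        // Local node is skipped, but the local rack is still preferred.
        List<DataNode> sameRack = new ArrayList<>();
        for (DataNode dn : cluster) {
          if (dn != writer && dn.rack.equals(writer.rack)) {
            sameRack.add(dn);
          }
        }
        // Assumption: the rack has another node, so sameRack is not empty.
        return sameRack.get(rnd.nextInt(sameRack.size()));
      }
      default:
        // Proposed behaviour: any DataNode in the cluster, chosen at random.
        return cluster.get(rnd.nextInt(cluster.size()));
    }
  }

  public static void main(String[] args) {
    List<DataNode> cluster = buildCluster(3, 3);
    DataNode writer = cluster.get(0);
    Random rnd = new Random();
    for (Mode mode : Mode.values()) {
      Set<String> hosts = new HashSet<>();
      Set<String> racks = new HashSet<>();
      for (int i = 0; i < 1000; i++) {
        DataNode target = chooseFirstReplica(cluster, writer, mode, rnd);
        hosts.add(target.host);
        racks.add(target.rack);
      }
      System.out.println(mode + ": " + hosts.size() + " distinct host(s) on "
          + racks.size() + " rack(s)");
    }
  }
}
```

Running {{main}} shows how far each mode spreads first replicas: the default mode always hits the writer's host, {{NO_LOCAL_WRITE}} stays on the writer's rack, and only the placeholder mode reaches other racks.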

This message was sent by Atlassian JIRA

