nifi-issues mailing list archives

From "David Mollitor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NIFI-5452) Enable HDFS-13448 in HDFS Sink
Date Wed, 07 Aug 2019 18:01:00 GMT

    [ https://issues.apache.org/jira/browse/NIFI-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902365#comment-16902365 ]

David Mollitor commented on NIFI-5452:
--------------------------------------

Here

https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/PutHDFS.java#L316
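The comment points into `PutHDFS.java`, the HDFS sink processor that would gain the proposed setting. The following is a minimal sketch of how such a boolean could be turned into file-create flags. It assumes the sink creates or overwrites its target file. So that it runs without Hadoop on the classpath, it uses a local stand-in enum, `CreateFlagStandIn` (hypothetical), that mirrors a few constants of Hadoop's `org.apache.hadoop.fs.CreateFlag`, where `IGNORE_CLIENT_LOCALITY` is the constant HDFS-13448 adds for this purpose. The names `HdfsSinkFlags`, `createFlags` and `ignoreLocality` are illustrative and are not part of NiFi. In a real client, the resulting set would be passed to a `FileSystem.create(...)` overload that accepts an `EnumSet<CreateFlag>`.

```java
import java.util.EnumSet;

// Hypothetical stand-in for org.apache.hadoop.fs.CreateFlag so this sketch
// compiles without Hadoop; only the constants used here are mirrored.
enum CreateFlagStandIn { CREATE, OVERWRITE, IGNORE_CLIENT_LOCALITY }

final class HdfsSinkFlags {
    private HdfsSinkFlags() {}

    /**
     * Builds the create flags for one HDFS write.
     * ignoreLocality stands for the proposed (hypothetical) boolean sink setting;
     * when it is false the flags are unchanged, so current behavior is preserved.
     */
    static EnumSet<CreateFlagStandIn> createFlags(boolean ignoreLocality) {
        EnumSet<CreateFlagStandIn> flags =
                EnumSet.of(CreateFlagStandIn.CREATE, CreateFlagStandIn.OVERWRITE);
        if (ignoreLocality) {
            // Ask the NameNode to place the first replica of each block randomly
            // instead of on (or near) the writing host.
            flags.add(CreateFlagStandIn.IGNORE_CLIENT_LOCALITY);
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println(createFlags(false)); // [CREATE, OVERWRITE]
        System.out.println(createFlags(true));  // [CREATE, OVERWRITE, IGNORE_CLIENT_LOCALITY]
    }
}
```

If the setting defaults to `false`, existing writes keep exactly the flags and placement behavior they have today.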

> Enable HDFS-13448 in HDFS Sink
> ------------------------------
>
>                 Key: NIFI-5452
>                 URL: https://issues.apache.org/jira/browse/NIFI-5452
>             Project: Apache NiFi
>          Issue Type: New Feature
>            Reporter: David Mollitor
>            Priority: Major
>
> Now that [HDFS-13448] is available, add a new boolean option to the HDFS Sink configuration that enables it.
> The basic issue, as things currently stand, is the following:
> Imagine a cluster that has four racks of hardware:
> # Rack A is half management nodes and half DataNodes
> # Racks B, C, and D are all DataNodes
> Now consider the following scenarios:
> If an instance of NiFi is located on a server outside of these racks, the data will be evenly distributed across all DataNodes.
> If an instance of NiFi is running on Rack A, co-located with a DataNode, then all of the HDFS Sink writes will first go to the local DataNode, overloading this single DataNode and filling it faster than all other DataNodes in the cluster.
> If an instance of NiFi is running on Rack A on its own server, then all of the HDFS Sink writes will first go to a DataNode on Rack A, overloading the DataNodes on Rack A and filling them faster than all other DataNodes in the cluster. The issue is compounded in clusters with many racks: Rack A will always receive one copy of each block, while the other two copies are scattered evenly across the other racks.
> [HDFS-13448] adds a new flag to the HDFS client that asks the NameNode to place the first replica of each block randomly. Thus, if a NiFi instance is located on Rack A, the local node (or local rack) will not be overloaded.
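The rack-level effect described in the scenarios above can be reproduced with a small simulation. This is a deliberately simplified, hypothetical model and not HDFS's actual placement policy. It assumes three replicas per block and 2 DataNodes on Rack A and 4 on each of Racks B, C and D; the issue gives no node counts, so these are illustrative. Replica 1 goes to the writer's rack, or to a uniformly random DataNode when locality is ignored, and replicas 2 and 3 go together to the rack of a random DataNode outside replica 1's rack. `PlacementModel`, `simulate` and `perNodeLoad` are illustrative names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical, simplified model of 3-way replica placement; NOT HDFS's real
// placement policy. Node counts per rack are illustrative only.
final class PlacementModel {
    private PlacementModel() {}

    // Rack A: 2 DataNodes (the rest of the rack is management nodes); B, C, D: 4 each.
    static final Map<String, Integer> NODES_PER_RACK = new LinkedHashMap<>();
    static {
        NODES_PER_RACK.put("A", 2);
        NODES_PER_RACK.put("B", 4);
        NODES_PER_RACK.put("C", 4);
        NODES_PER_RACK.put("D", 4);
    }

    /** Counts the replicas landing on each rack after writing {@code blocks} blocks. */
    static Map<String, Integer> simulate(String writerRack, boolean ignoreLocality,
                                         int blocks, long seed) {
        Random rnd = new Random(seed);
        // One entry per DataNode, holding that node's rack, for uniform node choice.
        List<String> nodeRacks = new ArrayList<>();
        NODES_PER_RACK.forEach((rack, n) -> {
            for (int i = 0; i < n; i++) nodeRacks.add(rack);
        });
        Map<String, Integer> perRack = new LinkedHashMap<>();
        NODES_PER_RACK.keySet().forEach(rack -> perRack.put(rack, 0));

        for (int b = 0; b < blocks; b++) {
            // Replica 1: the writer's rack, or a random DataNode if locality is ignored.
            String first = ignoreLocality
                    ? nodeRacks.get(rnd.nextInt(nodeRacks.size()))
                    : writerRack;
            perRack.merge(first, 1, Integer::sum);
            // Replicas 2 and 3: together on the rack of a random DataNode off replica 1's rack.
            String remote;
            do {
                remote = nodeRacks.get(rnd.nextInt(nodeRacks.size()));
            } while (remote.equals(first));
            perRack.merge(remote, 2, Integer::sum);
        }
        return perRack;
    }

    /** Average replicas per DataNode on each rack, i.e. how fast its disks fill. */
    static Map<String, Double> perNodeLoad(Map<String, Integer> perRack) {
        Map<String, Double> load = new LinkedHashMap<>();
        perRack.forEach((rack, count) ->
                load.put(rack, count / (double) NODES_PER_RACK.get(rack)));
        return load;
    }

    public static void main(String[] args) {
        int blocks = 30_000;
        System.out.println("locality honored: " + perNodeLoad(simulate("A", false, blocks, 42L)));
        System.out.println("locality ignored: " + perNodeLoad(simulate("A", true, blocks, 42L)));
    }
}
```

With the writer on Rack A and locality honored, Rack A receives exactly one replica of every block, so in this layout each of its DataNodes takes about three times the load of a DataNode on another rack. With `ignoreLocality` set, the expected per-node load on Rack A falls to roughly 1.2 times that of the other racks under this model. The small residual skew comes from the model's simplified choice of the remote rack.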



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
