nifi-dev mailing list archives

From alopresto <...@git.apache.org>
Subject [GitHub] nifi pull request #491: NIFI-1960: Update admin guide regarding documentatio...
Date Fri, 03 Jun 2016 18:15:12 GMT
Github user alopresto commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/491#discussion_r65752470
  
    --- Diff: nifi-docs/src/main/asciidoc/administration-guide.adoc ---
    @@ -485,98 +485,137 @@ It is preferable to request upstream/downstream systems to switch to https://cwi
     Clustering Configuration
     ------------------------
     
    -This section provides a quick overview of NiFi Clustering and instructions on how to set up a basic cluster. In the future, we hope to provide supplemental documentation that covers the NiFi Cluster Architecture in depth.
    -
    -The design of NiFi clustering is a simple master/slave model where there is a master and one or more slaves.
    -While the model is that of master and slave, if the master dies, the slaves are all instructed to continue operating
    -as they were to ensure the dataflow remains live. The absence of the master simply means new slaves cannot join the
    -cluster and cluster flow changes cannot occur until the master is restored. In NiFi clustering, we call the master
    -the NiFi Cluster Manager (NCM), and the slaves are called Nodes. See a full description of each in the Terminology section below.
    +This section provides a quick overview of NiFi Clustering and instructions on how to set up a basic cluster.
    +In the future, we hope to provide supplemental documentation that covers the NiFi Cluster Architecture in depth.
    +
    +NiFi employs a Zero-Master Clustering paradigm. Each of the nodes in the cluster performs the same tasks on
    +the data but each operates on a different set of data. One of the nodes is automatically elected (via Apache
    +ZooKeeper) as the Cluster Coordinator. All nodes in the cluster will then send heartbeat/status information
    +to this node, and this node is responsible for disconnecting nodes that do not report any heartbeat status
    +for some amount of time. Additionally, when a new node elects to join the cluster, the new node must first
    +connect to the currently-elected Cluster Coordinator in order to obtain the most up-to-date flow. If the Cluster
    +Coordinator determines that the node is allowed to join (based on its configured Firewall file), the current
    +flow is provided to that node, and that node is able to join the cluster, assuming that the node's copy of the
    +flow matches the copy provided by the Cluster Coordinator. If the node's version of the flow configuration differs
    +from that of the Cluster Coordinator, the node will not join the cluster.
     
     *Why Cluster?* +
     
    -NiFi Administrators or Dataflow Managers (DFMs) may find that using one instance of NiFi on a single server is not enough to process the amount of data they have. So, one solution is to run the same dataflow on multiple NiFi servers. However, this creates a management problem, because each time DFMs want to change or update the dataflow, they must make those changes on each server and then monitor each server individually. By clustering the NiFi servers, it's possible to have that increased processing capability along with a single interface through which to make dataflow changes and monitor the dataflow. Clustering allows the DFM to make each change only once, and that change is then replicated to all the nodes of the cluster. Through the single interface, the DFM may also monitor the health and status of all the nodes.
    +NiFi Administrators or Dataflow Managers (DFMs) may find that using one instance of NiFi on a single server is not
    +enough to process the amount of data they have. So, one solution is to run the same dataflow on multiple NiFi servers.
    +However, this creates a management problem, because each time DFMs want to change or update the dataflow, they must make
    +those changes on each server and then monitor each server individually. By clustering the NiFi servers, it's possible to
    +have that increased processing capability along with a single interface through which to make dataflow changes and monitor
    +the dataflow. Clustering allows the DFM to make each change only once, and that change is then replicated to all the nodes
    +of the cluster. Through the single interface, the DFM may also monitor the health and status of all the nodes.
     
     NiFi Clustering is unique and has its own terminology. It's important to understand the following terms before setting up a cluster.
     
     [template="glossary", id="terminology"]
     *Terminology* +
     
    -*NiFi Cluster Manager*: A NiFi Cluster Manager (NCM) is an instance of NiFi that provides the sole management point for the cluster. It communicates dataflow changes to the nodes and receives health and status information from the nodes. It also ensures that a uniform dataflow is maintained across the cluster. When DFMs manage a dataflow in a cluster, they do so through the User Interface of the NCM (i.e., via the URL of the NCM's User Interface). Fundamentally, the NCM keeps the state of the cluster consistent.
    -
    -*Nodes*: Each cluster is made up of the NCM and one or more nodes. The nodes do the actual data processing. (The NCM does not process any data; all data runs through the nodes.) While nodes are connected to a cluster, the DFM may not access the User Interface for any of the individual nodes. The User Interface of a node may only be accessed if the node is manually removed from the cluster.
    -
    -*Primary Node*: Every cluster has one Primary Node. On this node, it is possible to run "Isolated Processors" (see below). By default, the NCM will elect the first node that connects to the cluster as the Primary Node; however, the DFM may select a new node as the Primary Node in the Cluster Management page of the User Interface if desired. If the cluster restarts, the NCM will "remember" which node was the Primary Node and wait for that node to re-connect before allowing the DFM to make any changes to the dataflow. The ADMIN may adjust how long the NCM waits for the Primary Node to reconnect by adjusting the property _nifi.cluster.manager.safemode.duration_ in the _nifi.properties_ file, which is discussed in the <<system_properties>> section of this document.
    -
    -*Isolated Processors*: In a NiFi cluster, the same dataflow runs on all the nodes. As a result, every component in the flow runs on every node. However, there may be cases when the DFM would not want every processor to run on every node. The most common case is when using a processor that communicates with an external service using a protocol that does not scale well. For example, the GetSFTP processor pulls from a remote directory, and if the GetSFTP on every node in the cluster tries simultaneously to pull from the same remote directory, there could be race conditions. Therefore, the DFM could configure the GetSFTP on the Primary Node to run in isolation, meaning that it only runs on that node. It could pull in data and -with the proper dataflow configuration- load-balance it across the rest of the nodes in the cluster. Note that while this feature exists, it is also very common to simply use a standalone NiFi instance to pull data and feed it to the cluster. It just depends on the resources available and how the Administrator decides to configure the cluster.
    -
    -*Heartbeats*: The nodes communicate their health and status to the NCM via "heartbeats", which let the NCM know they are still connected to the cluster and working properly. By default, the nodes emit heartbeats to the NCM every 5 seconds, and if the NCM does not receive a heartbeat from a node within 45 seconds, it disconnects the node due to "lack of heartbeat". (The 5-second and 45-second settings are configurable in the _nifi.properties_ file. See the <<system_properties>> section of this document for more information.) The reason that the NCM disconnects the node is because the NCM needs to ensure that every node in the cluster is in sync, and if a node is not heard from regularly, the NCM cannot be sure it is still in sync with the rest of the cluster. If, after 45 seconds, the node does send a new heartbeat, the NCM will automatically reconnect the node to the cluster. Both the disconnection due to lack of heartbeat and the reconnection once a heartbeat is received are reported to the DFM in the NCM's User Interface.
    +*NiFi Cluster Coordinator*: A NiFi Cluster Coordinator is the node in a NiFi cluster that is responsible for carrying out
    +tasks for managing which nodes are allowed in the cluster and providing the most up-to-date flow to newly joining nodes. When a
    +DataFlow Manager manages a dataflow in a cluster, they are able to do so through the User Interface of any node in the cluster. Any
    +change made is then replicated to all nodes in the cluster.
    +
    +*Nodes*: Each cluster is made up of one or more nodes. The nodes do the actual data processing.
    +
    +*Primary Node*: Every cluster has one Primary Node. On this node, it is possible to run "Isolated Processors" (see below).
    +ZooKeeper is used to automatically elect a Primary Node. If that node disconnects from the cluster for any reason, a new
    +Primary Node will automatically be elected. Users can determine which node is currently elected as the Primary Node by
    +looking at the Cluster Management page of the User Interface.
    +
    +*Isolated Processors*: In a NiFi cluster, the same dataflow runs on all the nodes. As a result, every component in the flow
    +runs on every node. However, there may be cases when the DFM would not want every processor to run on every node. The most
    +common case is when using a processor that communicates with an external service using a protocol that does not scale well.
    +For example, the GetSFTP processor pulls from a remote directory, and if the GetSFTP Processor on every node in the
    +cluster tries simultaneously to pull from the same remote directory, there could be race conditions. Therefore, the DFM could
    +configure the GetSFTP on the Primary Node to run in isolation, meaning that it only runs on that node. It could pull in data and -
    +with the proper dataflow configuration - load-balance it across the rest of the nodes in the cluster. Note that while this
    +feature exists, it is also very common to simply use a standalone NiFi instance to pull data and feed it to the cluster.
    +It just depends on the resources available and how the Administrator decides to configure the cluster.
    +
    +*Heartbeats*: The nodes communicate their health and status to the currently elected Cluster Coordinator via "heartbeats",
    +which let the Coordinator know they are still connected to the cluster and working properly. By default, the nodes emit
    +heartbeats every 5 seconds, and if the Cluster Coordinator does not receive a heartbeat from a node within 40 seconds, it
    +disconnects the node due to "lack of heartbeat". (The 5-second setting is configurable in the _nifi.properties_ file.
    +See the <<system_properties>> section of this document for more information.) The reason that the Cluster Coordinator
    +disconnects the node is because the Coordinator needs to ensure that every node in the cluster is in sync, and if a node
    +is not heard from regularly, the Coordinator cannot be sure it is still in sync with the rest of the cluster. If, after
    +40 seconds, the node does send a new heartbeat, the Coordinator will automatically reconnect the node to the cluster.
    --- End diff ---
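
    As an aside for readers following the thread: the heartbeat and firewall behavior described in the new text is driven by a handful of entries in _nifi.properties_. Below is a minimal sketch of a node's cluster configuration, assuming the 1.x property names; the hostnames and values are illustrative, not defaults, and should be checked against the <<system_properties>> section:

        # Mark this instance as a cluster node
        nifi.cluster.is.node=true
        nifi.cluster.node.address=node1.example.com
        nifi.cluster.node.protocol.port=11443

        # Heartbeat interval; the 40-second disconnect window in the text
        # corresponds to eight missed 5-second heartbeats (8 x 5 sec = 40 sec)
        nifi.cluster.protocol.heartbeat.interval=5 sec

        # Optional firewall file listing hosts permitted to join the cluster
        nifi.cluster.firewall.file=./conf/cluster-firewall.txt

        # ZooKeeper ensemble used to elect the Cluster Coordinator and Primary Node
        nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181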
    
    If the node attempts to reconnect, the Cluster Coordinator still ensures the flows are
in sync, correct?
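
    To make the join/reconnect gate in the new text concrete, here is a rough model of the decision it describes: firewall check first, then flow comparison. This is an illustrative sketch only; all class and method names are hypothetical, not NiFi's actual internals:

        // Illustrative model of the join/reconnect gate described in the diff.
        // All names here are hypothetical, not NiFi's real code.
        import java.util.Arrays;
        import java.util.List;

        class JoinGate {
            private final List<String> allowedHosts;     // from the firewall file; null = allow all
            private final byte[] coordinatorFingerprint; // fingerprint of the coordinator's current flow

            JoinGate(List<String> allowedHosts, byte[] coordinatorFingerprint) {
                this.allowedHosts = allowedHosts;
                this.coordinatorFingerprint = coordinatorFingerprint;
            }

            // A node may join only if the firewall admits it and its flow either
            // is empty (it adopts the coordinator's copy) or matches the coordinator's.
            boolean mayJoin(String nodeHost, byte[] nodeFingerprint) {
                if (allowedHosts != null && !allowedHosts.contains(nodeHost)) {
                    return false; // rejected by the configured firewall file
                }
                return nodeFingerprint == null
                        || Arrays.equals(nodeFingerprint, coordinatorFingerprint);
            }
        }

    Under this reading, a reconnecting node would pass through the same gate as a newly joining one, which is precisely what the question above is probing.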

