nifi-users mailing list archives

From Bryan Bende <bbe...@gmail.com>
Subject Re: Nifi 1.1.0 cluster on Docker Swarm
Date Wed, 22 Mar 2017 19:53:54 GMT
I was looking into this a little bit and although I don't have a
definitive answer, here is what I determined...

There are basically two pairs of values being passed around...

1) The cluster socket address and cluster socket port
(nifi.cluster.node.address and nifi.cluster.node.protocol.port) which
are used for the clustering protocol

2) The node API address and node API port, which are the web
address/port used when web requests are replicated across the
cluster

There is a "NodeId" class that each node creates during start up that
has both of these sets of values in it.
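
To make those two pairs concrete, here is a rough sketch of the shape of
that class (my own approximation, not the actual NiFi source, which carries
more fields than this):

public class NodeId {
    private final String apiAddress;    // web address, derived from nifi.web.http(s).host
    private final int apiPort;          // web port, from nifi.web.http(s).port
    private final String socketAddress; // from nifi.cluster.node.address
    private final int socketPort;       // from nifi.cluster.node.protocol.port

    public NodeId(String apiAddress, int apiPort, String socketAddress, int socketPort) {
        this.apiAddress = apiAddress;
        this.apiPort = apiPort;
        this.socketAddress = socketAddress;
        this.socketPort = socketPort;
    }
}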

The logs you are seeing about "Vote cast by {}..." happen to be
logging the toString() method of NodeId which is the following:

public String toString() {
    return apiAddress + ":" + apiPort;
}

So even though the cluster address/port was used for joining the
cluster, the logs are just showing the other pair, which is why your
cluster still comes up successfully.

Then when you hit the web UI it attempts to replicate the request from
the node you are on to the other node using the API address/port of
the other node.

The API address/port is calculated by a node during start up by
looking at the web host/port properties:

nifi.web.http.host=
nifi.web.http.port=
nifi.web.https.host=
nifi.web.https.port=

The logic is basically this:

if nifi.cluster.protocol.is.secure is true then
  scheme is https
else
  scheme is http

if scheme is http then
  if nifi.web.http.host is empty
    use localhost for API host
  else
    use nifi.web.http.host
else (scheme is https)
  if nifi.web.https.host is empty
    use localhost for API host
  else
    use nifi.web.https.host
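
Expressed as a small standalone sketch (my own approximation of that
start-up logic, not the actual NiFi code):

public class ApiHostResolver {

    // Rough approximation of the logic above: pick the web host property for the
    // chosen scheme, and fall back to localhost when it is left blank.
    static String resolveApiHost(boolean clusterProtocolIsSecure, String webHttpHost, String webHttpsHost) {
        final String configured = clusterProtocolIsSecure ? webHttpsHost : webHttpHost;
        return (configured == null || configured.trim().isEmpty()) ? "localhost" : configured;
    }

    public static void main(String[] args) {
        // nifi.web.http.host left blank, as in the Swarm setup in this thread
        System.out.println(resolveApiHost(false, "", null));             // -> localhost
        // nifi.web.http.host=stack1_nifi1
        System.out.println(resolveApiHost(false, "stack1_nifi1", null)); // -> stack1_nifi1
    }
}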

This explains why leaving nifi.web.http.host blank will result in
calculating the web API address as localhost since it has no other
option.

I think the bottom line is figuring out why it was a requirement to
leave nifi.web.http.host blank in the first place; shouldn't that be
set to the same hostnames (stack1_nifi1, stack1_nifi2)?
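
In other words, something along these lines on each node (untested; this
just spells the suggestion out against the hostnames/ports mentioned later
in this thread):

# on stack1_nifi1 (centos-a)
nifi.web.http.host=stack1_nifi1
nifi.web.http.port=8080
nifi.cluster.node.address=stack1_nifi1
nifi.cluster.node.protocol.port=10001

# on stack1_nifi2 (centos-b)
nifi.web.http.host=stack1_nifi2
nifi.web.http.port=8085
nifi.cluster.node.address=stack1_nifi2
nifi.cluster.node.protocol.port=10001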

Also, I am working off master and we now have
nifi.web.http.network.interface.default= and
nifi.web.https.network.interface.default=  in addition to the existing
properties, and I haven't figured out how these are related to any of
the above. They only seem to be used by the JettyServer when
determining what addresses to bind to, but are never referenced when
determining the web API address for clustering.


-Bryan


On Fri, Mar 17, 2017 at 8:09 AM, ddewaele <ddewaele@gmail.com> wrote:
> Hi Jeremy,
>
> The issue we are facing is that we need to keep the nifi.web.http.host blank
> in order to have a working swarm setup, but this conflicts with the way nifi
> does cluster communication. Let me try to explain:
>
> I have 2 nifi instances (cluster nodes) in a docker swarm connected to
> zookeeper (also running in the docker swarm).
>
> - stack1_nifi1 running on port 8080 on centos-a
> - stack1_nifi2 running on port 8085 on centos-b
>
> (stack1_nifi1 and stack1_nifi2 are swarm service names and are made
> available in the docker network via DNS).
>
> My Nifi config:
>
> # Leave blank so that it binds to all possible interfaces
> nifi.web.http.host=
> nifi.web.http.port=8080  #(8085 on the other node)
>
> nifi.cluster.is.node=true
> # Define the cluster node (hostname) address to uniquely identify this node.
> nifi.cluster.node.address=stack1_nifi1 #(stack1_nifi2 on the other node)
> nifi.cluster.node.protocol.port=10001
>
>
> In the NiFi logs I notice this:
>
> 2017-03-17 11:44:45,298 INFO [main]
> o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster
> Coordinator is located at stack1_nifi2:10001; will use this address for
> sending heartbeat messages
> 2017-03-17 11:44:45,433 INFO [Process Cluster Protocol Request-1]
> o.a.n.c.c.flow.PopularVoteFlowElection Vote cast by localhost:8085; this
> flow now has 1 votes
>
> In the first line the cluster node address is used, but in the second one it
> seems the nifi.web.http.host is used. So the nodeIds are not using the
> nifi.cluster.node.address, but seem to default to the empty
> nifi.web.http.host entry (defaults to localhost).
>
>
> Same thing can be seen here:
>
> 2017-03-17 11:44:50,517 INFO [main] o.a.n.c.c.node.NodeClusterCoordinator
> Resetting cluster node statuses from
> {localhost:8080=NodeConnectionStatus[nodeId=localhost:8080,
> state=CONNECTING, updateId=3],
> localhost:8085=NodeConnectionStatus[nodeId=localhost:8085, state=CONNECTING,
> updateId=5]} to {localhost:8080=NodeConnectionStatus[nodeId=localhost:8080,
> state=CONNECTING, updateId=3],
> localhost:8085=NodeConnectionStatus[nodeId=localhost:8085, state=CONNECTING,
> updateId=5]}
>
> Shouldn't Nifi always use the nifi.cluster.node.address to generate the
> nodeIds ?
>
> It should also use that setting to send replication requests, I guess:
>
> 2017-03-10 06:03:59,014 WARN [Replicate Request Thread-7]
> o.a.n.c.c.h.r.ThreadPoolRequestReplicator Failed to replicate request GET
> /nifi-api/flow/current-user to localhost:8085 due to {}
>
> My Nifi cluster seems to be up and running (I see heartbeats going
> back and forth), but I cannot access the UI due to the replication error above.
>
> The nifi running on centos-a:8080 is trying to make a request to
> localhost:8085 when it should go to centos-b:8085 (in order to do that, it
> should use the nifi.cluster.node.address).
>
>
>
>
>
> Jeremy Dyer wrote
>> Raf - Ok so good news and bad news. Good news: it's working for me. Bad news:
>> it's working for me =) Here is the complete list of things that I changed.
>> Hopefully this can at least really help narrow down what is causing the
>> issue.
>>
>> - I ran on a single machine. All that was available to me while at the
>> airport.
>> - I added a "network" section to the end of the docker-compose.yml file. I
>> think you might already have that and this was just a snippet in your
>> gist?
>> - I removed the COPY from the Dockerfile around the custom processors
>> since
>> I don't have those.
>>
>> In my mind the most likely issue is something around Docker swarm
>> networking.
>
>
>
>
>
> --
> View this message in context: http://apache-nifi-users-list.2361937.n4.nabble.com/Nifi-1-1-0-cluster-on-Docker-Swarm-tp1229p1266.html
> Sent from the Apache NiFi Users List mailing list archive at Nabble.com.
