nifi-commits mailing list archives

From "Matt Gilman (JIRA)" <>
Subject [jira] [Commented] (NIFI-926) Clearing counters can cause a Node to disconnect
Date Sat, 05 Sep 2015 16:56:45 GMT


Matt Gilman commented on NIFI-926:

Copy of original email to dev mailing list which has details on how to replicate the issue:


I have a three machine setup (1 NCM + 2 Nodes) running 0.2.0-incubating and observed the following:

1.       Resetting counters can result in the NCM disconnecting a node

2.       The node that is disconnected begins processing FlowFiles


My clustered NiFi is running a single pipeline containing 3 processors. While the pipeline
is running, resetting counters causes any nodes that are not processing anything
(i.e. are not contributing to the count) to disconnect. The node can then be reconnected via
the UI. Looking at the stats, it appears that the pipeline then began running on the disconnected
node as well as on the single remaining connected node. This has been tested using custom processors
as well as standard processors.

Steps to Replicate:

1.       Create a cluster with 2 nodes + 1 NCM (2 processing nodes are needed or the problem
won't appear)

2.       Add GenerateFlowFile processor:

a.       Scheduling: Change Scheduling strategy to 'On primary node'

b.      Properties: Change File Size to '10B' (say)

3.       Add HashAttribute processor:

a.       Properties: Change Key to 'hash.value'

4.       Add DetectDuplicate processor:

a.       Properties: Under Distributed Cache Service add a 'DistributedMapCacheClientService'

         i.      For the Client Service, set Server name to 'localhost' under properties

         ii.     Enable the Client Service

         iii.    Add a DistributedMapCacheServer under the Controller Services

         iv.     Enable the Cache Server

         v.      Exit NiFi Flow Settings

5.       Connect all 3 processors on success

6.       Auto-terminate all options for DetectDuplicate

7.       Run all processors and wait for ~10 seconds or so

8.       Open counters tab and refresh to make sure counters > 0

9.       Reset one of the counters

Note: I'm specifically using the DetectDuplicate processor in this example because it contains
a custom counter.

This should then disconnect the node that was not active (the node that was not selected to be
the primary). Even though the GenerateFlowFile processor is scheduled to run only on the primary
node, the disconnected node begins to emit FlowFiles.

The following warning was pulled from the NCM's logs:

2015-09-02 10:40:16,750 WARN [NiFi Web Server-149] o.a.n.c.manager.impl.WebClusterManager
One or more nodes failed to process URI 'http://localhost:8082/nifi-api/controller/counters/2207ea22-0d4a-389d-b746-82e568c6228d'.
 Requesting each node to disconnect from cluster.
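
To make the warning above concrete, here is a minimal sketch (in Java, since NiFi is a Java project) of the behaviour it describes: a request is replicated to each node, and any node that does not answer with a success status is asked to disconnect. The class name ReplicationSketch, the method nodesToDisconnect, the node labels and the status codes are all hypothetical. The real WebClusterManager logic is not shown in this thread and may well differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the warning in the log; not NiFi's actual code.
public class ReplicationSketch {

    // Returns the nodes whose response to the replicated request was not a 2xx status.
    static List<String> nodesToDisconnect(Map<String, Integer> statusByNode) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : statusByNode.entrySet()) {
            int status = entry.getValue();
            if (status < 200 || status >= 300) {
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Assumed responses to the counter reset request, one per node.
        Map<String, Integer> responses = new LinkedHashMap<>();
        responses.put("node1", 200); // node that knows the counter id
        responses.put("node2", 404); // node that cannot find the counter id
        System.out.println("Disconnect: " + nodesToDisconnect(responses));
    }
}
```

Under this model a single node failing the request is enough for that node to be dropped from the cluster, which matches what was observed after resetting a counter.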

I'm interested in knowing if this is expected behaviour or if I should open a JIRA ticket
(2 perhaps).


> Clearing counters can cause a Node to disconnect
> ------------------------------------------------
>                 Key: NIFI-926
>                 URL:
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Matt Gilman
>            Assignee: Matt Gilman
>             Fix For: 0.3.0
> When clustered, clearing a counter can cause a node to disconnect. The node disconnects
> due to a 404 - Resource Not Found. It appears the counters do not share a common identifier.
> This leads to the following message:
> {code}
> 2015-09-03 14:20:15,574 INFO [NiFi Web Server-24] o.a.n.w.a.c.ResourceNotFoundExceptionMapper
org.apache.nifi.web.ResourceNotFoundException: Unable to find Counter with id 'f4cf3c66-c0e7-3439-be94-2096f7d4a78e'..
Returning Not Found response.
> {code}
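
The identifier mismatch described above can be illustrated with a small Java sketch. It assumes, purely for illustration, that a counter id is a name-based UUID built from a context string and a counter name. The issue does not say how NiFi derives counter ids, so the class CounterIdSketch, the method counterId and the seed strings are all hypothetical. The ids in the two log excerpts do carry version digit 3, which is consistent with name-based UUIDs but does not confirm this scheme.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical id scheme; not NiFi's actual implementation.
public class CounterIdSketch {

    // Derives a deterministic (version 3) UUID from a context and a counter name.
    static String counterId(String context, String name) {
        String seed = context + "-" + name;
        return UUID.nameUUIDFromBytes(seed.getBytes(StandardCharsets.UTF_8)).toString();
    }

    public static void main(String[] args) {
        // If the context contains anything node-specific, each node computes a different id.
        String onNode1 = counterId("DetectDuplicate@node1", "example-counter");
        String onNode2 = counterId("DetectDuplicate@node2", "example-counter");
        System.out.println("ids match across nodes: " + onNode1.equals(onNode2));

        // With a context that is identical on every node, the ids agree.
        String sharedOnNode1 = counterId("DetectDuplicate", "example-counter");
        String sharedOnNode2 = counterId("DetectDuplicate", "example-counter");
        System.out.println("ids match with shared context: " + sharedOnNode1.equals(sharedOnNode2));
    }
}
```

If ids were derived this way, a reset addressed to one node's id would produce a 404 on any node that computed a different id for the same counter, which is the failure reported in the message above.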

This message was sent by Atlassian JIRA
