nifi-dev mailing list archives

From Pierre Villard <pierre.villard...@gmail.com>
Subject Re: Restarting NiFi causing SiteToSiteBulletinReportingTask to fail
Date Tue, 17 Apr 2018 22:04:02 GMT
Hi Chad,

I confirm that I can reproduce the issue on my side with a NiFi 1.5.0
cluster and I don't see anything that would fix it in NiFi 1.6.0.

I had a closer look and it does not seem related to the Site-to-Site
mechanism: the thread in charge of refreshing the peers is correctly
running and you should see logs like "Successfully refreshed Peer Status;
remote instance consists of X peers".

As far as I can see, it seems related to how we cache the ID of the last
bulletin sent and how we retrieve this value to "restart" the task after
the NiFi node has restarted. That would explain why deleting the task and
creating it again works: doing so also deletes the associated cache.
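
To illustrate the kind of failure I have in mind, here is a minimal sketch.
It is not the NiFi code: the names Bulletin, select_unsent and
cached_last_sent_id are hypothetical, and the idea that bulletin IDs start
again from a low value after a restart is an assumption I have not verified
yet.

```python
from dataclasses import dataclass


@dataclass
class Bulletin:
    id: int
    message: str


def select_unsent(bulletins, last_sent_id):
    """Return the bulletins whose ID is greater than the cached last-sent ID."""
    return [b for b in bulletins if b.id > last_sent_id]


# Hypothetical state: before the restart, bulletins up to ID 500 were sent
# and that ID was kept in the cache associated with the task.
cached_last_sent_id = 500

# Assumption (unverified): after the restart, new bulletins get low IDs again.
after_restart = [Bulletin(1, "Disk usage high"), Bulletin(2, "Memory usage high")]

# With the stale cached ID, every new bulletin is filtered out and nothing is sent.
stale_result = select_unsent(after_restart, cached_last_sent_id)

# Deleting and recreating the task drops the cache, so filtering starts from -1.
fresh_result = select_unsent(after_restart, -1)
```

In this sketch, stale_result is empty while fresh_result contains both
bulletins, which would match the behaviour you describe when recreating the
task.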

That's just an assumption after a quick look; I'll keep digging tomorrow
and open a JIRA for it.

Thanks for reporting it!

Pierre


2018-04-12 23:41 GMT+02:00 Pierre Villard <pierre.villard.fr@gmail.com>:

> Hi Chad,
>
> I believe this could have been fixed recently, but I have very limited
> access right now (and for the next few days) and can't be sure...
> I will check next week if no one has given you feedback before then.
>
> Pierre
>
> 2018-04-12 19:57 GMT+02:00 Woodhead, Chad <Chad.Woodhead@ncr.com>:
>
>> I am running HDF 3.0.1.1 which comes with NiFi 1.2.0.3.0.1.1-5. We are
>> using SiteToSiteBulletinReportingTask to monitor bulletins (for things
>> like Disk Usage and Memory Usage). When we restart NiFi via Ambari (either
>> with a Restart or Stop and then Start), when NiFi comes back up the
>> SiteToSiteBulletinReportingTask no longer works. It throws the following
>> error when it is first trying to start up:
>>
>> SiteToSiteBulletinReportingTask[id=ba6b4499-0162-1000-0000-00003ccd7573]
>> org.apache.nifi.remote.client.PeerSelector@34e976af Unable to refresh
>> Remote Group's peers due to response code 409:Conflict with explanation:
>> null
>>
>> No matter how long we wait, it never works. The ways I have been able to
>> get it to start working again are as follows:
>>
>>   *   Stop and then Start the Remote Input Port the
>> SiteToSiteBulletinReportingTask is using
>>   *   Delete the SiteToSiteBulletinReportingTask and create a new one
>>   *   Wait a while and stop and start the SiteToSiteBulletinReportingTask
>> (however this doesn't work consistently)
>>
>> I have tested the same flow steps using a process that uses a Remote
>> Process Group and a different Remote Input Port, and that RPG throws the
>> same error when first coming up but then starts working after a period of
>> time. So maybe the SiteToSiteBulletinReportingTask isn't trying enough
>> times to connect to the Remote Input Port?
>>
>> Sincerely,
>> Chad Woodhead
>>
>
>
