nifi-users mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: NiFi will not restart, missing status file message in bootstrap
Date Thu, 25 May 2017 15:05:35 GMT
Jim,

That provenance warning is not related to archive/retention.  It is
provenance telling you it can only index events so fast, and at present
it is falling behind, so it will slow the flow to ensure things don't get
too far out of balance.  However, there are configuration properties
that let you give provenance indexing more threads.  Also, we created
a new provenance implementation, available in NiFi 1.2.0, which is
multiple times faster with immediate indexing.
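
As a rough sketch of how to check what is currently configured, the script below reads nifi.properties and reports the provenance and archive settings discussed here. The two provenance property names (nifi.provenance.repository.index.threads and nifi.provenance.repository.implementation) and the conf/nifi.properties path are assumptions, not taken from this thread, so verify them against the admin guide for your NiFi version.

```python
from pathlib import Path

# The two provenance keys are assumed property names (check your version's
# admin guide); the archive retention key is the one quoted in this thread.
KEYS = (
    "nifi.provenance.repository.implementation",
    "nifi.provenance.repository.index.threads",
    "nifi.content.repository.archive.max.retention.period",
)


def parse_properties(text):
    """Parse Java-style key=value lines, skipping blank lines and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def report(text, keys=KEYS):
    """Return the value of each key of interest, or '<not set>' if absent."""
    props = parse_properties(text)
    return {key: props.get(key, "<not set>") for key in keys}


if __name__ == "__main__":
    # Hypothetical location, relative to the NiFi home directory.
    conf = Path("conf/nifi.properties")
    if conf.exists():
        for key, value in report(conf.read_text()).items():
            print(f"{key} = {value}")
    else:
        print(f"{conf} not found; run this from the NiFi home directory")
```

The script only reads the file. Any change to these properties would still need a NiFi restart to take effect.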

Thanks

On Thu, May 25, 2017 at 11:03 AM, James McMahon <jsmcmahon3@gmail.com> wrote:
> Absolutely. Thank you for looking into this Aldrin.
>
> I do indeed have NiFi configured as a service. I've stopped and started it
> dozens of times through the life of my workflow development these recent
> months. It's always previously started up like a champ. On this particular
> occasion I did this:
> service nifi stop
> as user nifi. It shut down, and the logs presented no errors.
> I then did this:
> service nifi start
> as user nifi. The bootstrap log contained the INFO messages I shared with
> you above.
>
> Our data flow has not taxed NiFi much at all. There was no data processing
> through at the time. We had recently done two bulk ingests of large data
> directories. The content repo had indicated 46% full, but after I let it sit
> overnight it had dropped back down to a typical level of 3-6%. As I learned
> yesterday, my archive retention being set to 12 hours explained why I was
> seeing the content repo hold on to all that capacity after all my 100,000
> files had processed through late yesterday.
>
> Early this morning I modified my conf/nifi.properties to drop my archive
> retention to 1 hour from 12 hours. This was when I tried and failed to
> restart.
>
> We've since rebooted the host and NiFi came right up. With my new archive
> retention value in place, I tried processing about 16,000 files through.
> They flew through, but I have noticed a Warning that I believe is caused by
> my change to archive retention:   WARNING The rate of the dataflow is
> exceeding the provenance recording rate. Slowing down flow to accommodate.
>
> What else can I tell you? I suppose it would help to mention that my three
> major repos - content, flowfile, provenance - are on separate local disk
> devices.
>
> My workflow load peaks when I try to process approximately 100,000 files
> totaling 50 GB through the flow. The content repo maxes out at 46% of our
> 50GB capacity. The provenance and flowfile repos never peak into the double
> digits. I do some custom parsing and custom logging in
> InvokeScriptedProcessors. I employ HandleHttpRequest and HandleHttpResponse
> processors.
>
> I've not yet watched memory usage on the box as I run, but I'll try to use a
> 'watch -n [#] free -m'  later to see what happens. My nifi instance runs
> with JVM memory parms in bootstrap.conf of -Xms4096m and -Xmx8192m.
>
> Jim
>
> On Thu, May 25, 2017 at 10:38 AM, Aldrin Piri <aldrinpiri@gmail.com> wrote:
>>
>> If you happen to remember, could you be more specific about your sequence
>> of operations?  Is NiFi installed as a service? If so, was it restarted?
>> Did you just issue a nifi.sh restart?
>>
>> Do you have any CM tooling (Puppet, Chef, Salt, etc) that is managing this
>> process/system?
>>
>> Could you tell us what the bootstrap log says prior to those lines in
>> terms of shutting down?
>>
>> Would you be able to describe the load exerted on the system by the flow?
>> A bit of an amorphous question, but is/was the system heavily taxed running
>> NiFi?
>>
>> The section you hit _should_ only be hit if NiFi (the flow process and not
>> the bootstrap) terminates for some reason (e.g. - Hit an out of memory
>> case).  I have a few notions as to how the right confluence of events could
>> have gotten you otherwise, so any additional details would be great to vet
>> their possible culpability.
>>
>> Thanks!
>>
>> On Thu, May 25, 2017 at 10:10 AM, James McMahon <jsmcmahon3@gmail.com>
>> wrote:
>>>
>>> I did inspect the log more closely. It offers little additional insight.
>>> Here is what it says (unable to export, had to transcribe myself):
>>>
>>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.RunNiFi Status
>>> File no longer exists. Will not restart NiFi
>>> [date] [time],### INFO [main] o.a.n.b.NotificationServiceManager
>>> Successfully loaded the following 0 services: [ ]
>>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.RunNiFi
>>> Registered no Notification Services for Notification Type NIFI_STARTED
>>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.RunNiFi
>>> Registered no Notification Services for Notification Type NIFI_STOPPED
>>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.RunNiFi
>>> Registered no Notification Services for Notification Type NIFI_DIED
>>> [date] [time],### INFO [main] org.apache.nifi.bootstrap.Command Apache
>>> NiFi is not running
>>>
>>> My hope is that we can figure out what happens to this status file, and
>>> how I can keep it from disappearing.
>>>
>>> Jim
>>>
>>> On Thu, May 25, 2017 at 9:37 AM, Joe Witt <joe.witt@gmail.com> wrote:
>>>>
>>>> I don't think rebooting the system had anything to do with NiFi's
>>>> ability to start up.  But I'm not sure I understand that particular
>>>> part of the logic in the code in terms of the case it was defending
>>>> against.
>>>>
>>>> On Thu, May 25, 2017 at 9:34 AM, James McMahon <jsmcmahon3@gmail.com>
>>>> wrote:
>>>> > Will do Joe. I'll dig for that now.
>>>> >
>>>> > Infrastructure Group did reboot the box, which had been up and running
>>>> > for nearly two months. NiFi did indeed come up following the reboot. I
>>>> > still want to try and get you this log information so that I can learn
>>>> > what triggers such a situation, and whether there is a more refined way
>>>> > to solve it than a full system reboot. There are other things running on
>>>> > the resource and I should try to minimize the impact a full reboot has
>>>> > on them.
>>>> >
>>>> > Let me see about that log content. Thank you again.
>>>> >
>>>> > On Thu, May 25, 2017 at 9:25 AM, Joe Witt <joe.witt@gmail.com> wrote:
>>>> >>
>>>> >> Jim,
>>>> >>
>>>> >> The code relevant to that log output is here [1].  Can you share the
>>>> >> bootstrap output before/after that output?
>>>> >>
>>>> >> [1]
>>>> >>
>>>> >> https://github.com/apache/nifi/blob/rel/nifi-0.7.1/nifi-bootstrap/src/main/java/org/apache/nifi/bootstrap/RunNiFi.java
>>>> >>
>>>> >> Thanks
>>>> >> Joe
>>>> >>
>>>> >> On Thu, May 25, 2017 at 9:11 AM, James McMahon <jsmcmahon3@gmail.com>
>>>> >> wrote:
>>>> >> > Am running NiFi 0.7.x. Have been running with great stability for a
>>>> >> > long period of time. Tried this morning to make this change in my
>>>> >> > nifi.properties conf file:
>>>> >> >
>>>> >> > nifi.content.repository.archive.max.retention.period=1 hour
>>>> >> >
>>>> >> > Reduced from the default of 12 hours. Relatively simple change,
>>>> >> > requires a NiFi restart to take effect.
>>>> >> >
>>>> >> > My restart attempt throws no errors to the nifi app log, but in the
>>>> >> > bootstrap log I do see this:
>>>> >> > org.apache.nifi.bootstrap.RunNiFi Status file no longer exists.
>>>> >> > Will not restart NiFi
>>>> >> >
>>>> >> > I've done some digging and all I could find is rebooting the box in
>>>> >> > hopes of resolving. Am reaching out to the infrastructure group that
>>>> >> > owns the server now, asking them to do so. Would like to also in
>>>> >> > parallel understand why this happened, and where, exactly, this
>>>> >> > status file should be.
>>>> >> >
>>>> >> > Can I resolve this by manually recreating such a status file with
>>>> >> > certain permissions and ownership?
>>>> >> >
>>>> >> > Thanks in advance for your help.  -Jim
>>>> >> >
>>>> >> >
>>>> >
>>>> >
>>>
>>>
>>
>
