nifi-dev mailing list archives

From 尹文才 <batman...@gmail.com>
Subject Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate
Date Mon, 25 Dec 2017 06:31:33 GMT
Hi Koji, thanks for your help. For the first issue, I will switch to the
WriteAheadProvenanceRepository implementation.
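
For reference, the switch is a one-line change in conf/nifi.properties. This is a sketch that assumes an otherwise unchanged 1.4.0 configuration; NiFi has to be restarted for the new value to take effect:

```properties
# Replace the PersistentProvenanceRepository named in the warning with the write-ahead implementation
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
```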

For the second issue, I have uploaded the relevant part of my log file to
my Google Drive; the link is:
https://drive.google.com/open?id=1oxAkSUyYZFy6IWZSeWqHI8e9Utnw1XAj

Do you mean a custom processor could process a FlowFile twice only when
it is interrupted while committing the session, so that the FlowFile
remains in the original queue (for example, when NiFi goes down)?
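
A toy simulation of this scenario in plain Java may make it concrete. It is not the NiFi API: `AtLeastOnceDemo`, `trigger` and `createTable` are hypothetical names. It assumes, as described in the reply below, that a FlowFile leaves its incoming queue only when the session commits, so a crash or a rollback leaves it in place, and that the table creation is not idempotent:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Toy model of the scenario, NOT the NiFi API. The "processor" performs its
// side effect (creating a table) before the session is committed, so a crash
// between the two steps leaves the FlowFile in the incoming queue.
class AtLeastOnceDemo {
    // Stand-in for the target database: names of tables that already exist.
    final Set<String> tables = new HashSet<>();
    // Stand-in for the incoming queue as restored from the FlowFile repository.
    final Deque<String> queue = new ArrayDeque<>();

    // Hypothetical non-idempotent step, like running SELECT TOP 0 * INTO ...
    void createTable(String name) {
        if (!tables.add(name)) {
            throw new IllegalStateException("table already exists: " + name);
        }
    }

    // One trigger. The FlowFile only leaves the queue on commit; a crash or an
    // exception (rollback) leaves it where it was.
    String trigger(boolean crashBeforeCommit) {
        String flowFile = queue.peek();
        if (flowFile == null) {
            return "queue empty";
        }
        try {
            createTable(flowFile);
        } catch (IllegalStateException e) {
            return "rolled back: " + e.getMessage();
        }
        if (crashBeforeCommit) {
            return "crashed before commit"; // side effect done, commit lost
        }
        queue.poll(); // "commit": the FlowFile leaves the incoming queue
        return "committed";
    }

    public static void main(String[] args) {
        AtLeastOnceDemo demo = new AtLeastOnceDemo();
        demo.queue.add("tmp.ods_bd_e_reason_20171225013007005_5567");
        System.out.println(demo.trigger(true));  // table created, commit lost
        System.out.println(demo.trigger(false)); // after restart: same FlowFile again
        System.out.println(demo.queue.size());   // the FlowFile is still queued
    }
}
```

Running `main` prints `crashed before commit`, then `rolled back: table already exists: tmp.ods_bd_e_reason_20171225013007005_5567`, then `1`: the second attempt fails on the table the first attempt already created, much like the duplicate-SQL error in the log.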

If you need to see the full log file, please let me know, thanks.

Regards,
Ben

2017-12-25 13:51 GMT+08:00 Koji Kawamura <ijokarumawak@gmail.com>:

> Hi Ben,
>
> For your 2nd issue, NiFi commits a process session in the Processor's
> onTrigger, when it is executed by the NiFi flow engine, by calling
> session.commit():
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/processor/AbstractProcessor.java#L28
> Once a process session is committed, the FlowFile state (including
> which queue it is in) is persisted to disk.
>
> It's possible for a Processor to process the same FlowFile more than
> once if it has done its job but failed to commit the session.
> For example, suppose your custom processor creates a temp table from a
> FlowFile, and then, before the process session is committed, something
> happens and the NiFi process session is rolled back. In this case, the
> target database has already been updated (the temp table exists), but
> the FlowFile stays in the incoming queue. When the FlowFile is
> processed again, the processor will get an error indicating that the
> table already exists.
>
> I tried to look at the logs you attached, but attachments do not seem
> to be delivered to this mailing list. I don't see anything attached.
>
> Thanks,
> Koji
>
>
> On Mon, Dec 25, 2017 at 1:43 PM, Koji Kawamura <ijokarumawak@gmail.com> wrote:
> > Hi Ben,
> >
> > Just a quick recommendation for your first issue, the 'The rate of the
> > dataflow is exceeding the provenance recording rate' warning message:
> > I'd recommend using WriteAheadProvenanceRepository instead of
> > PersistentProvenanceRepository, because WriteAheadProvenanceRepository
> > provides better performance.
> > Please take a look at the documentation here:
> > https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#provenance-repository
> >
> > Thanks,
> > Koji
> >
> > On Mon, Dec 25, 2017 at 12:56 PM, 尹文才 <batman713@gmail.com> wrote:
> >> Hi guys, I'm using NiFi 1.4.0 to do some ETL work for my team, and I
> >> have encountered 2 problems during my testing.
> >>
> >> The first problem is that the NiFi bulletin board showed me the
> >> following warning:
> >>
> >> 2017-12-25 01:31:00,460 WARN [Provenance Maintenance Thread-1]
> >> o.a.n.p.PersistentProvenanceRepository The rate of the dataflow is exceeding
> >> the provenance recording rate. Slowing down flow to accommodate. Currently,
> >> there are 96 journal files (158278228 bytes) and threshold for blocking is
> >> 80 (1181116006 bytes)
> >>
> >> I don't quite understand what this means, and I also found in the
> >> bootstrap log that NiFi restarted itself:
> >>
> >> 2017-12-25 01:31:19,249 WARN [main] org.apache.nifi.bootstrap.RunNiFi Apache
> >> NiFi appears to have died. Restarting...
> >>
> >> Is there anything I could do to solve this problem?
> >>
> >> The second problem is about the FlowFiles in my flow. I implemented a
> >> few custom processors to do the ETL work. The first one extracts
> >> multiple tables from SQL Server, and each FlowFile it emits contains an
> >> attribute specifying the name of the temp ODS table to create. The
> >> second processor takes all FlowFiles from the first processor and
> >> creates the temp ODS tables specified in the FlowFiles' attributes.
> >> I found in the app log that one of the temp table names already
> >> existed when the processor tried to create the temp table, which caused
> >> an SQL exception. After spending some time investigating the log, I
> >> found that the SQL query was executed twice in the second processor:
> >> once before NiFi restarted, and a second time right after the restart:
> >>
> >> 2017-12-25 01:32:35,639 ERROR [Timer-Driven Process Thread-7]
> >> c.z.nifi.processors.ExecuteSqlCommand
> >> ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] 执行sql语句失败
> >> (Failed to execute SQL statement):SELECT TOP 0 * INTO
> >> tmp.ods_bd_e_reason_20171225013007005_5567 FROM dbo.ods_bd_e_reason;
> >>
> >>
> >> I have read the NiFi documentation in depth, but I'm still not very
> >> familiar with NiFi's internal mechanisms. My suspicion is that NiFi
> >> didn't manage to checkpoint the FlowFile's in-memory state (which queue
> >> it was in) to the FlowFile repository before it died. After restarting,
> >> it recovered the FlowFile's state from the FlowFile repository, the
> >> FlowFile went through the second processor again, and thus the SQL was
> >> executed twice. Is this correct?
> >>
> >> I've attached the relevant part of app log, thanks.
> >>
> >> Regards,
> >> Ben
>
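
Since, as explained above, a FlowFile can be processed more than once, one option is to make the second processor's DDL idempotent so a redelivered FlowFile does not fail. A minimal sketch, assuming SQL Server (T-SQL), that the `tmp` schema already exists, and that table names come from trusted FlowFile attributes (they are not escaped here); `IdempotentDdl` and `createTempTableIfMissing` are hypothetical names, not part of NiFi:

```java
// Hypothetical helper, not part of NiFi: builds T-SQL that creates the empty
// temp table only if no user table with that name exists yet.
// Assumes trusted, already-qualified names; identifiers are NOT escaped.
class IdempotentDdl {

    // Produces: IF OBJECT_ID(N'<target>', N'U') IS NULL SELECT TOP 0 * INTO <target> FROM <source>;
    static String createTempTableIfMissing(String target, String source) {
        return "IF OBJECT_ID(N'" + target + "', N'U') IS NULL "
                + "SELECT TOP 0 * INTO " + target + " FROM " + source + ";";
    }

    public static void main(String[] args) {
        System.out.println(createTempTableIfMissing(
                "tmp.ods_bd_e_reason_20171225013007005_5567", "dbo.ods_bd_e_reason"));
    }
}
```

The existence check and the `SELECT ... INTO` are not atomic, so this only guards against sequential redelivery such as the restart case, not two concurrent tasks creating the same table.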
