nifi-commits mailing list archives

From "Mark Payne (JIRA)" <>
Subject [jira] [Commented] (NIFI-238) Add processors to write datasets using Kite
Date Tue, 13 Jan 2015 20:29:35 GMT


Mark Payne commented on NIFI-238:


Can you explain what you mean by "I'm not clear on why I would use some method calls?"

I'll try to clear up some things here without going into as much detail as you would see in
a developer guide, which we're working on.

The difference between ProcessContext and ProcessSession, in a nutshell, is that the session
provides access to data (FlowFiles), while the context provides information about the environment
and configuration (processor properties, etc.).

context.yield() is a way to indicate that there's nothing useful that a Processor can do,
so it should not be triggered to run for a bit. For example, if you are pulling from an external
source and you know there's no data, you can call context.yield() to have the framework essentially
"pause" your Processor so that you don't abuse the remote resource by continually asking for
data. The amount of time that the Processor is "paused" is controlled in the Processor configuration
dialog ("Yield Duration") with a default of 1 second.
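
To make the effect of yielding concrete, here is a minimal sketch that simulates it with a
fake clock. Everything in it (YieldingScheduler, PollingProcessor, YieldSketch, the 100 ms
tick) is a hypothetical stand-in written for this illustration, not NiFi's real classes or
scheduling code; it only models the behavior described above.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical stand-in for the framework side of yielding; not NiFi's real classes. */
class YieldingScheduler {
    private final long yieldDurationMillis;
    private long yieldExpiresAt = 0L; // the processor may run once the clock reaches this

    YieldingScheduler(long yieldDurationMillis) {
        this.yieldDurationMillis = yieldDurationMillis;
    }

    /** What the processor calls when it has nothing useful to do. */
    void yieldProcessor(long nowMillis) {
        yieldExpiresAt = nowMillis + yieldDurationMillis;
    }

    boolean mayTrigger(long nowMillis) {
        return nowMillis >= yieldExpiresAt;
    }
}

/** Hypothetical processor that polls a remote source, modeled as an in-memory queue. */
class PollingProcessor {
    private final Queue<String> remoteSource;
    int pollCount = 0;
    int received = 0;

    PollingProcessor(Queue<String> remoteSource) {
        this.remoteSource = remoteSource;
    }

    void onTrigger(YieldingScheduler scheduler, long nowMillis) {
        pollCount++;
        String item = remoteSource.poll();
        if (item == null) {
            // No data: ask not to be triggered again for the yield duration.
            scheduler.yieldProcessor(nowMillis);
            return;
        }
        received++;
    }
}

class YieldSketch {
    /** Advances a simulated clock in 100 ms ticks and returns how often the source was polled. */
    static int simulate(Queue<String> source, long yieldMillis, long totalMillis) {
        YieldingScheduler scheduler = new YieldingScheduler(yieldMillis);
        PollingProcessor processor = new PollingProcessor(source);
        for (long now = 0; now < totalMillis; now += 100) {
            if (scheduler.mayTrigger(now)) {
                processor.onTrigger(scheduler, now);
            }
        }
        return processor.pollCount;
    }

    public static void main(String[] args) {
        System.out.println("polls with 1 s yield: " + simulate(new ArrayDeque<String>(), 1000, 3000));
        System.out.println("polls without yield:  " + simulate(new ArrayDeque<String>(), 0, 3000));
    }
}
```

Over three simulated seconds against an empty source, the yielding version polls 3 times,
while the version with no yield duration polls on every tick (30 times).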

Error handling is definitely something we want to address in the developer guide. Generally,
calls such as session.write will be surrounded by a try/catch where you catch
ProcessException. Any IOException that is thrown by your callback will be wrapped in a ProcessException,
and this is often what you want to catch. If any Exception (really any Throwable) escapes
your onTrigger method, the framework will roll back the session. If that Throwable is not
an instance of ProcessException, it will also "administratively yield" your Processor. This
is done because if you let something escape other than ProcessException, it's assumed to be
a bug and this can sometimes lead to Processors consuming large amounts of resources without
accomplishing anything (do a bunch of work, then throw an Exception, rollback, and repeat).
So in this case we at least prevent it from completely consuming your resources.
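
The rollback and "administrative yield" rules can be sketched roughly as follows. This is a
simplified model, not the framework's actual code: SketchProcessException, SketchSession,
SketchProcessor, and FrameworkSketch are hypothetical stand-ins, and committing the session
when onTrigger returns normally is an assumption of the sketch.

```java
import java.io.IOException;

/** Hypothetical stand-in for ProcessException, modeled as an unchecked exception. */
class SketchProcessException extends RuntimeException {
    SketchProcessException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical stand-in for a session; it only records commit or rollback. */
class SketchSession {
    boolean committed = false;
    boolean rolledBack = false;

    void commit() { committed = true; }
    void rollback() { rolledBack = true; }
}

/** Hypothetical stand-in for a Processor with a single onTrigger method. */
interface SketchProcessor {
    void onTrigger(SketchSession session);
}

class FrameworkSketch {
    boolean administrativelyYielded = false;

    /** Rolls back on anything that escapes onTrigger; also yields unless it is a SketchProcessException. */
    SketchSession trigger(SketchProcessor processor) {
        SketchSession session = new SketchSession();
        try {
            processor.onTrigger(session);
            session.commit(); // assumption of this sketch: commit on normal return
        } catch (SketchProcessException e) {
            session.rollback(); // anticipated failure: roll back only
        } catch (Throwable t) {
            session.rollback(); // assumed to be a bug: roll back and pause the processor
            administrativelyYielded = true;
        }
        return session;
    }

    public static void main(String[] args) {
        FrameworkSketch framework = new FrameworkSketch();

        SketchSession anticipated = framework.trigger(session -> {
            throw new SketchProcessException("write failed", new IOException("disk full"));
        });
        System.out.println("anticipated failure: rolledBack=" + anticipated.rolledBack
                + ", yielded=" + framework.administrativelyYielded);

        SketchSession bug = framework.trigger(session -> {
            throw new IllegalStateException("unexpected state");
        });
        System.out.println("unexpected failure: rolledBack=" + bug.rolledBack
                + ", yielded=" + framework.administrativelyYielded);
    }
}
```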

Regarding backpressure: After you draw a connection between two processors, you can right-click
on the connection and click Configure. There, you can configure a backpressure threshold in
terms of number of FlowFiles and/or size of FlowFiles in the queue. Once this value is reached,
the source of the connection will no longer be triggered to run until the queue drops back
down below this threshold. This is a "soft limit." I.e., if the source of the connection is
a Processor that generates 1000 FlowFiles, and the connection is almost full, it will still
put all 1000 FlowFiles onto the connection's queue, but it will then stop being triggered
for a while.
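
Here is a small sketch of what "soft limit" means in practice. SketchConnection and
BackpressureSketch are hypothetical names made up for this illustration, and the model only
covers a FlowFile-count threshold; it is not how the framework actually implements queues.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical stand-in for a connection with a FlowFile-count backpressure threshold. */
class SketchConnection {
    private final Queue<String> queue = new ArrayDeque<>();
    private final int objectThreshold;

    SketchConnection(int objectThreshold) {
        this.objectThreshold = objectThreshold;
    }

    /** True once the configured threshold has been reached. */
    boolean isFull() {
        return queue.size() >= objectThreshold;
    }

    /** Never rejects: the threshold is only consulted before the source is triggered. */
    void enqueue(String flowFile) {
        queue.add(flowFile);
    }

    String poll() {
        return queue.poll();
    }

    int size() {
        return queue.size();
    }
}

class BackpressureSketch {
    /** Triggers the source only if the connection is below its threshold; returns whether it ran. */
    static boolean triggerSource(SketchConnection connection, int batchSize) {
        if (connection.isFull()) {
            return false; // backpressure applied: the source is not triggered
        }
        for (int i = 0; i < batchSize; i++) {
            connection.enqueue("flowfile-" + i); // the whole batch goes in, even past the threshold
        }
        return true;
    }

    public static void main(String[] args) {
        SketchConnection connection = new SketchConnection(10_000);
        for (int i = 0; i < 9_999; i++) {
            connection.enqueue("existing-" + i); // almost full
        }
        System.out.println("ran: " + triggerSource(connection, 1_000) + ", queued: " + connection.size());
        System.out.println("ran: " + triggerSource(connection, 1_000) + ", queued: " + connection.size());
    }
}
```

In the main method the first trigger still queues all 1000 FlowFiles (10,999 in total), and
the second trigger is skipped until a downstream consumer drains the queue below 10,000.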

The user guide has an explanation of the scheduling: there's a "Scheduling Tab" section
that is a sub-section of the "Configuring a Processor" section.

Hopefully this clears some things up instead of muddying the waters. Fire back with any other
questions or if there's something that isn't clear here...

> Add processors to write datasets using Kite
> -------------------------------------------
>                 Key: NIFI-238
>                 URL:
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Ryan Blue
> I think it would be great to have a set of processors that parse incoming flow files
> and add the data to Kite datasets.

