nifi-users mailing list archives

From "Giovanni Lanzani" <giovannilanz...@godatadriven.com>
Subject Re: A bag of groovy questions regarding the ExecuteScript processor
Date Wed, 04 Oct 2017 07:18:21 GMT
Hi Andy,

That's very helpful, thanks! My comments are inline, while we wait for
Matt to come back :)

On 3 Oct 2017, at 22:44, Andy LoPresto wrote:

> Giovanni,
>
> A lot of great questions here. I’ll try to go through them but I 
> hope Matt weighs in as well (he is on vacation for the next few days 
> though).
>
> * The only time I am aware the Jars are reloaded is at processor 
> restart (I believe this is the same for the script content if defined 
> by a referenced file as well). The scriptingComponentHelper setup*() 
> methods execute inside ExecuteScript#setup(), which has @OnScheduled 
> annotation [1].

Has anyone written some sort of script (I don't know if it is
possible) to query the NiFi API for all the (Groovy ExecuteScript)
processors using a particular module directory (we plan to use a single
one for everything), so that I could add a new step, after the shadowJar
deployment, that restarts all of them?

I imagine this would be a fairly common use case. Where I'm currently
working we have the following workflow:

- Have a single jar with all the code that the groovy scripts will need;
- The Groovy scripts will use that code with minimal boilerplate around
it, so all the non-NiFi-related code is in the jar. This makes it very
easy to test the logic in the jar. We added some extra code to ensure
the functions that the Groovy scripts will call are "NiFi compatible"
(right now it's just `.getBytes(StandardCharsets.UTF_8)`). We don't use
Matt's framework because we need the incoming flowFile to have
attributes, and I couldn't figure out how to do it :)
- NiFi has a flow to fetch new master updates on the repo and compile
the (fat) jar as a result. However, we would need to restart the
ExecuteScript processors by hand and... that's a no-no :) A script would
help greatly here (if nobody has one, I will dig into the API to see
what's possible; I might just parse the whole XML file if there's no way
to do so via the API).
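
A minimal sketch of what such a restart script could look like, assuming
an unsecured NiFi at http://localhost:8080 and that all the processors sit
directly in one process group. It uses the REST endpoints
`GET /process-groups/{id}/processors` and `PUT /processors/{id}`. The base
URL, the function names and the module directory path are hypothetical.
The property keys are assumed to match the names shown in the
ExecuteScript configuration dialog. It does not wait for active threads
to finish before starting a processor again.

```python
import json
import urllib.request

# Assumption: unsecured NiFi on the default port; adjust for your setup.
NIFI_API = "http://localhost:8080/nifi-api"
EXECUTE_SCRIPT_TYPE = "org.apache.nifi.processors.script.ExecuteScript"


def uses_module_dir(processor, module_dir):
    """True if a processor entity is a Groovy ExecuteScript using module_dir."""
    component = processor.get("component", {})
    props = component.get("config", {}).get("properties") or {}
    return (
        component.get("type") == EXECUTE_SCRIPT_TYPE
        and props.get("Script Engine") == "Groovy"       # assumed property key
        and props.get("Module Directory") == module_dir  # assumed property key
    )


def state_update(processor, state):
    """Build the PUT /processors/{id} body that sets RUNNING or STOPPED."""
    return {
        "revision": processor["revision"],
        "component": {"id": processor["id"], "state": state},
    }


def call(method, path, body=None):
    """Send a JSON request to the NiFi REST API and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        NIFI_API + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def restart_matching(group_id, module_dir):
    """Stop and start every running, matching processor in one group."""
    listing = call("GET", "/process-groups/%s/processors" % group_id)
    for proc in listing["processors"]:
        running = proc.get("component", {}).get("state") == "RUNNING"
        if running and uses_module_dir(proc, module_dir):
            path = "/processors/" + proc["id"]
            # The reply carries the new revision needed for the next update.
            stopped = call("PUT", path, state_update(proc, "STOPPED"))
            call("PUT", path, state_update(stopped, "RUNNING"))
```

The last step of the deployment flow could then call something like
`restart_matching("root", "/opt/nifi/jars")`, where the path is only an
example. Processors in nested process groups would need a recursive walk
over the child groups.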

> * I’m not sure how other users bundle their dependencies, but shadow 
> Jars would be fine for this use case, and Matt has referenced using 
> them in his script-tester article [2].
> * Yes, while there are small idiosyncrasies with each language flavor, 
> the NiFi-related domain is fairly consistent. In this case, iterating 
> over a number of flowfiles for processing in a single Groovy script is 
> fine. Session.get(int) [3] is delegated to ProcessSession and returns 
> List<FlowFile>, so you can use any of the Groovy collections methods 
> over it.

So what happens in this case?

```
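def n = 0
// N is a placeholder for the batch size
session.get(N).each { flowFile ->
    if (n == 0) {
        // do something
    } else {
        // fails on the second flow file, after the first was transferred
        throw new Exception("failure on a later flow file")
    }
    session.transfer(flowFile, REL_SUCCESS)
    n += 1
}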
```
Will the first `flowFile` be successfully transferred, or will a rollback
happen? (Note: I usually wrap the logic in `try/catch` and then, based
on the result, transfer the file to `REL_SUCCESS`/`REL_FAILURE`.)
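
For reference, here is the try/catch pattern from the note above, sketched
in Jython-style Python (the other language discussed in this thread) so
that it can be run outside NiFi. `FakeSession`, `route_batch` and the
string relationships are hypothetical stand-ins for the `session`,
`REL_SUCCESS` and `REL_FAILURE` bindings that ExecuteScript provides. It
only illustrates per-flow-file routing and says nothing about how NiFi
treats an uncaught exception.

```python
# Stand-ins for the relationship bindings that ExecuteScript provides.
REL_SUCCESS, REL_FAILURE = "success", "failure"


class FakeSession:
    """Hypothetical stand-in for ProcessSession, just enough to run this."""

    def __init__(self, flow_files):
        self.queue = list(flow_files)
        self.transferred = []

    def get(self, max_results):
        # Hand out up to max_results queued flow files, like session.get(N).
        batch = self.queue[:max_results]
        self.queue = self.queue[max_results:]
        return batch

    def transfer(self, flow_file, relationship):
        self.transferred.append((flow_file, relationship))


def route_batch(session, batch_size, process):
    """Process each flow file on its own and route it by outcome."""
    for flow_file in session.get(batch_size):
        try:
            process(flow_file)
        except Exception:
            # A failure only affects this flow file, not the whole batch.
            session.transfer(flow_file, REL_FAILURE)
        else:
            session.transfer(flow_file, REL_SUCCESS)
```

With this shape no exception escapes the loop, so every flow file taken
from the session ends up in exactly one relationship.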

Thanks again,

Giovanni

>
> Hopefully this helps you and if Matt or anyone else sees a mistake, 
> they correct it and add their thoughts. Thanks.
>
> [1] 
> https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#onscheduled
> [2] 
> https://funnifi.blogspot.com/2016/06/testing-executescript-processor-scripts.html
> [3] 
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/repository/StandardProcessSession.java#L1520
>
>
>
> Andy LoPresto
> alopresto@apache.org
> alopresto.apache@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
>> On Oct 3, 2017, at 1:09 PM, Giovanni Lanzani 
>> <giovannilanzani@godatadriven.com> wrote:
>>
>> I apologize if this is specified elsewhere, but I couldn't find it.
>>
>> * I was wondering when the jars used by a particular Groovy script (in 
>> the ExecuteScript processor) are reloaded. I.e. if one jar is 
>> updated, when will the script pick up the new version? I know that 
>> upon restarting the processor, the updated jar is considered, but I 
>> was wondering on which other occasions that happens;
>> * Do people tend to use fat (shadow) jars for this sort of jars 
>> referenced by Groovy scripts? I don't think it makes sense to keep 
>> track of all the dependencies manually otherwise;
>> * When using the {P,J}ython processor, I read Matt's advice to use the 
>> following construct in the script:
>>
>>     for flowFile in session.get(N):
>>         if flowFile:
>>             # do your thing here
>>
>> Does the same hold for Groovy, i.e. should someone do
>>
>>     session.get(N).each { flowFile ->
>>         // do your thing here
>>         if (condition) {
>>             session.transfer(flowFile, REL_SUCCESS)
>>         } else {
>>             session.transfer(flowFile, REL_FAILURE)
>>         }
>>     }
>>
>> Is this approach safe in Groovy inside an each? Or is this approach 
>> not needed at all in Groovy, while it is needed in {P,J}ython?
>>
>> Thanks in advance!
>>
>> Giovanni
>>


