nifi-dev mailing list archives

From Oleg Zhurakousky <ozhurakou...@hortonworks.com>
Subject Re: NiFI XProc Processor
Date Tue, 07 Mar 2017 15:39:07 GMT
Steve

Thank you very much for your desire to contribute and for such a detailed explanation of your contribution.
I left some comments inline, marked [OLEG], so let us know what you think.

Cheers
Oleg

> On Mar 7, 2017, at 10:17 AM, Steve Lawrence <slawrence@tresys.com> wrote:
> 
> We have developed a NiFi processor that uses XMLCalabash [1] to add
> support for XProc [2] processing. XProc is an XML transformation
> language that defines an XML pipeline, allowing for complex validation,
> transformation, and routing of XML data within the pipeline, using
> existing XML technologies such as RelaxNG, Schematron, XSD Schema,
> XQuery, XSLT, XPath and custom XProc transformations.
> 
> This new processor is mostly straightforward, but we had some questions
> regarding the specific implementation and the handling of non-thread
> safe code. The code is available for viewing here:
> 
> 
> https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/nifi-xproc/browse
> 
> In this processor, a property is created to provide an XProc file, which
> defines the pipeline input and output "ports". XML goes into an input
> port, goes through the pipeline, and one or more XML documents exit at
> specified output ports. This NiFi processor maps each output port to a
> dynamic NiFi relationship. It does this mapping in the
> onPropertyModified method when the XProc file property is changed. This
> method also stores the XMLCalabash XRuntime and XPipeline objects (which
> do all the pipeline work) in volatile member variables to be used later
> in onTrigger. The members are saved here to avoid recreating them in
> each call to onTrigger. Is this an acceptable place to do that? It seems
> this normally happens in an @OnScheduled method or in the first call to
> onTrigger, however the objects must be created in onPropertyModified to
> get the output ports, so this does avoid recreating the same objects
> multiple times.
[OLEG] Without getting into more detail, both approaches are acceptable. However, assigning
values in onTrigger() is preferable in certain cases. Those cases primarily deal with
obtaining references to a remote resource (e.g., a connection factory or a socket), where
exception handling is much simpler. I can elaborate further if needed and
point to a few examples where we do that, but that does not appear to be the case for you,
so your current approach seems acceptable. As for multi-threading in onTrigger(),
such assignments are typically done in a synchronized block with a null check.
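
The "synchronized block with a null check" mentioned above could look roughly like the following. This is only a minimal sketch: the LazyResource class and its factory are hypothetical names, not part of NiFi or of the processor under discussion, and T stands in for whatever expensive object is being cached.

```java
import java.util.function.Supplier;

// Hypothetical helper (not a NiFi class): creates a resource on first use
// and hands the same instance to every later caller.
class LazyResource<T> {
    private final Supplier<T> factory;
    // volatile so that the assignment is visible to other threads
    private volatile T resource;

    LazyResource(Supplier<T> factory) {
        this.factory = factory;
    }

    // Intended to be called at the start of onTrigger().
    T get() {
        T local = resource;
        if (local == null) {
            synchronized (this) {
                // Re-check inside the lock: another thread may have won the race.
                local = resource;
                if (local == null) {
                    local = factory.get();
                    resource = local;
                }
            }
        }
        return local;
    }

    // Drops the cached instance so the next get() rebuilds it.
    void reset() {
        synchronized (this) {
            resource = null;
        }
    }
}
```

If the factory throws, nothing is cached and the exception surfaces in the caller, which is where it can be handled per invocation.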

> Also note that the same objects are created in the
> XML_PIPELINE_VALIDATOR but are not saved due to the validator being
> static, so there is already some duplication. Is there a standard way to
> avoid duplication/is this an acceptable way to handle this?

[OLEG] I do not fully understand the question, but keep in mind that regardless of the number of
threads, there is only one instance of the processor at any given time, so any reference held
by that instance is essentially a singleton as well. Does that help?
> 
> The other concern we have is that the XPipeline and XRuntime objects
> created by XML Calabash are not thread safe. To resolve this issue, the
> processor is annotated with @TriggerSerially. Is this the correct
> solution, or is there some other preferred method? Perhaps ThreadLocal
> or a thread safe pool of XPipeline objects is preferred?

[OLEG] Definitely not ThreadLocal, since there is no guarantee that you will get the same
thread or a particular thread on a subsequent invocation. @TriggerSerially is obviously
the most defensive way to avoid collisions. That said, I probably need to understand the issue
better. However, off the top of my head, one way of ensuring correctness in such scenarios
is to maintain a Map of such objects as an instance variable (like a pool), where the key is something
that ensures you always get the correct object.
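
One possible shape for such a pool is sketched below. Note the assumptions: it uses a blocking queue rather than a keyed Map, the SimplePool class is a hypothetical name and not a NiFi or XML Calabash API, and T stands in for the non-thread-safe object. Each caller borrows one instance, uses it exclusively, and always returns it.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool: each borrowed instance is used by exactly
// one thread at a time, so the pooled type need not be thread safe.
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Borrows an instance, runs the work, and always returns the instance.
    <R> R withResource(Function<T, R> work) {
        final T item;
        try {
            item = idle.take(); // blocks until an instance is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for a pooled instance", e);
        }
        try {
            return work.apply(item);
        } finally {
            idle.add(item); // cannot overflow: we removed one item above
        }
    }
}
```

With a pool sized to the number of concurrent tasks, callers would rarely block; with a pool of size one, the behavior is effectively serial, similar in effect to @TriggerSerially.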
> 
> 
> Lastly, is this something the devs would be interested in pulling into
> NiFi, and if not, what could be changed to achieve this? The code is
> licensed as Apache v2 and we would be happy to contribute the code to
> NiFi if deemed acceptable.

[OLEG] This is probably the most difficult question to answer, since the immediate answer is we
don’t know ;) Only the community can decide. So what I would suggest is to raise a JIRA
- https://issues.apache.org/jira/browse/NIFI and submit a PR for it and see if it gets any
traction. Furthermore, we are currently working on the concept of an Extension/Artifact Registry
to accommodate the growing demand for more NiFi components.
> 
> Thanks,
> - Steve
> 
> [1] http://xmlcalabash.com/
> [2] https://www.w3.org/TR/xproc/
> 
