nifi-dev mailing list archives

From Steve Lawrence <slawre...@tresys.com>
Subject NiFi XProc Processor
Date Tue, 07 Mar 2017 15:17:01 GMT
We have developed a NiFi processor that uses XMLCalabash [1] to add
support for XProc [2] processing. XProc is an XML transformation
language that defines an XML pipeline, allowing for complex validation,
transformation, and routing of XML data within the pipeline, using
existing XML technologies such as RelaxNG, Schematron, XML Schema (XSD),
XQuery, XSLT, XPath, and custom XProc transformations.

This new processor is mostly straightforward, but we had some questions
regarding the specific implementation and the handling of
non-thread-safe code. The code is available for viewing here:

https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/nifi-xproc/browse

In this processor, a property is created to provide an XProc file, which
defines the pipeline input and output "ports". XML goes into an input
port, goes through the pipeline, and one or more XML documents exit at
specified output ports. This NiFi processor maps each output port to a
dynamic NiFi relationship. It does this mapping in the
onPropertyModified method when the XProc file property is changed. This
method also stores the XMLCalabash XRuntime and XPipeline objects (which
do all the pipeline work) in volatile member variables to be used later
in onTrigger. The members are saved here to avoid recreating them in
each call to onTrigger. Is this an acceptable place to do that? It seems
this normally happens in an @OnScheduled method or in the first call to
onTrigger. However, the objects must be created in onPropertyModified
anyway to get the output ports, so saving them there avoids creating the
same objects multiple times. Also note that the same objects are created
in the XML_PIPELINE_VALIDATOR but are not saved because the validator is
static, so there is already some duplication. Is there a standard way to
avoid this duplication, or is the current approach acceptable?
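
As a rough illustration of one way to avoid building the same objects
twice (this is not code from the processor), a small cache could
remember the last object built for a given property value, so that a
validator and onPropertyModified share a single build. The class name
LastBuiltCache and its methods are hypothetical, and the sketch uses
only the Java standard library rather than the NiFi or XML Calabash
APIs:

```java
import java.util.Objects;
import java.util.function.Function;

/**
 * Hypothetical helper: remembers the most recently built value and the
 * key (for example, the XProc file path) it was built from.
 */
final class LastBuiltCache<K, V> {
    private final Function<K, V> builder;
    private K lastKey;
    private V lastValue;
    private int builds;

    LastBuiltCache(Function<K, V> builder) {
        this.builder = Objects.requireNonNull(builder);
    }

    /** Returns the cached value if the key is unchanged, else rebuilds. */
    synchronized V get(K key) {
        // A null result is never treated as cached, so it is rebuilt.
        if (lastValue == null || !Objects.equals(lastKey, key)) {
            lastValue = builder.apply(key);
            lastKey = key;
            builds++;
        }
        return lastValue;
    }

    /** Number of times the builder has actually been invoked. */
    synchronized int buildCount() {
        return builds;
    }
}
```

With something like this, two consecutive calls to get() with the same
path would invoke the builder only once. Note that the sketch compares
keys only, so it would not notice a file whose contents changed while
its path stayed the same.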

The other concern we have is that the XPipeline and XRuntime objects
created by XML Calabash are not thread safe. To resolve this issue, the
processor is annotated with @TriggerSerially. Is this the correct
solution, or is there some other preferred method? Perhaps a ThreadLocal
or a thread-safe pool of XPipeline objects would be better?
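
To make the pooling alternative concrete, here is a minimal sketch of a
fixed-size pool for objects that are not thread safe. It is only an
illustration under assumptions: InstancePool, withInstance, and
idleCount are hypothetical names, the pooled type is generic rather than
an actual XPipeline, and only the Java standard library is used:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool; each instance is used by one thread at a time. */
final class InstancePool<T> {
    private final BlockingQueue<T> idle;

    /** Eagerly creates {@code size} instances (size must be at least 1). */
    InstancePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Borrows an instance, runs the work, and always returns the instance. */
    <R> R withInstance(Function<T, R> work) {
        final T instance;
        try {
            // Blocks until another thread returns an instance.
            instance = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for an instance", e);
        }
        try {
            return work.apply(instance);
        } finally {
            idle.add(instance);
        }
    }

    /** Number of instances currently not borrowed. */
    int idleCount() {
        return idle.size();
    }
}
```

Compared with @TriggerSerially, a pool like this would allow up to
"size" concurrent onTrigger calls at the cost of holding that many
pipeline objects in memory. A ThreadLocal would avoid the blocking
queue but ties each object's lifetime to a framework-managed thread.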

Lastly, is this something the devs would be interested in pulling into
NiFi, and if not, what could be changed to achieve this? The code is
licensed as Apache v2 and we would be happy to contribute the code to
NiFi if deemed acceptable.

Thanks,
- Steve

[1] http://xmlcalabash.com/
[2] https://www.w3.org/TR/xproc/
