nifi-issues mailing list archives

From "Mark Payne (Jira)" <>
Subject [jira] [Created] (NIFI-7940) Create a ScriptedPartitionRecord
Date Thu, 22 Oct 2020 13:54:00 GMT
Mark Payne created NIFI-7940:

             Summary: Create a ScriptedPartitionRecord
                 Key: NIFI-7940
             Project: Apache NiFi
          Issue Type: Bug
          Components: Extensions
            Reporter: Mark Payne

In 1.12.0, we introduced the ScriptedTransformRecord. This has worked very well for many different
transformations that are simple in code but very difficult with the DSLs that NiFi supports.
In addition to transforming records, another use case that can be made dramatically easier
with scripting is partitioning records.

The PartitionRecord processor is very powerful and easy to use, but RecordPath is somewhat
limited in the functions that it provides. For example, recently in the Apache Slack channel,
someone asked how to route data based on whether or not the "timestamp" field matches a given
regular expression. An attempt using QueryRecord with RPATH didn't work because the timestamp
field is a top-level field, not a Record. An attempt using UpdateRecord failed because the
matchesRegex function of RecordPath is a predicate, so it can't be used to partition on.
Eventually a pattern was found with QueryRecord using `SIMILAR TO`, but `SIMILAR TO` does not
support regular expressions.

A Scripted processor would likely make this far simpler to handle. This processor should
be focused on making it dead simple to partition records (and subsequently route them based
on added attributes) with a scripting language. So, as with ScriptedTransformRecord, it will be
important to gear the processor more toward ease of use than toward exposing the full power of
FlowFiles, sessions, etc.

The script should have the same bindings as ScriptedTransformRecord:
 * attributes
 * log
 * record
 * recordIndex
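
As a rough illustration of how a script might use these four bindings, here is a minimal Python sketch of the "timestamp" use case described above. It is not the processor's API: the bindings are passed as plain function parameters, the record is modeled as a dict rather than a NiFi Record, and the regular expression and partition names are hypothetical.

```python
import logging
import re

# Hypothetical pattern: timestamps shaped like "YYYY-MM-DDTHH:MM:SS".
TIMESTAMP_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def partition_script(attributes, log, record, recordIndex):
    """Stand-in for a script body; the parameters mirror the four bindings."""
    timestamp = record.get("timestamp")
    if timestamp is not None and TIMESTAMP_PATTERN.fullmatch(str(timestamp)):
        return "valid-timestamp"
    # Use the remaining bindings to report which record did not match.
    log.debug("Record %d of %s has an unexpected timestamp: %r",
              recordIndex, attributes.get("filename"), timestamp)
    return "invalid-timestamp"


# Simulated bindings (assumptions for this sketch only).
log = logging.getLogger("partition-sketch")
attributes = {"filename": "events.json"}
records = [{"timestamp": "2020-10-22T13:54:00"}, {"timestamp": "yesterday"}]

results = [partition_script(attributes, log, r, i) for i, r in enumerate(records)]
print(results)
```

The point of the sketch is that the whole decision fits in a few lines of ordinary code, with no FlowFile or session handling in the script itself.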

Unlike ScriptedTransformRecord, though, the ScriptedPartitionRecord should return one of three things:
 * A string (or a primitive value, such as an int, that can be turned into a String) representing
the partition for the Record
 * A collection of strings/primitives representing multiple partitions that the Record should
go to (indicating that the Record should be added to multiple Record Writers)
 * A null value or an empty collection, indicating that the Record should be dropped.
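
One way to read the return-value contract above is as a normalization step that turns whatever the script returns into a (possibly empty) list of partition names. The following Python sketch shows that reading; the helper name `to_partitions` is hypothetical, and the choice of which types count as a "collection" is an assumption.

```python
def to_partitions(result):
    """Normalize a script's return value into a list of partition names.

    An empty list means the record should be dropped.
    """
    if result is None:
        return []  # null value: drop the record
    if isinstance(result, (list, tuple, set, frozenset)):
        # Collection: one partition per element (empty collection drops the record).
        return [str(item) for item in result]
    # Single string or primitive: exactly one partition.
    return [str(result)]


print(to_partitions("east"))
print(to_partitions(7))
print(to_partitions(["east", 2]))
print(to_partitions(None))
print(to_partitions([]))
```

Treating "dropped" as the empty list keeps the downstream logic uniform: the record is written once per element, and zero elements means it is written nowhere.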

The processor should then write the Record to a FlowFile for each of the Partitions returned.
For each outbound FlowFile, an attribute should be added indicating the partition for that
FlowFile for easy follow-on routing via RouteOnAttribute, etc.
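
The grouping described above could be sketched roughly as follows. This is a plain-Python model, not NiFi code: each outbound FlowFile is represented as a dict, the attribute name `partition` is a placeholder (the issue does not name the attribute), and `route` and `by_level` are hypothetical helpers.

```python
from collections import defaultdict

PARTITION_ATTRIBUTE = "partition"  # hypothetical attribute name


def route(records, partition_fn):
    """Group records by partition and model one outbound FlowFile per partition.

    partition_fn(record, index) is assumed to return a list of partition names.
    """
    grouped = defaultdict(list)
    for index, record in enumerate(records):
        for name in partition_fn(record, index):
            grouped[name].append(record)
    return [
        {"attributes": {PARTITION_ATTRIBUTE: name}, "records": recs}
        for name, recs in grouped.items()
    ]


def by_level(record, index):
    # Errors go to two partitions; everything else goes to one.
    if record["level"] == "error":
        return ["error", "audit"]
    return [record["level"]]


records = [
    {"level": "info", "msg": "a"},
    {"level": "error", "msg": "b"},
    {"level": "info", "msg": "c"},
]
flowfiles = route(records, by_level)
for flowfile in flowfiles:
    print(flowfile["attributes"], len(flowfile["records"]))
```

With the attribute in place, a follow-on RouteOnAttribute only has to compare that one attribute value.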

The processor should keep a counter for how many Records were dropped.

The processor should keep counters for how many Records were routed to each Partition.
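
The two counter requirements might be tracked along these lines. This is only a sketch in plain Python: the counter names are made up for illustration, and a record sent to several partitions is counted once for each of them, which is an assumption the issue does not spell out.

```python
from collections import Counter

DROPPED = "Records Dropped"          # hypothetical counter name
ROUTED_PREFIX = "Records Routed to "  # hypothetical counter name prefix


def count_routing(partition_lists):
    """Tally dropped records and per-partition record counts.

    partition_lists holds, for each record, the list of partitions it was sent to.
    """
    counters = Counter()
    for partitions in partition_lists:
        if not partitions:
            counters[DROPPED] += 1
        for name in partitions:
            counters[ROUTED_PREFIX + name] += 1
    return counters


counters = count_routing([["a"], [], ["a", "b"], []])
for name, value in sorted(counters.items()):
    print(name, value)
```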

The processor should include additionalDetails.html to provide sufficient documentation. Similar
to ScriptedTransformRecord, the documentation should include several examples, spanning at
least Groovy and Python (since those are by far the languages we see used most often
in script processors). Each example provided in the additionalDetails.html should also have
an accompanying unit test to verify the behavior.

This message was sent by Atlassian Jira
