QueryDatabaseTable
Description: Generates and executes a SQL select query to fetch all rows whose values in the specified Maximum Value column(s) are larger than the previously-seen maxima. The query result is converted to Avro format. Expression Language is supported for several properties, but no incoming connections are permitted. The Variable Registry may be used to provide values for any property containing Expression Language. To leverage FlowFile attributes in these queries, the GenerateTableFetch and/or ExecuteSQL processors can be used instead. Streaming is used, so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods. This processor is intended to be run on the Primary Node only. The FlowFile attribute 'querydbtable.row.count' indicates how many rows were selected.
Tags: sql, select, jdbc, query, database
Properties: In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language. Dynamic Properties allow the user to specify both the name and value of a property.
Name | Default Value | Allowable Values | Description |
---|---|---|---|
Database Connection Pooling Service | | Controller Service API: DBCPService Implementations: DBCPConnectionPool, HiveConnectionPool | The Controller Service that is used to obtain a connection to the database. |
Database Type | Generic | | The type/flavor of database, used for generating database-specific code. In many cases the Generic type should suffice, but some databases (such as Oracle) require custom SQL clauses. |
Table Name | | | The name of the database table to be queried. Supports Expression Language: true |
Columns to Return | | | A comma-separated list of column names to be used in the query. If your database requires special treatment of the names (quoting, e.g.), each name should include such treatment. If no column names are supplied, all columns in the specified table will be returned. NOTE: It is important to use consistent column names for a given table for incremental fetch to work properly. Supports Expression Language: true |
Maximum-value Columns | | | A comma-separated list of column names. The processor will keep track of the maximum value for each column that has been returned since the processor started running. Using multiple columns implies an order to the column list, and each column's values are expected to increase more slowly than the previous columns' values. Thus, using multiple columns implies a hierarchical structure of columns, which is usually used for partitioning tables. This processor can be used to retrieve only those rows that have been added/updated since the last retrieval. Note that some JDBC types such as bit/boolean are not conducive to maintaining a maximum value, so columns of these types should not be listed in this property; listing them will result in error(s) during processing. If no columns are provided, all rows from the table will be considered, which could have a performance impact. NOTE: It is important to use consistent max-value column names for a given table for incremental fetch to work properly. Supports Expression Language: true |
Max Wait Time | 0 seconds | | The maximum amount of time allowed for a running SQL select query; zero means there is no limit. A max time of less than 1 second is treated as zero. Supports Expression Language: true |
Fetch Size | 0 | | The number of result rows to be fetched from the result set at a time. This is a hint to the driver and may not be honored and/or exact. If the value specified is zero, then the hint is ignored. Supports Expression Language: true |
Max Rows Per Flow File | 0 | | The maximum number of result rows that will be included in a single FlowFile. This will allow you to break up very large result sets into multiple FlowFiles. If the value specified is zero, then all rows are returned in a single FlowFile. Supports Expression Language: true |
Maximum Number of Fragments | 0 | | The maximum number of fragments. If the value specified is zero, then all fragments are returned. This prevents an OutOfMemoryError when this processor ingests a huge table. Supports Expression Language: true |
Normalize Table/Column Names | false | | Whether to change non-Avro-compatible characters in column names to Avro-compatible characters. For example, colons and periods will be changed to underscores in order to build a valid Avro record. |
Use Avro Logical Types | false | | Whether to use Avro Logical Types for DECIMAL/NUMBER, DATE, TIME and TIMESTAMP columns. If disabled, values are written as strings. If enabled, Logical types are used and written as their underlying types, specifically: DECIMAL/NUMBER as logical 'decimal', written as bytes with additional precision and scale metadata; DATE as logical 'date', written as int denoting days since Unix epoch (1970-01-01); TIME as logical 'time-millis', written as int denoting milliseconds since midnight; and TIMESTAMP as logical 'timestamp-millis', written as long denoting milliseconds since Unix epoch. If a reader of the written Avro records also knows these logical types, then these values can be deserialized with more context, depending on the reader implementation. |
Default Decimal Precision | 10 | | When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'precision' denoting the number of available digits is required. Generally, precision is defined by the column data type definition or the database engine's default. However, undefined precision (0) can be returned from some database engines. 'Default Decimal Precision' is used when writing those undefined precision numbers. Supports Expression Language: true |
Default Decimal Scale | 0 | | When a DECIMAL/NUMBER value is written as a 'decimal' Avro logical type, a specific 'scale' denoting the number of available decimal digits is required. Generally, scale is defined by the column data type definition or the database engine's default. However, when undefined precision (0) is returned, scale can also be uncertain with some database engines. 'Default Decimal Scale' is used when writing those undefined numbers. If a value has more decimals than the specified scale, then the value will be rounded, e.g. 1.53 becomes 2 with scale 0, and 1.5 with scale 1. Supports Expression Language: true |
Additional WHERE clause | | | A custom clause to be added in the WHERE condition when generating SQL requests. Supports Expression Language: true |
Dynamic Properties:
Name | Value | Description |
---|---|---|
Initial Max Value | Attribute Expression Language | Specifies an initial max value for max value columns. Properties should be added in the format `initial.maxvalue.{max_value_column}`. |
Name | Description |
---|---|
success | Successfully created FlowFile from SQL query result set. |
Name | Description |
---|---|
tablename | Name of the table being queried |
querydbtable.row.count | The number of rows selected by the query |
fragment.identifier | If 'Max Rows Per Flow File' is set then all FlowFiles from the same query result set will have the same value for the fragment.identifier attribute. This can then be used to correlate the results. |
fragment.count | If 'Max Rows Per Flow File' is set then this is the total number of FlowFiles produced by a single ResultSet. This can be used in conjunction with the fragment.identifier attribute in order to know how many FlowFiles belonged to the same incoming ResultSet. |
fragment.index | If 'Max Rows Per Flow File' is set then the position of this FlowFile in the list of outgoing FlowFiles that were all derived from the same result set. This can be used in conjunction with the fragment.identifier attribute to know which FlowFiles originated from the same query result set and in what order the FlowFiles were produced. |
maxvalue.* | Each attribute contains the observed maximum value of a specified 'Maximum-value Column'. The suffix of the attribute is the name of the column |
Scope | Description |
---|---|
CLUSTER | After performing a query on the specified table, the maximum values for the specified column(s) will be retained for use in future executions of the query. This allows the Processor to fetch only those records that have max values greater than the retained values. This can be used for incremental fetching, fetching of newly added rows, etc. To clear the maximum values, clear the state of the processor per the State Management documentation |
GenerateTableFetch, ExecuteSQL
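The incremental-fetch behavior described above (retain the largest value seen in a Maximum-value Column, then select only rows above it on the next run) can be sketched roughly as follows. This is a minimal illustration using Python's built-in `sqlite3`, not the processor's actual implementation; the `fetch_new_rows` helper, the `events` table, and the plain `state` dictionary standing in for the processor's cluster state are all assumptions made for the example.

```python
import sqlite3

def fetch_new_rows(conn, table, max_col, state):
    """Return rows whose max_col exceeds the remembered maximum, then update state.

    `state` is a plain dict standing in for the processor's stored state (assumption).
    """
    last = state.get(max_col)
    if last is None:
        # First run: no maximum retained yet, so every row is considered.
        sql, params = f"SELECT * FROM {table} ORDER BY {max_col}", ()
    else:
        sql, params = f"SELECT * FROM {table} WHERE {max_col} > ? ORDER BY {max_col}", (last,)
    cur = conn.execute(sql, params)
    cols = [d[0] for d in cur.description]
    rows = [dict(zip(cols, r)) for r in cur.fetchall()]
    if rows:
        # Retain the observed maximum for the next execution.
        state[max_col] = max(r[max_col] for r in rows)
    return rows

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, msg TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, "a"), (2, "b")])
    state = {}
    first = fetch_new_rows(conn, "events", "id", state)   # all existing rows
    conn.execute("INSERT INTO events VALUES (3, 'c')")
    second = fetch_new_rows(conn, "events", "id", state)  # only the new row
    return first, second, state
```

Clearing `state` in this sketch corresponds to clearing the processor's state, after which the next run would again consider every row.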
Added: nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.4.0/org.apache.nifi.processors.standard.QueryRecord/additionalDetails.html
URL: http://svn.apache.org/viewvc/nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.4.0/org.apache.nifi.processors.standard.QueryRecord/additionalDetails.html?rev=1811008&view=auto
==============================================================================
--- nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.4.0/org.apache.nifi.processors.standard.QueryRecord/additionalDetails.html (added)
+++ nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.4.0/org.apache.nifi.processors.standard.QueryRecord/additionalDetails.html Tue Oct 3 13:30:16 2017
@@ -0,0 +1,48 @@
QueryRecord provides users a tremendous amount of power by leveraging an extremely well-known syntax (SQL) to route, filter, transform, and query data as it traverses the system. In order to provide the Processor with the maximum amount of flexibility, it is configured with a Controller Service that is responsible for reading and parsing the incoming FlowFiles and a Controller Service that is responsible for writing the results out. By using this paradigm, users are not forced to convert their data from one format to another just to query it, and then transform the data back into the form that they want. Rather, the appropriate Controller Service can easily be configured and put to use for the appropriate data format.
Rather than providing a single "SQL SELECT Statement" type of Property, this Processor makes use of user-defined properties. Each user-defined property that is added to the Processor has a name that becomes a new Relationship for the Processor and a corresponding SQL query that will be evaluated against each FlowFile. This allows multiple SQL queries to be run against each FlowFile.
The SQL syntax that is supported by this Processor is ANSI SQL and is powered by Apache Calcite. Please note that identifiers are quoted using double-quotes, and column names/labels are case-insensitive.
Added: nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.4.0/org.apache.nifi.processors.standard.QueryRecord/index.html
URL: http://svn.apache.org/viewvc/nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.4.0/org.apache.nifi.processors.standard.QueryRecord/index.html?rev=1811008&view=auto
==============================================================================
--- nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.4.0/org.apache.nifi.processors.standard.QueryRecord/index.html (added)
+++ nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.4.0/org.apache.nifi.processors.standard.QueryRecord/index.html Tue Oct 3 13:30:16 2017
@@ -0,0 +1 @@
Evaluates one or more SQL queries against the contents of a FlowFile. The result of the SQL query then becomes the content of the output FlowFile. This can be used, for example, for field-specific filtering, transformation, and row-level filtering. Columns can be renamed, simple calculations and aggregations performed, etc. The Processor is configured with a Record Reader Controller Service and a Record Writer service so as to allow flexibility in incoming and outgoing data formats. The Processor must be configured with at least one user-defined property. The name of the Property is the Relationship to route data to, and the value of the Property is a SQL SELECT statement that is used to specify how input data should be transformed/filtered. The SQL statement must be valid ANSI SQL and is powered by Apache Calcite. If the transformation fails, the original FlowFile is routed to the 'failure' relationship. Otherwise, the data selected will be routed to the associated relationship. If the Record Writer chooses to inherit the schema from the Record, it is important to note that the schema that is inherited will be from the ResultSet, rather than the input Record. This allows a single instance of the QueryRecord processor to have multiple queries, each of which returns a different set of columns and aggregations. As a result, though, the schema that is derived will have no schema name, so it is important that the configured Record Writer not attempt to write the Schema Name as an attribute if inheriting the Schema from the Record. See the Processor Usage documentation for more information.
sql, query, calcite, route, record, transform, select, update, modify, etl, filter, record, csv, json, logs, text, avro, aggregate
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Default Value | Allowable Values | Description |
---|---|---|---|
Record Reader | | Controller Service API: RecordReaderFactory Implementations: CSVReader, GrokReader, AvroReader, JsonTreeReader, JsonPathReader, ScriptedReader | Specifies the Controller Service to use for parsing incoming data and determining the data's schema |
Record Writer | | Controller Service API: RecordSetWriterFactory Implementations: JsonRecordSetWriter, FreeFormTextRecordSetWriter, AvroRecordSetWriter, ScriptedRecordSetWriter, CSVRecordSetWriter | Specifies the Controller Service to use for writing results to a FlowFile |
Include Zero Record FlowFiles | true | | When running the SQL statement against an incoming FlowFile, if the result has no data, this property specifies whether or not a FlowFile will be sent to the corresponding relationship |
Cache Schema | true | | Parsing the SQL query and deriving the FlowFile's schema is relatively expensive. If this value is set to true, the Processor will cache these values so that the Processor is much more efficient and much faster. However, if this is done, then the schema that is derived for the first FlowFile processed must apply to all FlowFiles. If all FlowFiles will not have the exact same schema, or if the SQL SELECT statement uses the Expression Language, this value should be set to false. |
Dynamic Properties allow the user to specify both the name and value of a property.
Name | Value | Description |
---|---|---|
The name of the relationship to route data to | A SQL SELECT statement that is used to determine what data should be routed to this relationship. | Each user-defined property specifies a SQL SELECT statement to run over the data, with the data that is selected being routed to the relationship whose name is the property name Supports Expression Language: true |
Name | Description |
---|---|
failure | If a FlowFile fails processing for any reason (for example, the SQL statement contains columns not present in input data), the original FlowFile will be routed to this relationship |
original | The original FlowFile is routed to this relationship |
A Dynamic Relationship may be created based on how the user configures the Processor.
Name | Description |
---|---|
<Property Name> | Each user-defined property defines a new Relationship for this Processor. |
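The mapping from user-defined properties to relationships can be illustrated with a small sketch: each property name becomes a relationship, and its SQL SELECT is run over the same set of records. The example below uses Python's `sqlite3` as a stand-in for Apache Calcite, so SQL dialect details will differ; the `route_records` helper and the `FLOWFILE` table name are assumptions chosen for the illustration.

```python
import sqlite3

def route_records(records, queries):
    """Run each named SELECT over the records and return {relationship: rows}.

    `records` is a non-empty list of dicts sharing the same keys; `queries` maps
    a relationship name to a SELECT statement. The table is named FLOWFILE here
    as an assumption for the sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    cols = list(records[0].keys())
    conn.execute(f"CREATE TABLE FLOWFILE ({', '.join(cols)})")
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(
        f"INSERT INTO FLOWFILE VALUES ({placeholders})",
        [tuple(r[c] for c in cols) for r in records],
    )
    # One result set per user-defined property; the key is the relationship name.
    return {name: [dict(row) for row in conn.execute(sql)] for name, sql in queries.items()}
```

For instance, passing `{"adults": "SELECT name FROM FLOWFILE WHERE age >= 18"}` would yield a single `adults` entry holding only the selected column, which mirrors the note that the output schema comes from the result set rather than the input record.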
Updates the content of a FlowFile by evaluating a Regular Expression (regex) against it and replacing the section of the content that matches the Regular Expression with some alternate value.
Text, Regular Expression, Update, Change, Replace, Modify, Regex
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
Name | Default Value | Allowable Values | Description |
---|---|---|---|
Search Value | (?s)(^.*$) | | The Search Value to search for in the FlowFile content. Only used for 'Literal Replace' and 'Regex Replace' matching strategies. Supports Expression Language: true |
Replacement Value | $1 | | The value to insert using the 'Replacement Strategy'. When using 'Regex Replace', back-references to Regular Expression capturing groups are supported, but back-references to capturing groups that do not exist in the regular expression will be treated as literal values. Back References may also be referenced using the Expression Language, as '$1', '$2', etc. The single-tick marks MUST be included, as these variables are not "Standard" attribute names (attribute names must be quoted unless they contain only numbers, letters, and _). Supports Expression Language: true |
Character Set | UTF-8 | | The Character Set in which the file is encoded |
Maximum Buffer Size | 1 MB | | Specifies the maximum amount of data to buffer (per file or per line, depending on the Evaluation Mode) in order to apply the replacement. If 'Entire Text' (in Evaluation Mode) is selected and the FlowFile is larger than this value, the FlowFile will be routed to 'failure'. In 'Line-by-Line' Mode, if a single line is larger than this value, the FlowFile will be routed to 'failure'. A default value of 1 MB is provided, primarily for 'Entire Text' mode. In 'Line-by-Line' Mode, a value such as 8 KB or 16 KB is suggested. This value is ignored if the <Replacement Strategy> property is set to one of: Append, Prepend, Always Replace |
Replacement Strategy | Regex Replace | | The strategy for how and what to replace within the FlowFile's text content. |
Evaluation Mode | Entire text | | Run the 'Replacement Strategy' against each line separately (Line-by-Line) or buffer the entire file into memory (Entire Text) and run against that. |
Name | Description |
---|---|
success | FlowFiles that have been successfully processed are routed to this relationship. This includes both FlowFiles that had text replaced and those that did not. |
failure | FlowFiles that could not be updated are routed to this relationship |
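As a rough illustration of the 'Regex Replace' strategy and the two evaluation modes described above, the following sketch uses Python's `re` module. Note that Python writes back-references as `\1` where the property defaults use `$1`; the `regex_replace` and `replace_line_by_line` helpers are hypothetical names for this example and do not model the buffer-size limit.

```python
import re

def regex_replace(content, search=r"(?s)(^.*$)", replacement=r"\1"):
    """Entire-text mode: apply the regex once over the whole content.

    With the default search value and replacement, the content is returned unchanged.
    """
    return re.sub(search, replacement, content)

def replace_line_by_line(content, search, replacement):
    """Line-by-line mode: apply the regex to each line separately, keeping line endings."""
    return "".join(
        re.sub(search, replacement, line)
        for line in content.splitlines(keepends=True)
    )
```

A call such as `regex_replace("hello world", r"(\w+) (\w+)", r"\2 \1")` swaps the two captured words, which is the kind of back-reference use the Replacement Value description refers to.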
Updates the content of a FlowFile by evaluating a Regular Expression against it and replacing the section of the content that matches the Regular Expression with some alternate value provided in a mapping file.
Text, Regular Expression, Update, Change, Replace, Modify, Regex, Mapping
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
Name | Default Value | Allowable Values | Description |
---|---|---|---|
Regular Expression | \S+ | | The Regular Expression to search for in the FlowFile content. Supports Expression Language: true |
Matching Group | 0 | | The number of the matching group of the provided regex to replace with the corresponding value from the mapping file (if it exists). Supports Expression Language: true |
Mapping File | | | The name of the file (including the full path) containing the Mappings. |
Mapping File Refresh Interval | 60s | | The polling interval in seconds to check for updates to the mapping file. The default is 60s. |
Character Set | UTF-8 | | The Character Set in which the file is encoded |
Maximum Buffer Size | 1 MB | | Specifies the maximum amount of data to buffer (per file) in order to apply the regular expressions. If a FlowFile is larger than this value, the FlowFile will be routed to 'failure' |
Name | Description |
---|---|
success | FlowFiles that have been successfully updated are routed to this relationship, as well as FlowFiles whose content does not match the given Regular Expression |
failure | FlowFiles that could not be updated are routed to this relationship |
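The interplay of the Regular Expression, Matching Group, and Mapping File properties can be sketched as follows. The mapping file layout is not described here, so the example assumes one tab-separated key/value pair per line; `load_mapping` and `replace_with_mapping` are hypothetical helper names, and text without a mapping entry is left untouched.

```python
import re

def load_mapping(text):
    """Parse mapping text, assuming one 'key<TAB>value' pair per line (assumption)."""
    mapping = {}
    for line in text.splitlines():
        if "\t" in line:
            key, value = line.split("\t", 1)
            mapping[key] = value
    return mapping

def replace_with_mapping(content, mapping, regex=r"\S+", group=0):
    """Replace the chosen matching group with its mapped value when one exists."""
    def substitute(match):
        matched = match.group(group)
        if matched not in mapping:
            return match.group(0)  # no mapping entry: leave the match unchanged
        # Splice the mapped value into the position the group occupied.
        start = match.start(group) - match.start(0)
        end = match.end(group) - match.start(0)
        whole = match.group(0)
        return whole[:start] + mapping[matched] + whole[end:]
    return re.sub(regex, substitute, content)
```

With the default `\S+` and group 0, every whitespace-delimited token is looked up in the mapping; choosing a capturing group instead restricts the replacement to that part of each match.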
This processor routes FlowFiles based on their attributes using the NiFi Expression Language. Users add properties with valid NiFi Expression Language Expressions as the values. Each Expression must return a value of type Boolean (true or false).
Example: The goal is to route all files with filenames that start with ABC down a certain path. Add a property with the following name and value:
In this example, all files with filenames that start with ABC will follow the ABC relationship.
Added: nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.4.0/org.apache.nifi.processors.standard.RouteOnAttribute/index.html
URL: http://svn.apache.org/viewvc/nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.4.0/org.apache.nifi.processors.standard.RouteOnAttribute/index.html?rev=1811008&view=auto
==============================================================================
--- nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.4.0/org.apache.nifi.processors.standard.RouteOnAttribute/index.html (added)
+++ nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.4.0/org.apache.nifi.processors.standard.RouteOnAttribute/index.html Tue Oct 3 13:30:16 2017
@@ -0,0 +1 @@
Routes FlowFiles based on their Attributes using the Attribute Expression Language
attributes, routing, Attribute Expression Language, regexp, regex, Regular Expression, Expression Language
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Default Value | Allowable Values | Description |
---|---|---|---|
Routing Strategy | Route to Property name | | Specifies how to determine which relationship to use when evaluating the Expression Language |
Dynamic Properties allow the user to specify both the name and value of a property.
Name | Value | Description |
---|---|---|
Relationship Name | Attribute Expression Language | Routes FlowFiles whose attributes match the Attribute Expression Language specified in the Dynamic Property Value to the Relationship specified in the Dynamic Property Key Supports Expression Language: true |
Name | Description |
---|---|
unmatched | FlowFiles that do not match any user-defined expression will be routed here |
A Dynamic Relationship may be created based on how the user configures the Processor.
Name | Description |
---|---|
Name from Dynamic Property | FlowFiles that match the Dynamic Property's Attribute Expression Language |
Name | Description |
---|---|
RouteOnAttribute.Route | The relation to which the FlowFile was routed |
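The routing rule (one Boolean expression per dynamic property, with 'unmatched' as the fallback) can be sketched in a few lines. Here each Expression Language expression is modeled as a plain Python callable, which is an assumption for illustration only; for the ABC example above, an expression along the lines of `${filename:startsWith('ABC')}` would be a natural fit, though the original property value is not preserved in this text.

```python
def route_on_attribute(attributes, rules):
    """Return the names of all rules whose predicate is true, or ['unmatched'].

    `rules` maps a relationship name to a callable taking the attribute dict and
    returning a bool; the callable stands in for an Expression Language expression.
    """
    matched = [name for name, predicate in rules.items() if predicate(attributes)]
    return matched or ["unmatched"]

# Hypothetical rule set mirroring the ABC example.
EXAMPLE_RULES = {"ABC": lambda attrs: attrs.get("filename", "").startswith("ABC")}
```

This corresponds to the 'Route to Property name' strategy, in which a FlowFile goes to every relationship whose expression evaluates to true.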
Applies Regular Expressions to the content of a FlowFile and routes a copy of the FlowFile to each destination whose Regular Expression matches. Regular Expressions are added as User-Defined Properties where the name of the property is the name of the relationship and the value is a Regular Expression to match against the FlowFile content. User-Defined properties do support the Attribute Expression Language, but the results are interpreted as literal values, not Regular Expressions
route, content, regex, regular expression, regexp
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Default Value | Allowable Values | Description |
---|---|---|---|
Match Requirement | content must match exactly | | Specifies whether the entire content of the file must match the regular expression exactly, or if any part of the file (up to Content Buffer Size) can contain the regular expression in order to be considered a match |
Character Set | UTF-8 | | The Character Set in which the file is encoded |
Content Buffer Size | 1 MB | | Specifies the maximum amount of data to buffer in order to apply the regular expressions. If the size of the FlowFile exceeds this value, any content beyond this size will be ignored |
Dynamic Properties allow the user to specify both the name and value of a property.
Name | Value | Description |
---|---|---|
Relationship Name | A Regular Expression | Routes FlowFiles whose content matches the regular expression defined by Dynamic Property's value to the Relationship defined by the Dynamic Property's key Supports Expression Language: true |
Name | Description |
---|---|
unmatched | FlowFiles that do not match any of the user-supplied regular expressions will be routed to this relationship |
A Dynamic Relationship may be created based on how the user configures the Processor.
Name | Description |
---|---|
Name from Dynamic Property | FlowFiles that match the Dynamic Property's Regular Expression |
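The difference between the two Match Requirement settings comes down to a full match versus a search within the buffered content. The sketch below shows that distinction with Python's `re` module; the `route_on_content` helper and its parameter names are hypothetical, and the buffer handling is a simplification.

```python
import re

def route_on_content(content, rules, match_exactly=True,
                     buffer_size=1024 * 1024, charset="utf-8"):
    """Return the relationships whose regex matches the buffered content.

    Only the first `buffer_size` bytes are examined. With match_exactly=True the
    whole buffered text must match; otherwise any part of it may match.
    """
    text = content[:buffer_size].decode(charset, errors="replace")
    matcher = re.fullmatch if match_exactly else re.search
    matched = [name for name, pattern in rules.items() if matcher(pattern, text)]
    return matched or ["unmatched"]
```

Because a copy is routed to each matching destination, the function returns a list rather than a single relationship name.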
Routes textual data based on a set of user-defined rules. Each line in an incoming FlowFile is compared against the values specified by user-defined Properties. The mechanism by which the text is compared to these user-defined properties is defined by the 'Matching Strategy'. The data is then routed according to these rules, routing each line of the text individually.
attributes, routing, text, regexp, regex, Regular Expression, Expression Language, csv, filter, logs, delimited
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Default Value | Allowable Values | Description |
---|---|---|---|
Routing Strategy | Route to each matching Property Name | | Specifies how to determine which Relationship(s) to use when evaluating the lines of incoming text against the 'Matching Strategy' and user-defined properties. |
Matching Strategy | | | Specifies how to evaluate each line of incoming text against the user-defined properties. |
Character Set | UTF-8 | | The Character Set in which the incoming text is encoded |
Ignore Leading/Trailing Whitespace | true | | Indicates whether or not the whitespace at the beginning and end of the lines should be ignored when evaluating the line. |
Ignore Case | false | | If true, capitalization will not be taken into account when comparing values. E.g., matching against 'HELLO' or 'hello' will have the same result. This property is ignored if the 'Matching Strategy' is set to 'Satisfies Expression'. |
Grouping Regular Expression | | | Specifies a Regular Expression to evaluate against each line to determine which Group the line should be placed in. The Regular Expression must have at least one Capturing Group that defines the line's Group. If multiple Capturing Groups exist in the Regular Expression, the values from all Capturing Groups are combined to form the Group. Two lines will not be placed into the same FlowFile unless they both have the same value for the Group (or neither line matches the Regular Expression). For example, to group together all lines in a CSV File by the first column, we can set this value to "(.*?),.*". Two lines that have the same Group but different Relationships will never be placed into the same FlowFile. |
Dynamic Properties allow the user to specify both the name and value of a property.
Name | Value | Description |
---|---|---|
Relationship Name | value to match against | Routes data that matches the value specified in the Dynamic Property Value to the Relationship specified in the Dynamic Property Key. |
Name | Description |
---|---|
original | The original input file will be routed to this destination when the lines have been successfully routed to 1 or more relationships |
unmatched | Data that does not satisfy the required user-defined rules will be routed to this Relationship |
A Dynamic Relationship may be created based on how the user configures the Processor.
Name | Description |
---|---|
Name from Dynamic Property | FlowFiles that match the Dynamic Property's value |
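The Grouping Regular Expression example, `(.*?),.*` for grouping CSV lines by their first column, can be tried out with a short sketch. It assumes the expression must match the whole line and that lines which do not match share a single "no group" bucket (keyed by `None` here); `group_lines` is a hypothetical helper and does not model the per-relationship separation.

```python
import re
from collections import defaultdict

def group_lines(text, grouping_regex=r"(.*?),.*"):
    """Bucket lines by the first capturing group of the grouping regex.

    Lines that do not match the regex are collected under the key None.
    """
    pattern = re.compile(grouping_regex)
    groups = defaultdict(list)
    for line in text.splitlines():
        match = pattern.fullmatch(line)  # whole-line match is an assumption
        key = match.group(1) if match else None
        groups[key].append(line)
    return dict(groups)
```

Each bucket in the result corresponds to lines that could share an outgoing FlowFile, provided they were also routed to the same relationship.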
Scans the specified attributes of FlowFiles, checking to see if any of their values are present within the specified dictionary of terms
scan, attributes, search, lookup
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Default Value | Allowable Values | Description |
---|---|---|---|
Dictionary File | | | A new-line-delimited text file that includes the terms that should trigger a match. Empty lines are ignored. The contents of the text file are loaded into memory when the processor is scheduled and reloaded when the contents are modified. |
Attribute Pattern | .* | | Regular Expression that specifies the names of attributes whose values will be matched against the terms in the dictionary |
Match Criteria | At Least 1 Must Match | | If set to All Must Match, then FlowFiles will be routed to 'matched' only if all specified attributes' values are found in the dictionary. If set to At Least 1 Must Match, FlowFiles will be routed to 'matched' if any attribute specified is found in the dictionary |
Dictionary Filter Pattern | | | A Regular Expression that will be applied to each line in the dictionary file. If the regular expression does not match the line, the line will not be included in the list of terms to search for. If a Matching Group is specified, only the portion of the term that matches that Matching Group will be used instead of the entire term. If not specified, all terms in the dictionary will be used and each term will consist of the text of the entire line in the file |
Name | Description |
---|---|
unmatched | FlowFiles whose attributes are not found in the dictionary will be routed to this relationship |
matched | FlowFiles whose attributes are found in the dictionary will be routed to this relationship |
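The two Match Criteria settings amount to an "all" versus "any" test over the selected attribute values. The following is a minimal sketch under a few assumptions: `load_terms` and `scan_attributes` are hypothetical helpers, the Attribute Pattern is taken to match the whole attribute name, and a FlowFile with no selected attributes is treated as unmatched.

```python
import re

def load_terms(text):
    """Load a new-line-delimited dictionary, ignoring empty lines."""
    return {line for line in text.splitlines() if line}

def scan_attributes(attributes, dictionary, attribute_pattern=r".*", all_must_match=False):
    """Return 'matched' or 'unmatched' for the given attribute dict."""
    name_re = re.compile(attribute_pattern)
    # Select the values of attributes whose names match the pattern.
    values = [v for k, v in attributes.items() if name_re.fullmatch(k)]
    hits = [v in dictionary for v in values]
    if not hits:
        return "unmatched"  # nothing selected: treated as unmatched (assumption)
    ok = all(hits) if all_must_match else any(hits)
    return "matched" if ok else "unmatched"
```

Narrowing `attribute_pattern` is often what makes 'All Must Match' practical, since the default `.*` selects every attribute on the FlowFile.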
Scans the content of FlowFiles for terms that are found in a user-supplied dictionary. If a term is matched, the UTF-8 encoded version of the term will be added to the FlowFile using the 'matching.term' attribute
aho-corasick, scan, content, byte sequence, search, find, dictionary
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Default Value | Allowable Values | Description |
---|---|---|---|
Dictionary File | | | The filename of the terms dictionary |
Dictionary Encoding | text | | Indicates how the dictionary is encoded. If 'text', dictionary terms are new-line delimited and UTF-8 encoded; if 'binary', dictionary terms are denoted by a 4-byte integer indicating the term length followed by the term itself |
Name | Description |
---|---|
unmatched | FlowFiles that do not match any term in the dictionary are routed to this relationship |
matched | FlowFiles that match at least one term in the dictionary are routed to this relationship |
Name | Description |
---|---|
matching.term | The term that caused the Processor to route the FlowFile to the 'matched' relationship; if FlowFile is routed to the 'unmatched' relationship, this attribute is not added |
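The 'binary' dictionary encoding (a 4-byte integer length followed by the term bytes) can be parsed as sketched below. The byte order is not stated above, so big-endian is assumed here; the naive `bytes.find` scan is only a stand-in for the Aho-Corasick search suggested by the tags, and both helper names are hypothetical.

```python
import struct

def read_binary_dictionary(data):
    """Parse terms stored as a 4-byte length followed by that many bytes.

    Big-endian signed integers are assumed for the length prefix.
    """
    terms, offset = [], 0
    while offset < len(data):
        (length,) = struct.unpack_from(">i", data, offset)
        offset += 4
        terms.append(data[offset:offset + length])
        offset += length
    return terms

def first_matching_term(content, terms):
    """Return the term that occurs earliest in the content, or None if none occurs."""
    best = None
    for term in terms:
        pos = content.find(term)
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, term)
    return best[1] if best else None
```

In this sketch the returned term plays the role of the 'matching.term' attribute, and a `None` result corresponds to routing to 'unmatched' without adding the attribute.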
Segments a FlowFile into multiple smaller segments on byte boundaries. Each segment is given the following attributes: fragment.identifier, fragment.index, fragment.count, segment.original.filename; these attributes can then be used by the MergeContent processor in order to reconstitute the original FlowFile
segment, split
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Default Value | Allowable Values | Description |
---|---|---|---|
Segment Size | | | The maximum data size in bytes for each segment |
Name | Description |
---|---|
segments | All segments will be sent to this relationship. If the file was small enough that it was not segmented, a copy of the original is sent to this relationship as well as to the 'original' relationship |
original | The original FlowFile will be sent to this relationship |
Name | Description |
---|---|
segment.identifier | All segments produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute. This attribute is added to maintain backward compatibility, but the fragment.identifier is preferred, as it is designed to work in conjunction with the MergeContent Processor |
segment.index | A one-up number that indicates the ordering of the segments that were created from a single parent FlowFile. This attribute is added to maintain backward compatibility, but the fragment.index is preferred, as it is designed to work in conjunction with the MergeContent Processor |
segment.count | The number of segments generated from the parent FlowFile. This attribute is added to maintain backward compatibility, but the fragment.count is preferred, as it is designed to work in conjunction with the MergeContent Processor |
fragment.identifier | All segments produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute |
fragment.index | A one-up number that indicates the ordering of the segments that were created from a single parent FlowFile |
fragment.count | The number of segments generated from the parent FlowFile |
segment.original.filename | The filename of the parent FlowFile |
filename | The filename will be updated to include the parent's filename, the segment index, and the segment count |
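
As an illustration of segmenting on byte boundaries and of how the fragment.* attributes allow the original to be reconstituted, here is a minimal Python sketch. The function names and the dictionary layout are hypothetical, and it assumes that the "one-up" index starts at 1; it is not the processor's actual code.

```python
import uuid


def segment_content(content: bytes, segment_size: int, filename: str) -> list[dict]:
    """Cut content into pieces of at most segment_size bytes, with attributes."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    # Content that fits in one segment (including empty content) yields one piece.
    chunks = [content[i:i + segment_size]
              for i in range(0, len(content), segment_size)] or [b""]
    identifier = str(uuid.uuid4())  # shared by all segments of one parent
    segments = []
    for index, chunk in enumerate(chunks, start=1):  # assumption: index starts at 1
        segments.append({
            "content": chunk,
            "attributes": {
                "fragment.identifier": identifier,
                "fragment.index": str(index),
                "fragment.count": str(len(chunks)),
                "segment.original.filename": filename,
            },
        })
    return segments


def reassemble(segments: list[dict]) -> bytes:
    """Rebuild the parent content by ordering segments on fragment.index."""
    ordered = sorted(segments, key=lambda s: int(s["attributes"]["fragment.index"]))
    expected = int(ordered[0]["attributes"]["fragment.count"])
    if len(ordered) != expected:
        raise ValueError("missing segments")
    return b"".join(s["content"] for s in ordered)
```

Calling `segment_content(b"abcdefghij", 4, "data.bin")` produces three segments (`abcd`, `efgh`, `ij`), and `reassemble` returns the original ten bytes regardless of the order in which the segments arrive.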
Splits incoming FlowFiles by a specified byte sequence.
content, split, binary
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Default Value | Allowable Values | Description |
---|---|---|---|
Byte Sequence Format | Hexadecimal | | Specifies how the <Byte Sequence> property should be interpreted |
Byte Sequence | | | A representation of bytes to look for and upon which to split the source file into separate files |
Keep Byte Sequence | false | | Determines whether or not the Byte Sequence should be included with each Split |
Byte Sequence Location | Trailing | | If <Keep Byte Sequence> is set to true, specifies whether the byte sequence should be added to the end of the first split or the beginning of the second; if <Keep Byte Sequence> is false, this property is ignored. |
Name | Description |
---|---|
splits | All Splits will be routed to the splits relationship |
original | The original file |
Name | Description |
---|---|
fragment.identifier | All split FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute |
fragment.index | A one-up number that indicates the ordering of the split FlowFiles that were created from a single parent FlowFile |
fragment.count | The number of split FlowFiles generated from the parent FlowFile |
segment.original.filename | The filename of the parent FlowFile |
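
The interaction between Keep Byte Sequence and Byte Sequence Location can be shown with a short Python sketch. It is an illustration only: the function name is hypothetical, "Leading" is an assumed name for the non-default location (only "Trailing" appears above), and dropping empty splits is an assumption rather than documented behavior.

```python
def split_content(content: bytes, byte_sequence: bytes,
                  keep: bool = False, location: str = "Trailing") -> list[bytes]:
    """Split content on byte_sequence, optionally keeping the sequence."""
    if not byte_sequence:
        raise ValueError("byte_sequence must not be empty")
    parts = content.split(byte_sequence)
    if keep:
        last = len(parts) - 1
        if location == "Trailing":
            # Append the sequence to the end of the split that precedes it.
            parts = [p + byte_sequence if i < last else p
                     for i, p in enumerate(parts)]
        else:
            # "Leading" (assumed name): prepend it to the split that follows it.
            parts = [byte_sequence + p if i > 0 else p
                     for i, p in enumerate(parts)]
    # Assumption: empty pieces are not emitted as splits.
    return [p for p in parts if p]
```

With the default Hexadecimal format, a property value such as `0d0a` corresponds to `bytes.fromhex("0d0a")`, so `split_content(b"x\r\ny", bytes.fromhex("0d0a"))` yields `[b"x", b"y"]`. With `keep=True`, `b"a,b,c"` split on `b","` gives `[b"a,", b"b,", b"c"]` for Trailing and `[b"a", b",b", b",c"]` for the assumed Leading option.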
Splits a JSON file into multiple, separate FlowFiles for an array element specified by a JsonPath expression. Each generated FlowFile consists of one element of the specified array and is transferred to the 'split' relationship, while the original file is transferred to the 'original' relationship. If the specified JsonPath is not found or does not evaluate to an array element, the original file is routed to 'failure' and no files are generated.
json, split, jsonpath
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.
Name | Default Value | Allowable Values | Description |
---|---|---|---|
JsonPath Expression | | | A JsonPath expression that indicates the array element to split into JSON/scalar fragments. |
Null Value Representation | empty string | | Indicates the desired representation of JSON Path expressions resulting in a null value. |
Name | Description |
---|---|
failure | If a FlowFile fails processing for any reason (for example, the FlowFile is not valid JSON or the specified path does not exist), it will be routed to this relationship |
original | The original FlowFile that was split into segments. If the FlowFile fails processing, nothing will be sent to this relationship |
split | All segments of the original FlowFile will be routed to this relationship |
Name | Description |
---|---|
fragment.identifier | All split FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute |
fragment.index | A one-up number that indicates the ordering of the split FlowFiles that were created from a single parent FlowFile |
fragment.count | The number of split FlowFiles generated from the parent FlowFile |
segment.original.filename | The filename of the parent FlowFile |
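
The routing rules above (array found: one fragment per element; invalid JSON, missing path, or non-array target: failure) can be sketched in Python. This is a simplified stand-in, not a JsonPath implementation: it only understands dotted child paths such as `$.orders.items`, the function name is hypothetical, and the way scalars are rendered (strings unquoted, other values as JSON text) is an assumption.

```python
import json


def split_json(content: str, path: str, null_repr: str = "") -> tuple[str, list[str]]:
    """Return (relationship, fragment payloads) for one JSON document."""
    try:
        node = json.loads(content)
    except json.JSONDecodeError:
        return "failure", []  # not valid JSON
    if not path.startswith("$"):
        return "failure", []
    # Walk simple dotted keys only; real JsonPath supports much more.
    for key in [k for k in path[1:].split(".") if k]:
        if not isinstance(node, dict) or key not in node:
            return "failure", []  # path not found
        node = node[key]
    if not isinstance(node, list):
        return "failure", []  # path does not evaluate to an array
    fragments = []
    for element in node:
        if element is None:
            fragments.append(null_repr)  # Null Value Representation
        elif isinstance(element, str):
            fragments.append(element)
        else:
            fragments.append(json.dumps(element))
    return "split", fragments
```

For the document `{"orders": {"items": [{"id": 1}, "x", null, 2]}}` and the path `$.orders.items`, the sketch returns four fragments for the 'split' relationship; the same document with `$.orders` is routed to 'failure' because the target is an object rather than an array.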
Splits up an input FlowFile that is in a record-oriented data format into multiple smaller FlowFiles.
split, generic, schema, json, csv, avro, log, logs, freeform, text
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
Name | Default Value | Allowable Values | Description |
---|---|---|---|
Record Reader | | Controller Service API: RecordReaderFactory. Implementations: CSVReader, GrokReader, AvroReader, JsonTreeReader, JsonPathReader, ScriptedReader | Specifies the Controller Service to use for reading incoming data |
Record Writer | | Controller Service API: RecordSetWriterFactory. Implementations: JsonRecordSetWriter, FreeFormTextRecordSetWriter, AvroRecordSetWriter, ScriptedRecordSetWriter, CSVRecordSetWriter | Specifies the Controller Service to use for writing out the records |
Records Per Split | | | Specifies how many records should be written to each 'split' or 'segment' FlowFile. Supports Expression Language: true |
Name | Description |
---|---|
failure | If a FlowFile cannot be transformed from the configured input format to the configured output format, the unchanged FlowFile will be routed to this relationship. |
splits | The individual 'segments' of the original FlowFile will be routed to this relationship. |
original | Upon successfully splitting an input FlowFile, the original FlowFile will be sent to this relationship. |
Name | Description |
---|---|
mime.type | Sets the mime.type attribute to the MIME Type specified by the Record Writer for the FlowFiles routed to the 'splits' Relationship. |
record.count | The number of records in the FlowFile. This is added to FlowFiles that are routed to the 'splits' Relationship. |
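
The effect of Records Per Split on the 'splits' relationship and its attributes can be sketched as follows. This is a hedged illustration in Python: the function name is hypothetical, records are modelled as plain dictionaries, and JSON serialization stands in for whatever Record Writer is configured, which is why 'mime.type' is set to `application/json` here.

```python
import json


def split_records(records: list[dict], records_per_split: int) -> list[dict]:
    """Group records into batches of at most records_per_split."""
    if records_per_split <= 0:
        raise ValueError("records_per_split must be positive")
    splits = []
    for start in range(0, len(records), records_per_split):
        batch = records[start:start + records_per_split]
        splits.append({
            # Stand-in for the configured Record Writer's output.
            "content": json.dumps(batch),
            "attributes": {
                "mime.type": "application/json",
                "record.count": str(len(batch)),
            },
        })
    return splits
```

Five records with `records_per_split=2` give three splits whose 'record.count' values are 2, 2, and 1; only the final split may hold fewer records than the configured size.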