Added: nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-gcp-nar/1.4.0/org.apache.nifi.processors.gcp.storage.PutGCSObject/index.html URL: http://svn.apache.org/viewvc/nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-gcp-nar/1.4.0/org.apache.nifi.processors.gcp.storage.PutGCSObject/index.html?rev=1811008&view=auto ============================================================================== --- nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-gcp-nar/1.4.0/org.apache.nifi.processors.gcp.storage.PutGCSObject/index.html (added) +++ nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-gcp-nar/1.4.0/org.apache.nifi.processors.gcp.storage.PutGCSObject/index.html Tue Oct 3 13:30:16 2017 @@ -0,0 +1 @@ +PutGCSObject

PutGCSObject

Description:

Puts FlowFiles into a Google Cloud Storage bucket.

Tags:

google, google cloud, gcs, archive, put

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, whether a property supports the NiFi Expression Language, and whether a property is considered "sensitive", meaning that its value will be encrypted. Before entering a value in a sensitive property, ensure that the nifi.properties file has an entry for the property nifi.sensitive.props.key.

NameDefault ValueAllowable ValuesDescription
GCP Credentials Provider ServiceController Service API:
GCPCredentialsService
Implementation: GCPCredentialsControllerService
The Controller Service used to obtain Google Cloud Platform credentials.
Project IDGoogle Cloud Project ID
Number of retries6How many retry attempts should be made before routing to the failure relationship.
Bucket${gcs.bucket}Bucket of the object.
Supports Expression Language: true
Key${filename}Name of the object.
Supports Expression Language: true
Content Type${mime.type}Content Type for the file, e.g. text/plain
Supports Expression Language: true
MD5 HashMD5 Hash (encoded in Base64) of the file for server-side validation.
Supports Expression Language: true
CRC32C ChecksumCRC32C Checksum (encoded in Base64, big-endian order) of the file for server-side validation.
Supports Expression Language: true
Object ACL
  • All Authenticated Users Gives the bucket or object owner OWNER permission, and gives all authenticated Google account holders READER and WRITER permissions. All other permissions are removed.
  • Authenticated Read Gives the bucket or object owner OWNER permission, and gives all authenticated Google account holders READER permission. All other permissions are removed.
  • Bucket Owner Full Control Gives the object and bucket owners OWNER permission. All other permissions are removed.
  • Bucket Owner Read Only Gives the object owner OWNER permission, and gives the bucket owner READER permission. All other permissions are removed.
  • Private Gives the bucket or object owner OWNER permission for a bucket or object, and removes all other access permissions.
  • Project Private Gives permission to the project team based on their roles. Anyone who is part of the team has READER permission. Project owners and project editors have OWNER permission. This is the default ACL for newly created buckets. This is also the default ACL for newly created objects unless the default object ACL for that bucket has been changed.
  • Public Read Only Gives the bucket or object owner OWNER permission, and gives all users, both authenticated and anonymous, READER permission. When you apply this to an object, anyone on the Internet can read the object without authenticating.
Access Control to be attached to the uploaded object. If not set, the bucket's default object ACL applies.
Server Side Encryption KeyAn AES256 Encryption Key (encoded in base64) for server-side encryption of the object.
Sensitive Property: true
Supports Expression Language: true
Overwrite Objecttrue
  • true
  • false
If false, the upload to GCS will succeed only if the object does not exist.
Content Disposition Type
  • Inline Indicates that the object should be loaded and rendered within the browser.
  • Attachment Indicates that the object should be saved (using a Save As... dialog) rather than opened directly within the browser.
Type of RFC-6266 Content Disposition to be attached to the object.
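The MD5 Hash, CRC32C Checksum and Server Side Encryption Key properties all expect Base64-encoded binary values. Below is a minimal Python sketch of computing them outside NiFi. It assumes, as the GCS JSON API does, that the MD5 value is the Base64 of the raw 16-byte digest and that the CRC32C value is the Base64 of the 4-byte checksum in big-endian order. The function names are hypothetical, and the CRC32C routine is a slow, table-driven implementation of the Castagnoli polynomial written for clarity. It is not any particular library's API.

```python
import base64
import hashlib
import os


def _make_crc32c_table() -> list:
    # Reflected form of the Castagnoli polynomial 0x1EDC6F41.
    poly = 0x82F63B78
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    """Table-driven CRC32C; dependency-free but slow for large files."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ _CRC32C_TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


def md5_base64(data: bytes) -> str:
    # Base64 of the raw 16-byte digest, not of the hex string.
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def crc32c_base64(data: bytes) -> str:
    # Serialize the 32-bit checksum big-endian before Base64 encoding.
    return base64.b64encode(crc32c(data).to_bytes(4, "big")).decode("ascii")


def new_aes256_key_base64() -> str:
    # 32 random bytes = 256 bits, Base64-encoded for the property value.
    return base64.b64encode(os.urandom(32)).decode("ascii")
```

For large files, a native CRC32C implementation would be preferable. These helpers only illustrate the expected encodings.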

Dynamic Properties:

Dynamic Properties allow the user to specify both the name and value of a property.
NameValueDescription
The name of a User-Defined Metadata field to add to the GCS ObjectThe value of a User-Defined Metadata field to add to the GCS ObjectAllows user-defined metadata to be added to the GCS object as key/value pairs
Supports Expression Language: true

Relationships:

NameDescription
successFlowFiles are routed to this relationship after a successful Google Cloud Storage operation.
failureFlowFiles are routed to this relationship if the Google Cloud Storage operation fails.

Reads Attributes:

NameDescription
filenameUses the FlowFile's filename as the filename for the GCS object
mime.typeUses the FlowFile's MIME type as the content-type for the GCS object

Writes Attributes:

NameDescription
gcs.bucketBucket of the object.
gcs.keyName of the object.
gcs.sizeSize of the object.
gcs.cache.controlData cache control of the object.
gcs.component.countThe number of components which make up the object.
gcs.content.dispositionThe data content disposition of the object.
gcs.content.encodingThe content encoding of the object.
gcs.content.languageThe content language of the object.
mime.typeThe MIME/Content-Type of the object
gcs.crc32cThe CRC32C checksum of the object's data, encoded in base64 in big-endian order.
gcs.create.timeThe creation time of the object (milliseconds)
gcs.update.timeThe last modification time of the object (milliseconds)
gcs.encryption.algorithmThe algorithm used to encrypt the object.
gcs.encryption.sha256The SHA256 hash of the key used to encrypt the object
gcs.etagThe HTTP 1.1 Entity tag for the object.
gcs.generated.idThe service-generated ID for the object.
gcs.generationThe data generation of the object.
gcs.md5The MD5 hash of the object's data encoded in base64.
gcs.media.linkThe media download link to the object.
gcs.metagenerationThe metageneration of the object.
gcs.ownerThe owner (uploader) of the object.
gcs.owner.typeThe ACL entity type of the uploader of the object.
gcs.uriThe URI of the object as a string.

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

See Also:

FetchGCSObject, DeleteGCSObject, ListGCSBucket

\ No newline at end of file Added: nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-grpc-nar/1.4.0/org.apache.nifi.processors.grpc.InvokeGRPC/index.html URL: http://svn.apache.org/viewvc/nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-grpc-nar/1.4.0/org.apache.nifi.processors.grpc.InvokeGRPC/index.html?rev=1811008&view=auto ============================================================================== --- nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-grpc-nar/1.4.0/org.apache.nifi.processors.grpc.InvokeGRPC/index.html (added) +++ nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-grpc-nar/1.4.0/org.apache.nifi.processors.grpc.InvokeGRPC/index.html Tue Oct 3 13:30:16 2017 @@ -0,0 +1 @@ +InvokeGRPC

InvokeGRPC

Description:

Sends FlowFiles, optionally with content, to a configurable remote gRPC service endpoint. The remote gRPC service must abide by the service IDL defined in NiFi. gRPC isn't intended to carry large payloads, so this processor should be used only when FlowFile sizes are on the order of megabytes. The default maximum message size is 4MB.

Tags:

grpc, rpc, client

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

NameDefault ValueAllowable ValuesDescription
Remote gRPC service hostnameRemote host which will be connected to
Remote gRPC service portRemote port which will be connected to
Max Message Size4MBThe maximum size of FlowFiles that this processor will allow to be received. The default is 4MB. If FlowFiles exceed this size, you should consider using another transport mechanism as gRPC isn't designed for heavy payloads.
Use SSL/TLSfalse
  • true
  • false
Whether or not to use SSL/TLS to send the contents of the gRPC messages.
SSL Context ServiceController Service API:
SSLContextService
Implementations: StandardSSLContextService
StandardRestrictedSSLContextService
The SSL Context Service used to provide client certificate information for TLS/SSL (https) connections.
Send FlowFile Contenttrue
  • true
  • false
Whether or not to include the FlowFile content in the FlowFileRequest to the gRPC service.
Always Output Responsefalse
  • true
  • false
Forces a response FlowFile to be generated and routed to the 'Response' relationship regardless of the server status code received, or of whether the processor is configured to put the server response body in the request attribute. In the latter configuration, a request FlowFile with the response body in the attribute and a typical response FlowFile will both be emitted to their respective relationships.
Penalize on "No Retry"false
  • true
  • false
Enabling this property will penalize FlowFiles that are routed to the "No Retry" relationship.
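Max Message Size is given as a data size such as 4MB. The Python sketch below, with hypothetical helper names, checks whether a payload fits under such a limit before it is sent. It assumes binary units (1 MB = 1,048,576 bytes) and only handles B, KB, MB and GB.

```python
import re

# Binary multipliers (assumption: 1 KB = 1024 bytes, as in NiFi data sizes).
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_data_size(text: str) -> int:
    """Convert a string such as '4MB' or '512 KB' into a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized data size: {text!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit.upper()])


def fits_in_grpc_message(payload_size: int, max_message_size: str = "4MB") -> bool:
    # Content above the limit should go over another transport instead.
    return payload_size <= parse_data_size(max_message_size)
```

The request message may also carry FlowFile attributes and protobuf framing, so treat this as an approximate pre-check and leave some headroom below the limit.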

Relationships:

NameDescription
OriginalThe original FlowFile will be routed upon success. It will have new attributes detailing the success of the request.
FailureThe original FlowFile will be routed on any type of connection failure, timeout or general exception. It will have new attributes detailing the request.
RetryThe original FlowFile will be routed on any status code that can be retried. It will have new attributes detailing the request.
No RetryThe original FlowFile will be routed on any status code that should NOT be retried. It will have new attributes detailing the request.
ResponseA Response FlowFile will be routed upon success. If the 'Always Output Response' property is true, the response will be sent to this relationship regardless of the status code received.

Reads Attributes:

None specified.

Writes Attributes:

NameDescription
invokegrpc.response.codeThe response code that is returned (0 = ERROR, 1 = SUCCESS, 2 = RETRY)
invokegrpc.response.bodyThe response message that is returned
invokegrpc.service.hostThe remote gRPC service hostname
invokegrpc.service.portThe remote gRPC service port
invokegrpc.java.exception.classThe Java exception class raised when the processor fails
invokegrpc.java.exception.messageThe Java exception message raised when the processor fails
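The invokegrpc.response.code attribute holds one of the numeric codes listed above. A downstream script could translate it with something like the following Python sketch. The dict stands in for a FlowFile's attribute map, where attribute values are strings. The labels MISSING and UNKNOWN are hypothetical additions for absent or unexpected values.

```python
# Codes documented for the invokegrpc.response.code attribute.
RESPONSE_CODES = {0: "ERROR", 1: "SUCCESS", 2: "RETRY"}


def describe_response_code(attributes: dict) -> str:
    """Translate the attribute value (a string) into its documented label."""
    raw = attributes.get("invokegrpc.response.code")
    if raw is None:
        return "MISSING"  # hypothetical label: attribute not present
    try:
        return RESPONSE_CODES.get(int(raw), "UNKNOWN")
    except ValueError:
        return "UNKNOWN"  # hypothetical label: not an integer
```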

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship. \ No newline at end of file Added: nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-grpc-nar/1.4.0/org.apache.nifi.processors.grpc.ListenGRPC/index.html URL: http://svn.apache.org/viewvc/nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-grpc-nar/1.4.0/org.apache.nifi.processors.grpc.ListenGRPC/index.html?rev=1811008&view=auto ============================================================================== --- nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-grpc-nar/1.4.0/org.apache.nifi.processors.grpc.ListenGRPC/index.html (added) +++ nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-grpc-nar/1.4.0/org.apache.nifi.processors.grpc.ListenGRPC/index.html Tue Oct 3 13:30:16 2017 @@ -0,0 +1 @@ +ListenGRPC

ListenGRPC

Description:

Starts a gRPC server and listens on the given port to transform the incoming messages into FlowFiles. The message format is defined by the standard gRPC protobuf IDL provided by NiFi. gRPC isn't intended to carry large payloads, so this processor should be used only when FlowFile sizes are on the order of megabytes. The default maximum message size is 4MB.

Tags:

ingest, grpc, rpc, listen

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

NameDefault ValueAllowable ValuesDescription
Local gRPC Service PortThe local port that the gRPC service will listen on.
Use TLSfalse
  • true
  • false
Whether or not to use TLS to send the contents of the gRPC messages.
SSL Context ServiceController Service API:
RestrictedSSLContextService
Implementation: StandardRestrictedSSLContextService
The SSL Context Service used to provide client certificate information for TLS (https) connections.
Flow Control Window1MBThe initial HTTP/2 flow control window for both new streams and overall connection. Flow-control schemes ensure that streams on the same connection do not destructively interfere with each other. The default is 1MB.
Authorized DN Pattern.*A Regular Expression to apply against the Distinguished Name of incoming connections. If the Pattern does not match the DN, the connection will be refused.
Maximum Message Size4MBThe maximum size of FlowFiles that this processor will allow to be received. The default is 4MB. If FlowFiles exceed this size, you should consider using another transport mechanism as gRPC isn't designed for heavy payloads.
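Authorized DN Pattern is a regular expression checked against the Distinguished Name of the incoming connection. The following Python sketch (hypothetical function name) shows that check. It assumes the whole DN must match, as Java's Matcher.matches() requires, which is why the default .* admits every client. Python's re syntax is close to, but not identical to, Java's.

```python
import re


def is_authorized(client_dn: str, authorized_dn_pattern: str = ".*") -> bool:
    """Return True if the client's DN matches the pattern in full.

    Assumption: whole-string matching; a partial match (re.search)
    would be more permissive than this.
    """
    return re.fullmatch(authorized_dn_pattern, client_dn) is not None
```

Under this assumption, a pattern such as CN=client1 would not admit CN=client1, OU=NiFi. Append .* to accept any DN that begins with a given prefix.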

Relationships:

NameDescription
SuccessThe FlowFile was received successfully.

Reads Attributes:

None specified.

Writes Attributes:

NameDescription
listengrpc.remote.user.dnThe DN of the user who sent the FlowFile to this NiFi
listengrpc.remote.hostThe IP of the client who sent the FlowFile to this NiFi

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component does not allow an incoming relationship. \ No newline at end of file Added: nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.CreateHadoopSequenceFile/additionalDetails.html URL: http://svn.apache.org/viewvc/nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.CreateHadoopSequenceFile/additionalDetails.html?rev=1811008&view=auto ============================================================================== --- nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.CreateHadoopSequenceFile/additionalDetails.html (added) +++ nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.CreateHadoopSequenceFile/additionalDetails.html Tue Oct 3 13:30:16 2017 @@ -0,0 +1,46 @@
CreateHadoopSequenceFile

Description:


This processor is used to create a Hadoop Sequence File, which is essentially a file of key/value pairs. The key will be a file name and the value will be the flow file content. The processor will take either a merged (a.k.a. packaged) flow file or a single flow file. Historically, this processor handled the merging by type and size or time prior to creating a SequenceFile output; it no longer does this. To create a SequenceFile that contains multiple files of the same type, precede this processor with a RouteOnAttribute processor to segregate files of the same type and follow that with a MergeContent processor to bundle up files. If the type of files is not important, just use the MergeContent processor. When using the MergeContent processor, the following Merge Formats are supported by this processor:

The created SequenceFile is named the same as the incoming FlowFile with the suffix '.sf'. For incoming FlowFiles that are bundled, the keys in the SequenceFile are the individual file names and the values are the contents of each file.
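The naming rule and key/value layout described above can be modeled in a few lines of Python. This sketch does not write a real Hadoop SequenceFile. The function and type names are hypothetical, and the code only shows which output name and which records the processor is described as producing.

```python
from typing import List, Sequence, Tuple

# Hypothetical model of the output; not NiFi or Hadoop API code.
Record = Tuple[str, bytes]


def plan_sequence_file(flowfile_name: str,
                       members: Sequence[Record]) -> Tuple[str, List[Record]]:
    """Return the output file name and the key/value records to write.

    members holds (file name, content) pairs: one pair for a single
    FlowFile, or one pair per bundled file for a merged FlowFile.
    """
    if not members:
        raise ValueError("nothing to write")
    output_name = f"{flowfile_name}.sf"  # incoming name plus the '.sf' suffix
    records = [(name, content) for name, content in members]  # key = file name
    return output_name, records
```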

NOTE: The value portion of a key/value pair is loaded into memory. Although there is a maximum size limit of 2GB, this can cause memory issues if there are too many concurrent tasks and the flow files are large.

Added: nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.CreateHadoopSequenceFile/index.html URL: http://svn.apache.org/viewvc/nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.CreateHadoopSequenceFile/index.html?rev=1811008&view=auto ============================================================================== --- nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.CreateHadoopSequenceFile/index.html (added) +++ nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.CreateHadoopSequenceFile/index.html Tue Oct 3 13:30:16 2017 @@ -0,0 +1 @@ +CreateHadoopSequenceFile

CreateHadoopSequenceFile

Description:

Creates Hadoop Sequence Files from incoming flow files

Additional Details...

Tags:

hadoop, sequence file, create, sequencefile

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

NameDefault ValueAllowable ValuesDescription
Hadoop Configuration ResourcesA file or comma separated list of files which contains the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration.
Supports Expression Language: true
Kerberos PrincipalKerberos principal to authenticate as. Requires nifi.kerberos.krb5.file to be set in your nifi.properties
Supports Expression Language: true
Kerberos KeytabKerberos keytab associated with the principal. Requires nifi.kerberos.krb5.file to be set in your nifi.properties
Supports Expression Language: true
Kerberos Relogin Period4 hoursPeriod of time which should pass before attempting a kerberos relogin
Supports Expression Language: true
Additional Classpath ResourcesA comma-separated list of paths to files and/or directories that will be added to the classpath. When specifying a directory, all files within the directory will be added to the classpath, but further sub-directories will not be included.
Compression type
  • NONE
  • RECORD
  • BLOCK
Type of compression to use when creating Sequence File
Compression codecNONE
  • NONE No compression
  • DEFAULT Default ZLIB compression
  • BZIP BZIP compression
  • GZIP GZIP compression
  • LZ4 LZ4 compression
  • LZO LZO compression - it assumes LD_LIBRARY_PATH has been set and jar is available
  • SNAPPY Snappy compression
  • AUTOMATIC Will attempt to automatically detect the compression codec.
No Description Provided.

Relationships:

NameDescription
successGenerated Sequence Files are sent to this relationship
failureIncoming files that failed to generate a Sequence File are sent to this relationship

Reads Attributes:

None specified.

Writes Attributes:

None specified.

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

See Also:

PutHDFS

\ No newline at end of file Added: nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.DeleteHDFS/index.html URL: http://svn.apache.org/viewvc/nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.DeleteHDFS/index.html?rev=1811008&view=auto ============================================================================== --- nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.DeleteHDFS/index.html (added) +++ nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.DeleteHDFS/index.html Tue Oct 3 13:30:16 2017 @@ -0,0 +1 @@ +DeleteHDFS

DeleteHDFS

Description:

Deletes one or more files or directories from HDFS. The path can be provided as an attribute from an incoming FlowFile, or as a statically set path that is periodically removed. If this processor has an incoming connection, it will not run on a periodic basis and will instead rely on incoming FlowFiles to trigger a delete. Note that you may use a wildcard character to match multiple files or directories. If there are no incoming connections, no FlowFiles will be transferred to any output relationship. If there is an incoming FlowFile, it will be transferred to success provided no failures are detected; otherwise it will be sent to failure. If you need to know which globbed files were deleted, use ListHDFS first to produce a specific list of files to delete.

Tags:

hadoop, HDFS, delete, remove, filesystem, restricted

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

NameDefault ValueAllowable ValuesDescription
Hadoop Configuration ResourcesA file or comma-separated list of files which contains the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration.
Supports Expression Language: true
Kerberos PrincipalKerberos principal to authenticate as. Requires nifi.kerberos.krb5.file to be set in your nifi.properties
Supports Expression Language: true
Kerberos KeytabKerberos keytab associated with the principal. Requires nifi.kerberos.krb5.file to be set in your nifi.properties
Supports Expression Language: true
Kerberos Relogin Period4 hoursPeriod of time which should pass before attempting a kerberos relogin
Supports Expression Language: true
Additional Classpath ResourcesA comma-separated list of paths to files and/or directories that will be added to the classpath. When specifying a directory, all files within the directory will be added to the classpath, but further sub-directories will not be included.
PathThe HDFS file or directory to delete. A wildcard expression may be used to delete only certain files.
Supports Expression Language: true
Recursivetrue
  • true
  • false
Remove contents of a non-empty directory recursively
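Because the Path property accepts a wildcard, it can help to preview which paths a pattern would select before enabling the processor. The Python sketch below approximates this against a list of known paths. It is not Hadoop's glob implementation: it assumes each * or ? stays within a single path component, as Hadoop globs do, and it does not support Hadoop's {a,b} alternation. The function name is hypothetical. For an authoritative list, run ListHDFS first.

```python
import fnmatch
from typing import Iterable, List


def paths_matching(pattern: str, existing_paths: Iterable[str]) -> List[str]:
    """Return the paths a wildcard delete pattern would select (approximation).

    fnmatch's '*' would cross '/' boundaries on a whole path, so the pattern
    is compared one path component at a time instead.
    """
    pattern_parts = pattern.strip("/").split("/")
    selected = []
    for path in existing_paths:
        parts = path.strip("/").split("/")
        if len(parts) == len(pattern_parts) and all(
            fnmatch.fnmatchcase(part, pat) for part, pat in zip(parts, pattern_parts)
        ):
            selected.append(path)
    return sorted(selected)
```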

Relationships:

NameDescription
successWhen an incoming FlowFile is used and no errors occur while invoking the delete, the FlowFile is routed here.
failureWhen an incoming FlowFile is used and a failure occurs while deleting, the FlowFile is routed here.

Reads Attributes:

None specified.

Writes Attributes:

NameDescription
hdfs.filenameHDFS file to be deleted
hdfs.pathHDFS Path specified in the delete request
hdfs.error.messageHDFS error message related to the hdfs.error.code

State management:

This component does not store state.

Restricted:

Provides the operator with the ability to delete any file that NiFi has access to in HDFS or the local filesystem.

Input requirement:

This component allows an incoming relationship.

See Also:

ListHDFS

\ No newline at end of file Added: nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.FetchHDFS/index.html URL: http://svn.apache.org/viewvc/nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.FetchHDFS/index.html?rev=1811008&view=auto ============================================================================== --- nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.FetchHDFS/index.html (added) +++ nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.FetchHDFS/index.html Tue Oct 3 13:30:16 2017 @@ -0,0 +1 @@ +FetchHDFS

FetchHDFS

Description:

Retrieves a file from HDFS. The content of the incoming FlowFile is replaced by the content of the file in HDFS. The file in HDFS is left intact without any changes being made to it.

Tags:

hadoop, hdfs, get, ingest, fetch, source, restricted

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

NameDefault ValueAllowable ValuesDescription
Hadoop Configuration ResourcesA file or comma separated list of files which contains the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration.
Supports Expression Language: true
Kerberos PrincipalKerberos principal to authenticate as. Requires nifi.kerberos.krb5.file to be set in your nifi.properties
Supports Expression Language: true
Kerberos KeytabKerberos keytab associated with the principal. Requires nifi.kerberos.krb5.file to be set in your nifi.properties
Supports Expression Language: true
Kerberos Relogin Period4 hoursPeriod of time which should pass before attempting a kerberos relogin
Supports Expression Language: true
Additional Classpath ResourcesA comma-separated list of paths to files and/or directories that will be added to the classpath. When specifying a directory, all files within the directory will be added to the classpath, but further sub-directories will not be included.
HDFS Filename${path}/${filename}The name of the HDFS file to retrieve
Supports Expression Language: true
Compression codecNONE
  • NONE No compression
  • DEFAULT Default ZLIB compression
  • BZIP BZIP compression
  • GZIP GZIP compression
  • LZ4 LZ4 compression
  • LZO LZO compression - it assumes LD_LIBRARY_PATH has been set and jar is available
  • SNAPPY Snappy compression
  • AUTOMATIC Will attempt to automatically detect the compression codec.
No Description Provided.
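When Compression codec is AUTOMATIC, the codec has to be inferred from the file. Hadoop's CompressionCodecFactory infers it from the file-name extension. The Python sketch below assumes FetchHDFS behaves the same way and maps the default extensions of Hadoop's built-in codecs to the codec names listed above. LZO is omitted because its extension depends on which external codec is installed. The function name is hypothetical.

```python
from typing import Optional

# Assumed mapping, modeled on the default extensions of Hadoop's built-in codecs.
_CODEC_BY_EXTENSION = {
    ".gz": "GZIP",
    ".bz2": "BZIP",
    ".deflate": "DEFAULT",
    ".lz4": "LZ4",
    ".snappy": "SNAPPY",
}


def detect_codec(hdfs_filename: str) -> Optional[str]:
    """Return the codec name implied by the file extension, or None."""
    for extension, codec in _CODEC_BY_EXTENSION.items():
        if hdfs_filename.endswith(extension):
            return codec
    return None  # no known extension: treat the file as uncompressed
```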

Relationships:

NameDescription
successFlowFiles will be routed to this relationship once they have been updated with the content of the HDFS file
comms.failureFlowFiles will be routed to this relationship if the content of the HDFS file cannot be retrieved due to a communications failure. This generally indicates that the Fetch should be tried again.
failureFlowFiles will be routed to this relationship if the content of the HDFS file cannot be retrieved and trying again will likely not be helpful. This would occur, for instance, if the file is not found or if there is a permissions issue.

Reads Attributes:

None specified.

Writes Attributes:

NameDescription
hdfs.failure.reasonWhen a FlowFile is routed to 'failure', this attribute is added indicating why the file could not be fetched from HDFS

State management:

This component does not store state.

Restricted:

Provides the operator with the ability to retrieve any file that NiFi has access to in HDFS or the local filesystem.

Input requirement:

This component requires an incoming relationship.

See Also:

ListHDFS, GetHDFS, PutHDFS

\ No newline at end of file Added: nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.GetHDFS/index.html URL: http://svn.apache.org/viewvc/nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.GetHDFS/index.html?rev=1811008&view=auto ============================================================================== --- nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.GetHDFS/index.html (added) +++ nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.GetHDFS/index.html Tue Oct 3 13:30:16 2017 @@ -0,0 +1 @@ +GetHDFS

GetHDFS

Description:

Fetch files from Hadoop Distributed File System (HDFS) into FlowFiles. This Processor will delete the file from HDFS after fetching it.

Tags:

hadoop, HDFS, get, fetch, ingest, source, filesystem, restricted

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

NameDefault ValueAllowable ValuesDescription
Hadoop Configuration ResourcesA file or comma-separated list of files which contains the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration.
Supports Expression Language: true
Kerberos PrincipalKerberos principal to authenticate as. Requires nifi.kerberos.krb5.file to be set in your nifi.properties
Supports Expression Language: true
Kerberos KeytabKerberos keytab associated with the principal. Requires nifi.kerberos.krb5.file to be set in your nifi.properties
Supports Expression Language: true
Kerberos Relogin Period4 hoursPeriod of time which should pass before attempting a kerberos relogin
Supports Expression Language: true
Additional Classpath ResourcesA comma-separated list of paths to files and/or directories that will be added to the classpath. When specifying a directory, all files within the directory will be added to the classpath, but further sub-directories will not be included.
DirectoryThe HDFS directory from which files should be read
Supports Expression Language: true
Recurse SubdirectoriestrueIndicates whether to pull files from subdirectories of the HDFS directory
Keep Source FilefalseDetermines whether to keep the file in HDFS after it has been successfully transferred. If true, the file is not deleted and will be fetched repeatedly. This is intended for testing only.
File Filter RegexA Java Regular Expression for filtering Filenames; if a filter is supplied then only files whose names match that Regular Expression will be fetched, otherwise all files will be fetched
Filter Match Name OnlytrueIf true then File Filter Regex will match on just the filename, otherwise subdirectory names will be included with filename in the regex comparison
Ignore Dotted FilestrueIf true, files whose names begin with a dot (".") will be ignored
Minimum File Age0 secThe minimum age that a file must be in order to be pulled; any file younger than this amount of time (based on last modification date) will be ignored
Maximum File AgeThe maximum age that a file may be in order to be pulled; any file older than this amount of time (based on last modification date) will be ignored
Polling Interval0 secIndicates how long to wait between performing directory listings
Batch Size100The maximum number of files to pull in each iteration, based on run schedule.
IO Buffer SizeAmount of memory to use to buffer file contents during IO. This overrides the Hadoop Configuration
Compression codecNONENo Description Provided.
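The filtering properties above combine into a single accept/reject decision for each listed file. The following Python sketch is a hypothetical illustration of that logic. It assumes the File Filter Regex must match the whole name (Java Matcher.matches() semantics) and that ages are measured in seconds from the last modification time.

```python
import re
import time
from typing import Optional


def should_pull(
    relative_path: str,
    modified_at: float,
    file_filter_regex: Optional[str] = None,
    filter_match_name_only: bool = True,
    ignore_dotted_files: bool = True,
    min_age_sec: float = 0.0,
    max_age_sec: Optional[float] = None,
    now: Optional[float] = None,
) -> bool:
    """Return True if a listed file passes GetHDFS-style filters.

    relative_path: path below the configured Directory, e.g. "abc/1/2/3/x.csv".
    modified_at: last-modification time, in epoch seconds.
    """
    name = relative_path.rsplit("/", 1)[-1]
    # Ignore Dotted Files: skip names beginning with "."
    if ignore_dotted_files and name.startswith("."):
        return False
    # File Filter Regex against the bare name or the subdirectory-qualified path
    if file_filter_regex is not None:
        subject = name if filter_match_name_only else relative_path
        if re.fullmatch(file_filter_regex, subject) is None:
            return False
    # Minimum / Maximum File Age, based on last modification time
    age = (time.time() if now is None else now) - modified_at
    if age < min_age_sec:
        return False
    if max_age_sec is not None and age > max_age_sec:
        return False
    return True
```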

Relationships:

NameDescription
successAll files retrieved from HDFS are transferred to this relationship

Reads Attributes:

None specified.

Writes Attributes:

NameDescription
filenameThe name of the file that was read from HDFS.
pathThe path is set to the relative path of the file's directory on HDFS. For example, if the Directory property is set to /tmp, then files picked up from /tmp will have the path attribute set to "./". If the Recurse Subdirectories property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to "abc/1/2/3".
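The path attribute rule above can be expressed as a short Python sketch using posixpath. The function name is hypothetical, and the sketch assumes both arguments are absolute HDFS paths.

```python
import posixpath


def path_attribute(directory: str, file_path: str) -> str:
    """Compute the 'path' attribute for a file picked up under Directory."""
    parent = posixpath.dirname(file_path)
    relative = posixpath.relpath(parent, directory)
    # A file directly inside Directory gets "./" rather than relpath's ".".
    return "./" if relative == "." else relative
```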

State management:

This component does not store state.

Restricted:

Provides the operator with the ability to retrieve and delete any file that NiFi has access to in HDFS or the local filesystem.

Input requirement:

This component does not allow an incoming relationship.

See Also:

PutHDFS, ListHDFS

\ No newline at end of file
Added: nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.GetHDFSSequenceFile/index.html
URL: http://svn.apache.org/viewvc/nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.GetHDFSSequenceFile/index.html?rev=1811008&view=auto
==============================================================================
--- nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.GetHDFSSequenceFile/index.html (added)
+++ nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.GetHDFSSequenceFile/index.html Tue Oct 3 13:30:16 2017
@@ -0,0 +1 @@
+GetHDFSSequenceFile

GetHDFSSequenceFile

Description:

Fetch sequence files from Hadoop Distributed File System (HDFS) into FlowFiles

Tags:

hadoop, HDFS, get, fetch, ingest, source, sequence file

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

NameDefault ValueAllowable ValuesDescription
Hadoop Configuration ResourcesA file or comma-separated list of files which contain the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration.
Supports Expression Language: true
Kerberos PrincipalKerberos principal to authenticate as. Requires nifi.kerberos.krb5.file to be set in your nifi.properties
Supports Expression Language: true
Kerberos KeytabKerberos keytab associated with the principal. Requires nifi.kerberos.krb5.file to be set in your nifi.properties
Supports Expression Language: true
Kerberos Relogin Period4 hoursPeriod of time which should pass before attempting a Kerberos relogin
Supports Expression Language: true
Additional Classpath ResourcesA comma-separated list of paths to files and/or directories that will be added to the classpath. When specifying a directory, all files within the directory will be added to the classpath, but further sub-directories will not be included.
DirectoryThe HDFS directory from which files should be read
Supports Expression Language: true
Recurse Subdirectoriestrue
  • true
  • false
Indicates whether to pull files from subdirectories of the HDFS directory
Keep Source Filefalse
  • true
  • false
Determines whether to keep the source file in HDFS after it has been successfully transferred. If true, the file is not deleted and will be fetched repeatedly. This is intended for testing only.
File Filter RegexA Java regular expression for filtering filenames; if a filter is supplied, only files whose names match that regular expression will be fetched; otherwise all files will be fetched
Filter Match Name Onlytrue
  • true
  • false
If true, File Filter Regex will match on just the filename; otherwise subdirectory names will be included with the filename in the regex comparison
Ignore Dotted Filestrue
  • true
  • false
If true, files whose names begin with a dot (".") will be ignored
Minimum File Age0 secThe minimum age that a file must be in order to be pulled; any file younger than this amount of time (based on last modification date) will be ignored
Maximum File AgeThe maximum age that a file must be in order to be pulled; any file older than this amount of time (based on last modification date) will be ignored
Polling Interval0 secIndicates how long to wait between performing directory listings
Batch Size100The maximum number of files to pull in each iteration, based on run schedule.
IO Buffer SizeAmount of memory to use to buffer file contents during IO. This overrides the Hadoop Configuration
Compression codecNONE
  • NONE No compression
  • DEFAULT Default ZLIB compression
  • BZIP BZIP compression
  • GZIP GZIP compression
  • LZ4 LZ4 compression
  • LZO LZO compression - it assumes LD_LIBRARY_PATH has been set and jar is available
  • SNAPPY Snappy compression
  • AUTOMATIC Will attempt to automatically detect the compression codec.
No Description Provided.
FlowFile ContentVALUE ONLY
  • VALUE ONLY
  • KEY VALUE PAIR
Indicates whether the FlowFile content holds both the key and the value of the Sequence File, or just the value.
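The Minimum File Age, Maximum File Age and Batch Size properties together define a time window and a per-iteration cap. The sketch below models that window. It is an illustration, not NiFi code. It assumes that each file is represented as a hypothetical `(name, last_modified_ms)` pair and that all ages are measured against a single `now_ms` taken when the listing runs.

```python
from typing import Iterable, List, Optional, Tuple


def select_by_age(files: Iterable[Tuple[str, int]], now_ms: int,
                  min_age_ms: int = 0, max_age_ms: Optional[int] = None,
                  batch_size: int = 100) -> List[str]:
    """Hypothetical model: names of files inside the age window, capped at batch_size."""
    selected: List[str] = []
    for name, modified_ms in files:
        age_ms = now_ms - modified_ms
        if age_ms < min_age_ms:  # younger than Minimum File Age: ignored
            continue
        if max_age_ms is not None and age_ms > max_age_ms:  # older than Maximum File Age: ignored
            continue
        selected.append(name)
        if len(selected) >= batch_size:  # Batch Size caps a single iteration
            break
    return selected
```

With `now_ms=10_000`, a 1 s minimum and a 6 s maximum, only a file last modified at 5 000 ms (age 5 s) falls inside the window.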

Relationships:

NameDescription
successAll files retrieved from HDFS are transferred to this relationship

Reads Attributes:

None specified.

Writes Attributes:

NameDescription
filenameThe name of the file that was read from HDFS.
pathThe path is set to the relative path of the file's directory on HDFS. For example, if the Directory property is set to /tmp, then files picked up from /tmp will have the path attribute set to "./". If the Recurse Subdirectories property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to "abc/1/2/3".

State management:

This component does not store state.

Restricted:

Provides the operator the ability to retrieve and delete any file that NiFi has access to in HDFS or the local filesystem.

Input requirement:

This component does not allow an incoming relationship.

See Also:

PutHDFS

\ No newline at end of file
Added: nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.ListHDFS/index.html
URL: http://svn.apache.org/viewvc/nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.ListHDFS/index.html?rev=1811008&view=auto
==============================================================================
--- nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.ListHDFS/index.html (added)
+++ nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.ListHDFS/index.html Tue Oct 3 13:30:16 2017
@@ -0,0 +1 @@
+ListHDFS

ListHDFS

Description:

Retrieves a listing of files from HDFS. Each time a listing is performed, the files with the latest timestamp will be excluded and picked up during the next execution of the processor. This is done to ensure that we do not miss any files, or produce duplicates, in the cases where files with the same timestamp are written immediately before and after a single execution of the processor. For each file that is listed in HDFS, this processor creates a FlowFile that represents the HDFS file to be fetched in conjunction with FetchHDFS. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. Unlike GetHDFS, this Processor does not delete any data from HDFS.
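The hold-back rule above, which excludes files carrying the latest timestamp until the next run, can be illustrated with a simplified model. This is not ListHDFS's actual code. It borrows only the idea that two timestamps are kept between runs, the latest listed and the latest transferred. The function `list_once` and its state tuple are hypothetical.

```python
from typing import Dict, List, Tuple


def list_once(files: Dict[str, int], latest_listed: int,
              latest_emitted: int) -> Tuple[List[str], int, int]:
    """One simplified listing pass.

    files maps an HDFS path to its modification time in milliseconds.
    Returns (paths emitted this run, new latest_listed, new latest_emitted).
    """
    # Only files newer than anything already emitted are candidates
    candidates = {p: t for p, t in files.items() if t > latest_emitted}
    if not candidates:
        return [], latest_listed, latest_emitted
    newest = max(candidates.values())
    if newest > latest_listed:
        # First time this timestamp is seen: hold back files that carry it,
        # since more files with the same timestamp may still be written
        emit = {p: t for p, t in candidates.items() if t < newest}
    else:
        # The newest timestamp was already seen on the previous run: release everything
        emit = candidates
    new_emitted = max(emit.values()) if emit else latest_emitted
    return sorted(emit), newest, new_emitted
```

Starting from `(-1, -1)` with files at 100, 200 and 200 ms, the first pass emits only the 100 ms file. The second pass emits the two 200 ms files. Later passes emit nothing until a newer file appears.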

Tags:

hadoop, HDFS, get, list, ingest, source, filesystem

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

NameDefault ValueAllowable ValuesDescription
Hadoop Configuration ResourcesA file or comma-separated list of files which contain the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration.
Supports Expression Language: true
Kerberos PrincipalKerberos principal to authenticate as. Requires nifi.kerberos.krb5.file to be set in your nifi.properties
Supports Expression Language: true
Kerberos KeytabKerberos keytab associated with the principal. Requires nifi.kerberos.krb5.file to be set in your nifi.properties
Supports Expression Language: true
Kerberos Relogin Period4 hoursPeriod of time which should pass before attempting a Kerberos relogin
Supports Expression Language: true
Additional Classpath ResourcesA comma-separated list of paths to files and/or directories that will be added to the classpath. When specifying a directory, all files within the directory will be added to the classpath, but further sub-directories will not be included.
Distributed Cache ServiceController Service API:
DistributedMapCacheClient
Implementations: HBase_1_1_2_ClientMapCacheService
RedisDistributedMapCacheClientService
DistributedMapCacheClientService
Specifies the Controller Service that should be used to maintain state about what has been pulled from HDFS so that if a new node begins pulling data, it won't duplicate all of the work that has been done.
DirectoryThe HDFS directory from which files should be read
Supports Expression Language: true
Recurse Subdirectoriestrue
  • true
  • false
Indicates whether to list files from subdirectories of the HDFS directory
File Filter[^\.].*Only files whose names match the given regular expression will be picked up
Minimum File AgeThe minimum age that a file must be in order to be pulled; any file younger than this amount of time (based on last modification date) will be ignored
Maximum File AgeThe maximum age that a file must be in order to be pulled; any file older than this amount of time (based on last modification date) will be ignored. Minimum value is 100ms.
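The default File Filter, `[^\.].*`, accepts any filename whose first character is not a dot. The short check below shows its effect. It assumes a whole-string match, as Java's `Matcher.matches()` performs. The helper `is_listed` is hypothetical.

```python
import re

# The default File Filter value from the ListHDFS property table
DEFAULT_FILE_FILTER = r"[^\.].*"


def is_listed(filename: str, pattern: str = DEFAULT_FILE_FILTER) -> bool:
    """Hypothetical check: does filename pass the File Filter?"""
    # Whole-string match, as java.util.regex.Matcher.matches() performs
    return re.fullmatch(pattern, filename) is not None
```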

Relationships:

NameDescription
successAll FlowFiles are transferred to this relationship

Reads Attributes:

None specified.

Writes Attributes:

NameDescription
filenameThe name of the file that was read from HDFS.
pathThe path is set to the absolute path of the file's directory on HDFS. For example, if the Directory property is set to /tmp, then files picked up from /tmp will have the path attribute set to "/tmp". If the Recurse Subdirectories property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to "/tmp/abc/1/2/3".
hdfs.ownerThe user that owns the file in HDFS
hdfs.groupThe group that owns the file in HDFS
hdfs.lastModifiedThe timestamp of when the file in HDFS was last modified, as milliseconds since midnight Jan 1, 1970 UTC
hdfs.lengthThe number of bytes in the file in HDFS
hdfs.replicationThe number of HDFS replicas for the file
hdfs.permissionsThe permissions for the file in HDFS. This is formatted as 3 characters for the owner, 3 for the group, and 3 for other users. For example, rw-rw-r--
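The following helpers show how the `hdfs.lastModified` and `hdfs.permissions` formats above relate to raw values. They are a sketch with hypothetical names. They assume that the permission is available as an integer mode, such as `0o664`, and they render only the 9 owner/group/other bits.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_modified_utc(millis: int) -> datetime:
    """Convert an hdfs.lastModified value (ms since the epoch) to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


def permissions_string(mode: int) -> str:
    """Render the 9 low permission bits as owner/group/other rwx triplets."""
    letters = "rwxrwxrwx"
    # Bit 8 is owner-read; bit 0 is other-execute
    return "".join(
        ch if mode & (1 << (8 - i)) else "-" for i, ch in enumerate(letters)
    )
```

For example, `permissions_string(0o664)` yields the `rw-rw-r--` value quoted in the table.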

State management:

ScopeDescription
CLUSTERAfter performing a listing of HDFS files, the latest timestamp of all the files listed and the latest timestamp of all the files transferred are both stored. This allows the Processor to list only files that have been added or modified after this date the next time that the Processor is run, without having to store all of the actual filenames/paths which could lead to performance problems. State is stored across the cluster so that this Processor can be run on Primary Node only and if a new Primary Node is selected, the new node can pick up where the previous node left off, without duplicating the data.

Restricted:

This component is not restricted.

Input requirement:

This component does not allow an incoming relationship.

See Also:

GetHDFS, FetchHDFS, PutHDFS

\ No newline at end of file
Added: nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.PutHDFS/index.html
URL: http://svn.apache.org/viewvc/nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.PutHDFS/index.html?rev=1811008&view=auto
==============================================================================
--- nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.PutHDFS/index.html (added)
+++ nifi/site/trunk/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.PutHDFS/index.html Tue Oct 3 13:30:16 2017
@@ -0,0 +1 @@
+PutHDFS

PutHDFS

Description:

Write FlowFile data to Hadoop Distributed File System (HDFS)

Tags:

hadoop, HDFS, put, copy, filesystem, restricted

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

NameDefault ValueAllowable ValuesDescription
Hadoop Configuration ResourcesA file or comma-separated list of files which contain the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration.
Supports Expression Language: true
Kerberos PrincipalKerberos principal to authenticate as. Requires nifi.kerberos.krb5.file to be set in your nifi.properties
Supports Expression Language: true
Kerberos KeytabKerberos keytab associated with the principal. Requires nifi.kerberos.krb5.file to be set in your nifi.properties
Supports Expression Language: true
Kerberos Relogin Period4 hoursPeriod of time which should pass before attempting a Kerberos relogin
Supports Expression Language: true
Additional Classpath ResourcesA comma-separated list of paths to files and/or directories that will be added to the classpath. When specifying a directory, all files within the directory will be added to the classpath, but further sub-directories will not be included.
DirectoryThe parent HDFS directory to which files should be written. The directory will be created if it doesn't exist.
Supports Expression Language: true
Conflict Resolution Strategyfail
  • replace Replaces the existing file, if any.
  • ignore Ignores the flow file and routes it to success.
  • fail Penalizes the flow file and routes it to failure.
  • append Appends to the existing file, if any; creates a new file otherwise.
Indicates what should happen when a file with the same name already exists in the output directory
Block SizeSize of each block as written to HDFS. This overrides the Hadoop Configuration
IO Buffer SizeAmount of memory to use to buffer file contents during IO. This overrides the Hadoop Configuration
ReplicationNumber of times that HDFS will replicate each file. This overrides the Hadoop Configuration
Permissions umaskA umask represented as an octal number which determines the permissions of files written to HDFS. This overrides the Hadoop Configuration dfs.umaskmode
Remote OwnerChanges the owner of the HDFS file to this value after it is written. This only works if NiFi is running as a user that has HDFS superuser privileges to change the owner
Supports Expression Language: true
Remote GroupChanges the group of the HDFS file to this value after it is written. This only works if NiFi is running as a user that has HDFS superuser privileges to change the group
Supports Expression Language: true
Compression codecNONE
  • NONE No compression
  • DEFAULT Default ZLIB compression
  • BZIP BZIP compression
  • GZIP GZIP compression
  • LZ4 LZ4 compression
  • LZO LZO compression - it assumes LD_LIBRARY_PATH has been set and jar is available
  • SNAPPY Snappy compression
  • AUTOMATIC Will attempt to automatically detect the compression codec.
No Description Provided.
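The Permissions umask property works by clearing permission bits from a base mode. The following sketch shows the arithmetic. It assumes a base file mode of 0666 (rw-rw-rw-), which is the usual HDFS default for new files. Check your Hadoop version before relying on this. The helper `apply_umask` is hypothetical and is not part of NiFi or Hadoop.

```python
def apply_umask(umask_octal: str, base_mode: int = 0o666) -> int:
    """Hypothetical helper: permission bits left after clearing the umask bits."""
    umask = int(umask_octal, 8)  # e.g. "022" -> 0o22
    # Bits set in the umask are removed from the base mode
    return base_mode & ~umask & 0o777
```

With the common umask `022`, new files end up as `0644` (rw-r--r--).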

Relationships:

NameDescription
successFiles that have been successfully written to HDFS are transferred to this relationship
failureFiles that could not be written to HDFS for some reason are transferred to this relationship
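The relationship a FlowFile takes also depends on the Conflict Resolution Strategy when the target file already exists. The following sketch is a decision table built from the documented allowable values. It assumes that the HDFS write itself succeeds, and it omits every other failure path. The function `resolve_conflict` is hypothetical.

```python
from typing import Tuple

# (relationship, action) per Conflict Resolution Strategy when the file exists
_OUTCOMES = {
    "replace": ("success", "overwrite"),
    "ignore": ("success", "skip"),
    "fail": ("failure", "penalize"),
    "append": ("success", "append"),
}


def resolve_conflict(strategy: str, target_exists: bool) -> Tuple[str, str]:
    """Hypothetical model of routing for one FlowFile."""
    if strategy not in _OUTCOMES:
        raise ValueError(f"unknown Conflict Resolution Strategy: {strategy!r}")
    if not target_exists:
        # No conflict: the file is simply created ("append" also creates it)
        return "success", "create"
    return _OUTCOMES[strategy]
```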

Reads Attributes:

NameDescription
filenameThe name of the file written to HDFS comes from the value of this attribute.

Writes Attributes:

NameDescription
filenameThe name of the file written to HDFS is stored in this attribute.
absolute.hdfs.pathThe absolute path to the file on HDFS is stored in this attribute.

State management:

This component does not store state.

Restricted:

Provides the operator the ability to write to any file that NiFi has access to in HDFS or the local filesystem.

Input requirement:

This component requires an incoming relationship.

See Also:

GetHDFS

\ No newline at end of file