nifi-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [nifi] alopresto commented on a change in pull request #3968: NIFI-3833 Implemented encrypted flowfile repository
Date Thu, 09 Jan 2020 20:56:46 GMT
alopresto commented on a change in pull request #3968: NIFI-3833 Implemented encrypted flowfile
repository
URL: https://github.com/apache/nifi/pull/3968#discussion_r364955954
 
 

 ##########
 File path: nifi-docs/src/main/asciidoc/user-guide.adoc
 ##########
 @@ -2773,6 +2773,86 @@ When switching between implementation "families" (i.e. `VolatileContentRepositor
 * Multiple repositories -- No additional effort or testing has been applied to multiple repositories
at this time. It is possible/likely issues will occur with repositories on different physical
devices. There is no option to provide a heterogeneous environment (i.e. one encrypted, one
plaintext repository).
 * Corruption -- when a disk is filled or corrupted, there have been reported issues with
the repository becoming corrupted and recovery steps are necessary. This is likely to continue
to be an issue with the encrypted repository, although still limited in scope to individual
claims (i.e. an entire repository file won't be irrecoverable due to the encryption). Some
testing has been performed on scenarios where disk space is exhausted. While the flow can
no longer write additional content claims to the repository in that case, the NiFi application
continues to function properly, and successfully written content claims are still available
via the Provenance Query operations. Stopping NiFi and removing the content repository (or
moving it to a larger disk) resolves the issue.
 
+[[encrypted-flowfile]]
+== Encrypted FlowFile Repository
+While OS-level access control can offer some security over the flowfile attribute and content
claim data written to the disk in a repository, there are scenarios where the data may be
sensitive, compliance and regulatory requirements exist, or NiFi is running on hardware not
under the direct control of the organization (cloud, etc.). In these cases, the flowfile repository
can encrypt all data before it is persisted to disk. For more information
on the internal workings of the flowfile repository, see <<nifi-in-depth.adoc#flowfile-repository,NiFi
In-Depth - FlowFile Repository>>.
+
+[WARNING]
+.Experimental
+============
+This implementation is marked <<experimental_warning, *experimental*>> as of
Apache NiFi 1.11.0 (January 2020). The API, configuration, and internal behavior may change
without warning, and such changes may occur during a minor release. Use at your own risk.
+============
+
+[WARNING]
+.Performance
+============
+The current implementation of the encrypted flowfile repository intercepts the serialization
of flowfile record data via the `EncryptedSchemaRepositoryRecordSerde` and uses the `AES/GCM`
algorithm, which is fairly performant on commodity hardware. This use of an authenticated
encryption with associated data (AEAD) block cipher mode (chosen because the content length is
limited and known a priori) is the same as the <<encrypted-provenance,Encrypted Provenance Repository>>,
but differs from the unauthenticated stream cipher used in the <<encrypted-content,Encrypted
Content Repository>>. In low volume flowfile scenarios, the added cost will be minimal.
However, administrators should perform their own risk assessment and performance analysis
and decide how to move forward. Switching back and forth between encrypted/unencrypted implementations
is not recommended at this time.
+============
+
+=== What is it?
+
+The `EncryptedSequentialAccessWriteAheadLog` is a new implementation of the flowfile write-ahead
log which encrypts all flowfile attribute data before it is written to the repository. This
allows for storage on systems where OS-level access controls are not sufficient to protect
the data while still allowing querying and access to the data through the NiFi UI/API.
+
+=== How does it work?
+
+The `SequentialAccessWriteAheadLog` was introduced in NiFi 1.6.0 and provided a faster flowfile
repository implementation. The encrypted version wraps that implementation with functionality
to transparently encrypt and decrypt the serialized `RepositoryRecord` objects during file
system interaction. During all writes to disk (swapping, snapshotting, journaling, and checkpointing),
the flowfile containers are serialized to bytes based on a schema, and this serialized form
is encrypted before writing. This allows the snapshot handler to continue interacting with
the flowfile repository interface in the same way as before and continue operating on flowfile
data in a random access manner, without requiring any changes to handle the data protection.
+
+The fully qualified class `org.apache.nifi.wali.EncryptedSequentialAccessWriteAheadLog` is
specified as the flowfile repository write-ahead log implementation in _nifi.properties_ as
the value of `nifi.flowfile.repository.wal.implementation`. In addition, <<administration-guide.adoc#encrypted-write-ahead-flowfile-repository-properties,new
properties>> must be populated to allow successful initialization.
+
+==== StaticKeyProvider
+The `StaticKeyProvider` implementation defines keys directly in _nifi.properties_. Individual
keys are provided in hexadecimal encoding. The keys can also be encrypted like any other sensitive
property in _nifi.properties_ using the <<administration-guide.adoc#encrypt-config_tool,`./encrypt-config.sh`>>
tool in the NiFi Toolkit.
+
+The following configuration section would result in a key provider with two available keys,
"Key1" (active) and "AnotherKey".
+....
+nifi.flowfile.repository.encryption.key.provider.implementation=org.apache.nifi.security.kms.StaticKeyProvider
+nifi.flowfile.repository.encryption.key.id=Key1
+nifi.flowfile.repository.encryption.key=0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210
+nifi.flowfile.repository.encryption.key.id.AnotherKey=0101010101010101010101010101010101010101010101010101010101010101
+....
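
Editorial sketch (not part of the pull request): the property naming convention in the block above can be read with plain `java.util.Properties`. The class `StaticKeySketch` and its methods are hypothetical names chosen for illustration; this is not NiFi's `StaticKeyProvider`, and only the property names and key values are taken from the example.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical illustration only; this is not NiFi's StaticKeyProvider.
class StaticKeySketch {
    static final String PREFIX = "nifi.flowfile.repository.encryption.key";

    // Collects key ID -> hex-encoded key, following the property naming in the example.
    static Map<String, String> availableKeys(Properties props) {
        Map<String, String> keys = new TreeMap<>();
        // The active key: its ID is in "...key.id" and its value in "...key".
        String activeId = props.getProperty(PREFIX + ".id");
        String activeKey = props.getProperty(PREFIX);
        if (activeId != null && activeKey != null) {
            keys.put(activeId, activeKey);
        }
        // Additional keys: "...key.id.<KeyID>=<hex value>".
        String additionalPrefix = PREFIX + ".id.";
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(additionalPrefix)) {
                keys.put(name.substring(additionalPrefix.length()), props.getProperty(name));
            }
        }
        return keys;
    }

    // Decodes a hexadecimal string into raw key bytes.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd-length hex string");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(PREFIX + ".id", "Key1");
        props.setProperty(PREFIX, "0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210");
        props.setProperty(PREFIX + ".id.AnotherKey",
                "0101010101010101010101010101010101010101010101010101010101010101");
        for (Map.Entry<String, String> e : availableKeys(props).entrySet()) {
            System.out.println(e.getKey() + ": " + hexToBytes(e.getValue()).length * 8 + "-bit key");
        }
    }
}
```

The 64 hexadecimal characters of `Key1` decode to 32 bytes, i.e. an AES-256 key. The provider implementation property shares the same prefix but is not picked up as a key, because only names starting with `...key.id.` are treated as additional key IDs.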
+
+==== FileBasedKeyProvider
+The `FileBasedKeyProvider` implementation reads from an encrypted definition file of the
format:
+
+....
+key1=NGCpDpxBZNN0DBodz0p1SDbTjC2FG5kp1pCmdUKJlxxtcMSo6GC4fMlTyy1mPeKOxzLut3DRX+51j6PCO5SznA==
+key2=GYxPbMMDbnraXs09eGJudAM5jTvVYp05XtImkAg4JY4rIbmHOiVUUI6OeOf7ZW+hH42jtPgNW9pSkkQ9HWY/vQ==
+key3=SFe11xuz7J89Y/IQ7YbJPOL0/YKZRFL/VUxJgEHxxlXpd/8ELA7wwN59K1KTr3BURCcFP5YGmwrSKfr4OE4Vlg==
+key4=kZprfcTSTH69UuOU3jMkZfrtiVR/eqWmmbdku3bQcUJ/+UToecNB5lzOVEMBChyEXppyXXC35Wa6GEXFK6PMKw==
+key5=c6FzfnKm7UR7xqI2NFpZ+fEKBfSU7+1NvRw+XWQ9U39MONWqk5gvoyOCdFR1kUgeg46jrN5dGXk13sRqE0GETQ==
+....
+
+Each line defines a key ID and then the Base64-encoded cipher text of a 16-byte IV and wrapped
AES-128, AES-192, or AES-256 key depending on the JCE policies available. The individual keys
are wrapped by AES/GCM encryption using the **master key** defined by `nifi.bootstrap.sensitive.key`
in _conf/bootstrap.conf_.
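
Editorial sketch (not part of the pull request): one way to produce and read values of the shape described above, using only the JDK's `javax.crypto` classes. The class `KeyWrapSketch`, its method names, and the exact byte layout (IV followed directly by the GCM cipher text and tag) are assumptions made for illustration; the actual `FileBasedKeyProvider` format should be checked against the NiFi source.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration only; the byte layout is an assumption, not NiFi's documented format.
class KeyWrapSketch {
    static final int IV_LENGTH = 16;        // bytes, per the description in the text
    static final int TAG_LENGTH_BITS = 128; // GCM authentication tag length

    // Encrypts a key with the master key and returns Base64(IV || cipher text || tag).
    static String wrap(byte[] masterKey, byte[] key) {
        try {
            byte[] iv = new byte[IV_LENGTH];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(masterKey, "AES"),
                    new GCMParameterSpec(TAG_LENGTH_BITS, iv));
            byte[] cipherText = cipher.doFinal(key);
            byte[] out = new byte[iv.length + cipherText.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(cipherText, 0, out, iv.length, cipherText.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not wrap key", e);
        }
    }

    // Reverses wrap(): splits off the IV, then decrypts and authenticates the rest.
    static byte[] unwrap(byte[] masterKey, String encoded) {
        try {
            byte[] raw = Base64.getDecoder().decode(encoded);
            byte[] iv = Arrays.copyOfRange(raw, 0, IV_LENGTH);
            byte[] cipherText = Arrays.copyOfRange(raw, IV_LENGTH, raw.length);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(masterKey, "AES"),
                    new GCMParameterSpec(TAG_LENGTH_BITS, iv));
            return cipher.doFinal(cipherText);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not unwrap key (wrong master key or corrupted value?)", e);
        }
    }

    public static void main(String[] args) {
        byte[] masterKey = new byte[16]; // demo value only; never use an all-zero key
        byte[] flowFileKey = new byte[32];
        new SecureRandom().nextBytes(flowFileKey);
        String line = "key1=" + wrap(masterKey, flowFileKey);
        System.out.println(line);
        byte[] recovered = unwrap(masterKey, line.substring(line.indexOf('=') + 1));
        System.out.println("Round trip OK: " + Arrays.equals(flowFileKey, recovered));
    }
}
```

Under this assumed layout, wrapping a 32-byte key yields 64 bytes (16-byte IV, 32 bytes of cipher text, 16-byte tag), which Base64-encodes to 88 characters. That is the same length as sample values such as `key1` above, which is consistent with the layout but does not prove it.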
+
+==== Key Rotation
+Simply update _nifi.properties_ to reference a new key ID in `nifi.flowfile.repository.encryption.key.id`.
Previously encrypted flowfile records can still be decrypted as long as the old key is still
available in the key definition file or `nifi.flowfile.repository.encryption.key.id.<OldKeyID>`,
because the key ID is serialized alongside the encrypted record.
+
+=== Writing and Reading FlowFiles
+Once the repository is initialized, all flowfile record write operations are serialized using
`RepositoryObjectBlockEncryptor` (the only currently existing implementation is `RepositoryObjectAESGCMEncryptor`)
to the provided `DataOutputStream`. The original stream is swapped with a temporary wrapped
stream, which encrypts the data written by the wrapped serializer/deserializer inline via `EncryptedSchemaRepositoryRecordSerde`.
The encryption metadata (`keyId`, `algorithm`, `version`, `IV`, `cipherByteLength`)
is serialized and prepended to the encrypted bytes. The complete length and encrypted bytes are then written to the
original `DataOutputStream` on disk as normal.
+
+image:encrypted-flowfile-hex.png["Encrypted flowfile repository journal file on disk"]
+
+On flowfile record read, the process is reversed. The encryption metadata (`RepositoryObjectEncryptionMetadata`)
is parsed and used to decrypt the serialized bytes, which are then exposed as a `DataInputStream`
and deserialized by the wrapped serializer/deserializer.
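
Editorial sketch (not part of the pull request): the write and read paths described above, reduced to a round trip over byte arrays. The class `RecordEncryptionSketch`, the `"v1"` version string, and the on-disk field order are assumptions made for illustration. NiFi's `RepositoryObjectAESGCMEncryptor` and `RepositoryObjectEncryptionMetadata` define the real format, which this does not reproduce.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration only; field order and version string are assumptions.
class RecordEncryptionSketch {
    static final String ALGORITHM = "AES/GCM/NoPadding";
    static final String VERSION = "v1";

    // Encrypts a serialized record and prepends keyId, algorithm, version, IV, cipherByteLength.
    static byte[] encryptRecord(byte[] plainRecord, String keyId, Map<String, byte[]> keys) {
        try {
            byte[] iv = new byte[16];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance(ALGORITHM);
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(keys.get(keyId), "AES"),
                    new GCMParameterSpec(128, iv));
            byte[] cipherBytes = cipher.doFinal(plainRecord);

            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                out.writeUTF(keyId);
                out.writeUTF(ALGORITHM);
                out.writeUTF(VERSION);
                out.writeInt(iv.length);
                out.write(iv);
                out.writeInt(cipherBytes.length);
                out.write(cipherBytes);
            }
            return buffer.toByteArray();
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("Could not encrypt record", e);
        }
    }

    // Parses the metadata, looks up the key by the stored key ID, and decrypts the record.
    static byte[] decryptRecord(byte[] stored, Map<String, byte[]> keys) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stored))) {
            String keyId = in.readUTF();
            String algorithm = in.readUTF();
            in.readUTF(); // version; unused in this sketch
            byte[] iv = new byte[in.readInt()];
            in.readFully(iv);
            byte[] cipherBytes = new byte[in.readInt()];
            in.readFully(cipherBytes);

            if (!ALGORITHM.equals(algorithm)) {
                throw new IllegalStateException("Unsupported algorithm: " + algorithm);
            }
            byte[] key = keys.get(keyId);
            if (key == null) {
                throw new IllegalStateException("Key not available: " + keyId);
            }
            Cipher cipher = Cipher.getInstance(ALGORITHM);
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(128, iv));
            return cipher.doFinal(cipherBytes);
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("Could not decrypt record", e);
        }
    }
}
```

Because the key ID travels with each record, `decryptRecord` never needs to know which key is currently active. A record written under `Key1` still decrypts after the active key changes, provided `Key1` remains in the map passed in, which mirrors the key rotation behavior described earlier.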
+
+During swaps and recoveries, the flowfile records are deserialized and reserialized, so if
the active key has been changed, the flowfile records will be re-encrypted with the new active
key.
+
+Within the NiFi UI/API, there is no detectable difference between an encrypted and unencrypted
flowfile repository. All framework interactions with flowfiles work as expected with no change
to the process.
+
+=== Potential Issues
+
+[WARNING]
+.Switching Implementations
+============
+Switching between any implementations other than `SequentialAccessWriteAheadLog`
and `EncryptedSequentialAccessWriteAheadLog` is not recommended. To migrate from a different provider, first
migrate to the plaintext sequential log, allow NiFi to automatically recover the flowfiles,
then stop NiFi and change the configuration to enable encryption. NiFi will automatically
recover the plaintext flowfiles from the repository, and begin encrypting them on subsequent
writes.
+============
+
+* Switching between unencrypted and encrypted repositories
+** If a user has an existing write-ahead repository (`WriteAheadFlowFileRepository`) that
is not encrypted (uses the `SequentialAccessWriteAheadLog`) and switches their configuration
to use an encrypted repository, the application handles this and all flowfile records will
be recovered on startup. Future writes (including re-serialization of these same flowfiles)
will be encrypted. If a user switches from an encrypted repository to an unencrypted repository,
the flowfiles cannot be recovered, and it is recommended to delete the existing flowfile repository
before switching in this direction. Automatic roll-over is a future effort (link:https://issues.apache.org/jira/browse/NIFI-6994[NIFI-6994^])
but NiFi is not intended for long-term storage of flowfile records so the impact should be
minimal. There are two scenarios for roll-over:
+*** Encrypted -> unencrypted -- if the previous repository implementation was encrypted,
these records should be handled seamlessly as long as the key provider available still has
the keys used to encrypt the claims (see **Key Rotation**)
 
 Review comment:
   Done. Good call. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
