accumulo-notifications mailing list archives

From GitBox <>
Subject [GitHub] milleruntime closed pull request #108: Add documentation for crypto
Date Mon, 01 Oct 2018 20:39:04 GMT
milleruntime closed pull request #108: Add documentation for crypto

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/_docs-2-0/administration/ b/_docs-2-0/administration/
new file mode 100644
index 00000000..74de3482
--- /dev/null
+++ b/_docs-2-0/administration/
@@ -0,0 +1,112 @@
+title: On Disk Encryption
+category: administration
+order: 14
+For an additional layer of security, Accumulo can encrypt files stored on disk.  On disk
encryption was reworked
+for 2.0, making it easier to configure and more secure.  The files that can be encrypted
are [RFiles][design] and Write Ahead Logs (WALs).
+For information on encrypting data over the wire see the section on [SSL].  For information
on cryptographic client-server authentication see the section on [Kerberos].
+## Configuration
+To encrypt all tables on disk, encryption must be enabled before an Accumulo instance is
initialized.  If on disk 
+encryption is enabled on an existing cluster, only files created after it is enabled will
be encrypted 
+(root and metadata tables will not be encrypted in this case) and existing data won't be
encrypted until compaction.  To configure on disk encryption, add the 
+{% plink instance.crypto.service %} property to your `` file.  The value
of this property is the
+class name of the service which will perform crypto on RFiles and WALs. 
+Out of the box, Accumulo provides the `AESCryptoService` for basic encryption needs.  This
class provides AES encryption 
+with Galois/Counter Mode (GCM) for RFiles and Cipher Block Chaining (CBC) mode for WALs.
 The additional properties 
+below are required by this crypto service to be set using the {% plink instance.crypto.opts.*
%} prefix.
+The first property tells the crypto service how it will get the key encryption key.  The
second property tells the service 
+where to find the key.  For now, the only valid values are "uri" and the path to the key
file. The key file can be 16 or 32 bytes. 
+For example, openssl can be used to create a random 32 byte key:
+openssl rand -out /path/to/keyfile 32
+Initializing Accumulo after these instance properties are set will enable on disk encryption
across your entire cluster.
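
For environments where openssl is not available, the same kind of key file can be produced from Java. The following is a minimal sketch, not part of Accumulo: the class name `KeyFileSketch`, the method `writeRandomKey`, and the default file name are hypothetical and chosen for illustration. It only writes random bytes of one of the two sizes mentioned above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.SecureRandom;

// Hypothetical helper (not an Accumulo class): writes a random key file,
// similar in effect to `openssl rand -out /path/to/keyfile 32`.
final class KeyFileSketch {

  // Writes `length` random bytes to `keyFile`, replacing any existing content.
  // Only the two sizes mentioned in the documentation (16 or 32 bytes) are accepted.
  static void writeRandomKey(Path keyFile, int length) throws IOException {
    if (length != 16 && length != 32) {
      throw new IllegalArgumentException("key file must be 16 or 32 bytes, got " + length);
    }
    byte[] key = new byte[length];
    new SecureRandom().nextBytes(key);
    // Files.write creates the file if needed and truncates it otherwise.
    Files.write(keyFile, key);
  }

  public static void main(String[] args) throws IOException {
    // "keyfile" is an assumed default used only for this example.
    Path keyFile = Paths.get(args.length > 0 ? args[0] : "keyfile");
    writeRandomKey(keyFile, 32);
    System.out.println("Wrote " + Files.size(keyFile) + " bytes to " + keyFile);
  }
}
```

The sketch does not restrict file permissions; limiting read access to the key file is left to the operator.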
+## Custom Crypto
+The new crypto interface for 2.0 allows for easier custom implementation of encryption and
decryption. Your
+class only has to implement the {% jlink org.apache.accumulo.core.spi.crypto.CryptoService
%} interface to work with Accumulo.
+The interface has 3 methods:
+  void init(Map<String,String> conf) throws CryptoException;
+  FileEncrypter getFileEncrypter(CryptoEnvironment environment);
+  FileDecrypter getFileDecrypter(CryptoEnvironment environment);
+The `init` method is where you initialize any resources required for crypto.  It is called
once per Tablet Server.
+The `getFileEncrypter` method requires implementation of a {% jlink org.apache.accumulo.core.spi.crypto.FileEncrypter %}
+for encryption and the `getFileDecrypter` method requires implementation of a {% jlink org.apache.accumulo.core.spi.crypto.FileDecrypter %}
+for decryption. The `CryptoEnvironment` passed into these methods will provide the scope
of the crypto.
+The FileEncrypter has two methods:
+  OutputStream encryptStream(OutputStream outputStream) throws CryptoService.CryptoException;
+  byte[] getDecryptionParameters();
+The `encryptStream` method performs the encryption on the provided OutputStream and returns
an OutputStream, most likely 
+wrapped in at least one other OutputStream.  The `getDecryptionParameters` method returns a byte
array of anything that will be
+required to perform decryption. The FileDecrypter has only one method:
+  InputStream decryptStream(InputStream inputStream) throws CryptoService.CryptoException;
+For more help getting started see {% jlink
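
The stream-wrapping pattern described above can be sketched with the JDK alone. This is an illustration under stated assumptions, not Accumulo's implementation: the classes `StreamCryptoSketch`, `Encrypter`, and `Decrypter` are hypothetical stand-ins that only mirror the method names of the real interfaces, the decryption parameters are assumed to be just the GCM nonce, and key handling is reduced to a raw byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical stand-ins that mirror the method names described above.
// They do not implement the Accumulo interfaces.
final class StreamCryptoSketch {

  static final class Encrypter {
    private final SecretKeySpec key;
    private final byte[] iv = new byte[12]; // 96-bit GCM nonce

    Encrypter(byte[] keyBytes) {
      this.key = new SecretKeySpec(keyBytes, "AES");
      new SecureRandom().nextBytes(iv);
    }

    // Wraps the given stream so that everything written to the result is encrypted.
    // Use one Encrypter per file: reusing a nonce with the same key breaks GCM.
    OutputStream encryptStream(OutputStream out) throws GeneralSecurityException {
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
      return new CipherOutputStream(out, cipher);
    }

    // Assumption for this sketch: the nonce is all a reader needs besides the key.
    byte[] getDecryptionParameters() {
      return iv.clone();
    }
  }

  static final class Decrypter {
    private final SecretKeySpec key;
    private final byte[] iv;

    Decrypter(byte[] keyBytes, byte[] decryptionParameters) {
      this.key = new SecretKeySpec(keyBytes, "AES");
      this.iv = decryptionParameters.clone();
    }

    // Wraps the given stream so that reads from the result yield plaintext.
    InputStream decryptStream(InputStream in) throws GeneralSecurityException {
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
      return new CipherInputStream(in, cipher);
    }
  }

  // Encrypts plaintext into memory and returns the ciphertext (including the GCM tag).
  static byte[] encrypt(Encrypter enc, byte[] plaintext)
      throws IOException, GeneralSecurityException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (OutputStream out = enc.encryptStream(sink)) {
      out.write(plaintext);
    }
    return sink.toByteArray();
  }

  // Decrypts ciphertext produced by encrypt() and returns the plaintext.
  static byte[] decrypt(Decrypter dec, byte[] ciphertext)
      throws IOException, GeneralSecurityException {
    ByteArrayOutputStream result = new ByteArrayOutputStream();
    try (InputStream in = dec.decryptStream(new ByteArrayInputStream(ciphertext))) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        result.write(buf, 0, n);
      }
    }
    return result.toByteArray();
  }
}
```

A caller would create one `Encrypter` per file, store the bytes from `getDecryptionParameters` alongside the encrypted data, and later pass them to a `Decrypter` built with the same key.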
+## Things to keep in mind
+The on disk encryption configured here is only for RFiles and Write Ahead Logs (WALs).  The
majority of data in Accumulo
+is written to disk with these files but there are a few scenarios that can take place where
data will be unencrypted, 
+even with the crypto service enabled.
+### Sorted WALs
+If a tablet server is killed with WALs enabled, Accumulo will create temporary sorted WALs
during recovery that are unencrypted.
+These files contain only recent data that has not yet been compacted. Once recovery
+is finished, these unencrypted files are removed.
+### Data in Memory & Logs
+For queries, data is decrypted when read from RFiles and cached in memory.  This means that
data is unencrypted in memory 
+while Accumulo is running.  Depending on the situation, this also means that some data can
be printed to logs. A stacktrace being logged 
+during an exception is one example. Accumulo developers have made sure not to expose data
protected by authorizations during logging, but
+other data that would be encrypted on disk could still be exposed in a log file.

+### Bulk Import
+There are 2 ways to create RFiles for bulk ingest: with the [RFile API][rfile] and during
Map Reduce using [AccumuloOutputFormat].  
+The [RFile API][rfile] allows passing in the configuration properties for encryption mentioned
above.  The [AccumuloOutputFormat] does 
+not allow for encryption of RFiles, so any data bulk imported through this process will be unencrypted.
+### ZooKeeper
+Accumulo stores a lot of metadata about the cluster in ZooKeeper.  Keep in mind that this
metadata does not get encrypted with on disk encryption enabled.
+## GCM performance
+The AESCryptoService uses GCM mode for RFiles. [Java 9 introduced GHASH hardware support
used by GCM.](
+A test was performed on a VM with four 2.3 GHz processors and 16 GB of RAM. The test encrypted
and decrypted arrays of 131,072 bytes 1,000,000 times. The results are as follows:
+    Java 9 GCM times:
+        Time spent encrypting:        209.210s
+        Time spent decrypting:        276.800s
+    Java 8 GCM times:
+        Time spent encrypting:        2,818.440s
+        Time spent decrypting:        2,883.960s
+As you can see, there is a significant performance hit when running without the GHASH CPU
instruction: in this test Java 8 took roughly 10 to 13 times longer than Java 9. Java 9 or later is recommended when enabling encryption.
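
A comparable measurement can be approximated with a small timing loop. The sketch below is not the test that produced the numbers above: the class `GcmTimingSketch`, its `time` method, and the choice of a 16-byte key are assumptions made for illustration, and a simple loop like this ignores JIT warm-up, so its results are only indicative.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical timing loop for AES/GCM; not the benchmark used for the figures above.
final class GcmTimingSketch {

  // Encrypts and decrypts a random array `iterations` times.
  // Returns {nanoseconds spent encrypting, nanoseconds spent decrypting}.
  static long[] time(int arraySize, int iterations) throws GeneralSecurityException {
    SecureRandom random = new SecureRandom();
    byte[] keyBytes = new byte[16]; // assumed 128-bit key
    random.nextBytes(keyBytes);
    SecretKeySpec key = new SecretKeySpec(keyBytes, "AES");

    byte[] plaintext = new byte[arraySize];
    random.nextBytes(plaintext);
    byte[] iv = new byte[12];

    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    long encryptNanos = 0;
    long decryptNanos = 0;

    for (int i = 0; i < iterations; i++) {
      random.nextBytes(iv); // fresh nonce for every encryption

      long t0 = System.nanoTime();
      cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
      byte[] ciphertext = cipher.doFinal(plaintext);
      long t1 = System.nanoTime();

      cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
      byte[] decrypted = cipher.doFinal(ciphertext);
      long t2 = System.nanoTime();

      if (!Arrays.equals(plaintext, decrypted)) {
        throw new IllegalStateException("decryption did not restore the plaintext");
      }
      encryptNanos += t1 - t0;
      decryptNanos += t2 - t1;
    }
    return new long[] {encryptNanos, decryptNanos};
  }

  public static void main(String[] args) throws GeneralSecurityException {
    // Same array size as the test described above, with far fewer iterations.
    long[] nanos = time(131072, 1000);
    System.out.printf("Time spent encrypting: %.3fs%n", nanos[0] / 1e9);
    System.out.printf("Time spent decrypting: %.3fs%n", nanos[1] / 1e9);
  }
}
```

Running the same class on two JVM versions on the same machine gives a rough sense of the difference on your own hardware.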
+[SSL]: {% durl administration/ssl %}
+[Kerberos]: {% durl administration/kerberos %}
+[design]: {% durl getting-started/design#rfile %}
+[rfile]: {% jurl org.apache.accumulo.core.client.rfile.RFile %}
+[AccumuloOutputFormat]: {% jurl org.apache.accumulo.core.client.mapred.AccumuloOutputFormat %}


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

With regards,
Apache Git Services
