accumulo-commits mailing list archives

From ctubb...@apache.org
Subject [accumulo-website] branch asf-site updated: Jekyll build from master:6799d71
Date Mon, 23 Apr 2018 18:17:57 GMT
This is an automated email from the ASF dual-hosted git repository.

ctubbsii pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/accumulo-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 4c133e5  Jekyll build from master:6799d71
4c133e5 is described below

commit 4c133e5ecf1c3822eb783f8fe5ea981ce897fa56
Author: Christopher Tubbs <ctubbsii@apache.org>
AuthorDate: Mon Apr 23 14:17:25 2018 -0400

    Jekyll build from master:6799d71
    
    Add 1.9.0 manual and examples
---
 1.9/accumulo_user_manual.html                      | 12187 +++++++++++++++++++
 1.9/apidocs/allclasses-frame.html                  |   210 +
 1.9/apidocs/allclasses-noframe.html                |   210 +
 1.9/apidocs/constant-values.html                   |   218 +
 1.9/apidocs/deprecated-list.html                   |  1556 +++
 1.9/apidocs/help-doc.html                          |   231 +
 1.9/apidocs/index-all.html                         |  7488 ++++++++++++
 1.9/apidocs/index.html                             |    76 +
 .../accumulo/core/client/AccumuloException.html    |   308 +
 .../core/client/AccumuloSecurityException.html     |   495 +
 .../apache/accumulo/core/client/BatchDeleter.html  |   301 +
 .../apache/accumulo/core/client/BatchScanner.html  |   337 +
 .../apache/accumulo/core/client/BatchWriter.html   |   321 +
 .../accumulo/core/client/BatchWriterConfig.html    |   589 +
 .../client/ClientConfiguration.ClientProperty.html |   669 +
 .../accumulo/core/client/ClientConfiguration.html  |  3189 +++++
 ...lientSideIteratorScanner.ScannerTranslator.html |   308 +
 .../core/client/ClientSideIteratorScanner.html     |   732 ++
 .../core/client/ConditionalWriter.Result.html      |   347 +
 .../core/client/ConditionalWriter.Status.html      |   400 +
 .../accumulo/core/client/ConditionalWriter.html    |   307 +
 .../core/client/ConditionalWriterConfig.html       |   495 +
 .../org/apache/accumulo/core/client/Connector.html |   729 ++
 .../apache/accumulo/core/client/Durability.html    |   402 +
 .../org/apache/accumulo/core/client/Instance.html  |   533 +
 .../client/IsolatedScanner.MemoryRowBuffer.html    |   329 +
 .../IsolatedScanner.MemoryRowBufferFactory.html    |   286 +
 .../core/client/IsolatedScanner.RowBuffer.html     |   273 +
 .../client/IsolatedScanner.RowBufferFactory.html   |   231 +
 .../accumulo/core/client/IsolatedScanner.html      |   625 +
 .../core/client/IteratorSetting.Column.html        |   344 +
 .../accumulo/core/client/IteratorSetting.html      |   805 ++
 .../core/client/MultiTableBatchWriter.html         |   308 +
 .../core/client/MutationsRejectedException.html    |   498 +
 .../core/client/NamespaceExistsException.html      |   316 +
 .../core/client/NamespaceNotEmptyException.html    |   354 +
 .../core/client/NamespaceNotFoundException.html    |   354 +
 .../apache/accumulo/core/client/RowIterator.html   |   375 +
 .../core/client/SampleNotPresentException.html     |   299 +
 .../org/apache/accumulo/core/client/Scanner.html   |   452 +
 .../apache/accumulo/core/client/ScannerBase.html   |   762 ++
 .../core/client/TableDeletedException.html         |   307 +
 .../accumulo/core/client/TableExistsException.html |   316 +
 .../core/client/TableNotFoundException.html        |   374 +
 .../core/client/TableOfflineException.html         |   271 +
 .../accumulo/core/client/TimedOutException.html    |   317 +
 .../accumulo/core/client/ZooKeeperInstance.html    |   796 ++
 .../admin/ActiveCompaction.CompactionReason.html   |   398 +
 .../admin/ActiveCompaction.CompactionType.html     |   384 +
 .../core/client/admin/ActiveCompaction.html        |   502 +
 .../accumulo/core/client/admin/ActiveScan.html     |   520 +
 .../core/client/admin/CompactionConfig.html        |   490 +
 .../client/admin/CompactionStrategyConfig.html     |   362 +
 .../core/client/admin/DelegationTokenConfig.html   |   366 +
 .../accumulo/core/client/admin/DiskUsage.html      |   388 +
 .../apache/accumulo/core/client/admin/FindMax.html |   286 +
 .../core/client/admin/InstanceOperations.html      |   469 +
 .../accumulo/core/client/admin/Locations.html      |   274 +
 .../core/client/admin/NamespaceOperations.html     |   886 ++
 .../core/client/admin/NewTableConfiguration.html   |   395 +
 .../core/client/admin/ReplicationOperations.html   |   364 +
 .../accumulo/core/client/admin/ScanState.html      |   355 +
 .../accumulo/core/client/admin/ScanType.html       |   343 +
 .../core/client/admin/SecurityOperations.html      |   883 ++
 .../core/client/admin/TableOperations.html         |  1916 +++
 .../accumulo/core/client/admin/TimeType.html       |   353 +
 .../ActiveCompaction.CompactionReason.html         |   179 +
 .../class-use/ActiveCompaction.CompactionType.html |   179 +
 .../client/admin/class-use/ActiveCompaction.html   |   168 +
 .../core/client/admin/class-use/ActiveScan.html    |   168 +
 .../client/admin/class-use/CompactionConfig.html   |   202 +
 .../admin/class-use/CompactionStrategyConfig.html  |   183 +
 .../admin/class-use/DelegationTokenConfig.html     |   184 +
 .../core/client/admin/class-use/DiskUsage.html     |   168 +
 .../core/client/admin/class-use/FindMax.html       |   126 +
 .../client/admin/class-use/InstanceOperations.html |   194 +
 .../core/client/admin/class-use/Locations.html     |   169 +
 .../admin/class-use/NamespaceOperations.html       |   195 +
 .../admin/class-use/NewTableConfiguration.html     |   200 +
 .../admin/class-use/ReplicationOperations.html     |   194 +
 .../core/client/admin/class-use/ScanState.html     |   179 +
 .../core/client/admin/class-use/ScanType.html      |   179 +
 .../client/admin/class-use/SecurityOperations.html |   195 +
 .../client/admin/class-use/TableOperations.html    |   194 +
 .../core/client/admin/class-use/TimeType.html      |   270 +
 .../accumulo/core/client/admin/package-frame.html  |    45 +
 .../core/client/admin/package-summary.html         |   264 +
 .../accumulo/core/client/admin/package-tree.html   |   171 +
 .../accumulo/core/client/admin/package-use.html    |   300 +
 .../core/client/class-use/AccumuloException.html   |  1143 ++
 .../class-use/AccumuloSecurityException.html       |  1182 ++
 .../core/client/class-use/BatchDeleter.html        |   241 +
 .../core/client/class-use/BatchScanner.html        |   230 +
 .../core/client/class-use/BatchWriter.html         |   245 +
 .../core/client/class-use/BatchWriterConfig.html   |   392 +
 .../ClientConfiguration.ClientProperty.html        |   215 +
 .../core/client/class-use/ClientConfiguration.html |   421 +
 ...lientSideIteratorScanner.ScannerTranslator.html |   126 +
 .../class-use/ClientSideIteratorScanner.html       |   126 +
 .../client/class-use/ConditionalWriter.Result.html |   183 +
 .../client/class-use/ConditionalWriter.Status.html |   195 +
 .../core/client/class-use/ConditionalWriter.html   |   196 +
 .../client/class-use/ConditionalWriterConfig.html  |   231 +
 .../accumulo/core/client/class-use/Connector.html  |   315 +
 .../accumulo/core/client/class-use/Durability.html |   204 +
 .../accumulo/core/client/class-use/Instance.html   |   383 +
 .../class-use/IsolatedScanner.MemoryRowBuffer.html |   126 +
 .../IsolatedScanner.MemoryRowBufferFactory.html    |   126 +
 .../class-use/IsolatedScanner.RowBuffer.html       |   183 +
 .../IsolatedScanner.RowBufferFactory.html          |   178 +
 .../core/client/class-use/IsolatedScanner.html     |   126 +
 .../client/class-use/IteratorSetting.Column.html   |   168 +
 .../core/client/class-use/IteratorSetting.html     |   538 +
 .../client/class-use/MultiTableBatchWriter.html    |   229 +
 .../class-use/MutationsRejectedException.html      |   266 +
 .../client/class-use/NamespaceExistsException.html |   175 +
 .../class-use/NamespaceNotEmptyException.html      |   168 +
 .../class-use/NamespaceNotFoundException.html      |   291 +
 .../core/client/class-use/RowIterator.html         |   126 +
 .../class-use/SampleNotPresentException.html       |   126 +
 .../accumulo/core/client/class-use/Scanner.html    |   415 +
 .../core/client/class-use/ScannerBase.html         |   322 +
 .../client/class-use/TableDeletedException.html    |   126 +
 .../client/class-use/TableExistsException.html     |   216 +
 .../client/class-use/TableNotFoundException.html   |   772 ++
 .../client/class-use/TableOfflineException.html    |   126 +
 .../core/client/class-use/TimedOutException.html   |   126 +
 .../core/client/class-use/ZooKeeperInstance.html   |   166 +
 .../core/client/lexicoder/AbstractEncoder.html     |   354 +
 .../core/client/lexicoder/BigIntegerLexicoder.html |   348 +
 .../core/client/lexicoder/BytesLexicoder.html      |   347 +
 .../core/client/lexicoder/DateLexicoder.html       |   345 +
 .../core/client/lexicoder/DoubleLexicoder.html     |   345 +
 .../accumulo/core/client/lexicoder/Encoder.html    |   208 +
 .../core/client/lexicoder/FloatLexicoder.html      |   348 +
 .../core/client/lexicoder/IntegerLexicoder.html    |   346 +
 .../accumulo/core/client/lexicoder/Lexicoder.html  |   205 +
 .../core/client/lexicoder/ListLexicoder.html       |   350 +
 .../core/client/lexicoder/LongLexicoder.html       |   345 +
 .../core/client/lexicoder/PairLexicoder.html       |   368 +
 .../core/client/lexicoder/ReverseLexicoder.html    |   342 +
 .../core/client/lexicoder/StringLexicoder.html     |   346 +
 .../core/client/lexicoder/TextLexicoder.html       |   346 +
 .../core/client/lexicoder/UIntegerLexicoder.html   |   347 +
 .../core/client/lexicoder/ULongLexicoder.html      |   351 +
 .../core/client/lexicoder/UUIDLexicoder.html       |   353 +
 .../lexicoder/class-use/AbstractEncoder.html       |   275 +
 .../lexicoder/class-use/BigIntegerLexicoder.html   |   126 +
 .../client/lexicoder/class-use/BytesLexicoder.html |   126 +
 .../client/lexicoder/class-use/DateLexicoder.html  |   126 +
 .../lexicoder/class-use/DoubleLexicoder.html       |   126 +
 .../core/client/lexicoder/class-use/Encoder.html   |   298 +
 .../client/lexicoder/class-use/FloatLexicoder.html |   126 +
 .../lexicoder/class-use/IntegerLexicoder.html      |   126 +
 .../core/client/lexicoder/class-use/Lexicoder.html |   297 +
 .../client/lexicoder/class-use/ListLexicoder.html  |   126 +
 .../client/lexicoder/class-use/LongLexicoder.html  |   126 +
 .../client/lexicoder/class-use/PairLexicoder.html  |   126 +
 .../lexicoder/class-use/ReverseLexicoder.html      |   126 +
 .../lexicoder/class-use/StringLexicoder.html       |   126 +
 .../client/lexicoder/class-use/TextLexicoder.html  |   126 +
 .../lexicoder/class-use/UIntegerLexicoder.html     |   126 +
 .../client/lexicoder/class-use/ULongLexicoder.html |   168 +
 .../client/lexicoder/class-use/UUIDLexicoder.html  |   126 +
 .../core/client/lexicoder/package-frame.html       |    41 +
 .../core/client/lexicoder/package-summary.html     |   262 +
 .../core/client/lexicoder/package-tree.html        |   176 +
 .../core/client/lexicoder/package-use.html         |   211 +
 .../AbstractInputFormat.AbstractRecordReader.html  |   522 +
 .../core/client/mapred/AbstractInputFormat.html    |   947 ++
 .../client/mapred/AccumuloFileOutputFormat.html    |   563 +
 .../core/client/mapred/AccumuloInputFormat.html    |   360 +
 .../mapred/AccumuloMultiTableInputFormat.html      |   368 +
 .../AccumuloOutputFormat.AccumuloRecordWriter.html |   348 +
 .../core/client/mapred/AccumuloOutputFormat.html   |   970 ++
 .../core/client/mapred/AccumuloRowInputFormat.html |   362 +
 .../mapred/InputFormatBase.RangeInputSplit.html    |   367 +
 .../mapred/InputFormatBase.RecordReaderBase.html   |   384 +
 .../core/client/mapred/InputFormatBase.html        |   937 ++
 .../core/client/mapred/RangeInputSplit.html        |   326 +
 .../AbstractInputFormat.AbstractRecordReader.html  |   166 +
 .../mapred/class-use/AbstractInputFormat.html      |   187 +
 .../mapred/class-use/AccumuloFileOutputFormat.html |   126 +
 .../mapred/class-use/AccumuloInputFormat.html      |   126 +
 .../class-use/AccumuloMultiTableInputFormat.html   |   126 +
 .../AccumuloOutputFormat.AccumuloRecordWriter.html |   126 +
 .../mapred/class-use/AccumuloOutputFormat.html     |   126 +
 .../mapred/class-use/AccumuloRowInputFormat.html   |   126 +
 .../class-use/InputFormatBase.RangeInputSplit.html |   166 +
 .../InputFormatBase.RecordReaderBase.html          |   126 +
 .../client/mapred/class-use/InputFormatBase.html   |   174 +
 .../client/mapred/class-use/RangeInputSplit.html   |   215 +
 .../accumulo/core/client/mapred/package-frame.html |    32 +
 .../core/client/mapred/package-summary.html        |   217 +
 .../accumulo/core/client/mapred/package-tree.html  |   174 +
 .../accumulo/core/client/mapred/package-use.html   |   188 +
 .../AbstractInputFormat.AbstractRecordReader.html  |   590 +
 .../core/client/mapreduce/AbstractInputFormat.html |   990 ++
 .../client/mapreduce/AccumuloFileOutputFormat.html |   563 +
 .../core/client/mapreduce/AccumuloInputFormat.html |   362 +
 .../mapreduce/AccumuloMultiTableInputFormat.html   |   370 +
 .../AccumuloOutputFormat.AccumuloRecordWriter.html |   350 +
 .../client/mapreduce/AccumuloOutputFormat.html     |   975 ++
 .../client/mapreduce/AccumuloRowInputFormat.html   |   365 +
 .../mapreduce/InputFormatBase.RangeInputSplit.html |   348 +
 .../InputFormatBase.RecordReaderBase.html          |   388 +
 .../core/client/mapreduce/InputFormatBase.html     |   938 ++
 .../core/client/mapreduce/InputTableConfig.html    |   750 ++
 .../core/client/mapreduce/RangeInputSplit.html     |   995 ++
 .../AbstractInputFormat.AbstractRecordReader.html  |   166 +
 .../mapreduce/class-use/AbstractInputFormat.html   |   187 +
 .../class-use/AccumuloFileOutputFormat.html        |   126 +
 .../mapreduce/class-use/AccumuloInputFormat.html   |   126 +
 .../class-use/AccumuloMultiTableInputFormat.html   |   126 +
 .../AccumuloOutputFormat.AccumuloRecordWriter.html |   126 +
 .../mapreduce/class-use/AccumuloOutputFormat.html  |   126 +
 .../class-use/AccumuloRowInputFormat.html          |   126 +
 .../class-use/InputFormatBase.RangeInputSplit.html |   166 +
 .../InputFormatBase.RecordReaderBase.html          |   126 +
 .../mapreduce/class-use/InputFormatBase.html       |   174 +
 .../mapreduce/class-use/InputTableConfig.html      |   299 +
 .../mapreduce/class-use/RangeInputSplit.html       |   249 +
 .../lib/partition/KeyRangePartitioner.html         |   364 +
 .../mapreduce/lib/partition/RangePartitioner.html  |   364 +
 .../partition/class-use/KeyRangePartitioner.html   |   126 +
 .../lib/partition/class-use/RangePartitioner.html  |   126 +
 .../mapreduce/lib/partition/package-frame.html     |    22 +
 .../mapreduce/lib/partition/package-summary.html   |   152 +
 .../mapreduce/lib/partition/package-tree.html      |   144 +
 .../mapreduce/lib/partition/package-use.html       |   126 +
 .../lib/util/ConfiguratorBase.ConnectorInfo.html   |   396 +
 .../lib/util/ConfiguratorBase.GeneralOpts.html     |   351 +
 .../lib/util/ConfiguratorBase.InstanceOpts.html    |   382 +
 .../mapreduce/lib/util/ConfiguratorBase.html       |   697 ++
 .../lib/util/FileOutputConfigurator.Opts.html      |   351 +
 .../mapreduce/lib/util/FileOutputConfigurator.html |   544 +
 .../lib/util/InputConfigurator.Features.html       |   396 +
 .../lib/util/InputConfigurator.ScanOpts.html       |   411 +
 .../mapreduce/lib/util/InputConfigurator.html      |  1062 ++
 .../lib/util/OutputConfigurator.Features.html      |   366 +
 .../lib/util/OutputConfigurator.WriteOpts.html     |   366 +
 .../mapreduce/lib/util/OutputConfigurator.html     |   592 +
 .../class-use/ConfiguratorBase.ConnectorInfo.html  |   177 +
 .../class-use/ConfiguratorBase.GeneralOpts.html    |   177 +
 .../class-use/ConfiguratorBase.InstanceOpts.html   |   177 +
 .../lib/util/class-use/ConfiguratorBase.html       |   186 +
 .../class-use/FileOutputConfigurator.Opts.html     |   177 +
 .../lib/util/class-use/FileOutputConfigurator.html |   126 +
 .../util/class-use/InputConfigurator.Features.html |   177 +
 .../util/class-use/InputConfigurator.ScanOpts.html |   177 +
 .../lib/util/class-use/InputConfigurator.html      |   126 +
 .../class-use/OutputConfigurator.Features.html     |   177 +
 .../class-use/OutputConfigurator.WriteOpts.html    |   177 +
 .../lib/util/class-use/OutputConfigurator.html     |   126 +
 .../client/mapreduce/lib/util/package-frame.html   |    35 +
 .../client/mapreduce/lib/util/package-summary.html |   230 +
 .../client/mapreduce/lib/util/package-tree.html    |   164 +
 .../client/mapreduce/lib/util/package-use.html     |   219 +
 .../core/client/mapreduce/package-frame.html       |    33 +
 .../core/client/mapreduce/package-summary.html     |   223 +
 .../core/client/mapreduce/package-tree.html        |   187 +
 .../core/client/mapreduce/package-use.html         |   219 +
 .../accumulo/core/client/mock/IteratorAdapter.html |   269 +
 .../accumulo/core/client/mock/MockAccumulo.html    |   422 +
 .../core/client/mock/MockBatchDeleter.html         |   402 +
 .../core/client/mock/MockBatchScanner.html         |   433 +
 .../accumulo/core/client/mock/MockBatchWriter.html |   345 +
 .../accumulo/core/client/mock/MockConnector.html   |   749 ++
 .../accumulo/core/client/mock/MockInstance.html    |   659 +
 .../client/mock/MockMultiTableBatchWriter.html     |   383 +
 .../accumulo/core/client/mock/MockNamespace.html   |   283 +
 .../accumulo/core/client/mock/MockScanner.html     |   583 +
 .../accumulo/core/client/mock/MockScannerBase.html |   428 +
 .../accumulo/core/client/mock/MockTable.html       |   435 +
 .../apache/accumulo/core/client/mock/MockUser.html |   203 +
 .../client/mock/class-use/IteratorAdapter.html     |   126 +
 .../core/client/mock/class-use/MockAccumulo.html   |   192 +
 .../client/mock/class-use/MockBatchDeleter.html    |   126 +
 .../client/mock/class-use/MockBatchScanner.html    |   172 +
 .../client/mock/class-use/MockBatchWriter.html     |   126 +
 .../core/client/mock/class-use/MockConnector.html  |   126 +
 .../core/client/mock/class-use/MockInstance.html   |   126 +
 .../mock/class-use/MockMultiTableBatchWriter.html  |   126 +
 .../core/client/mock/class-use/MockNamespace.html  |   201 +
 .../core/client/mock/class-use/MockScanner.html    |   126 +
 .../client/mock/class-use/MockScannerBase.html     |   188 +
 .../core/client/mock/class-use/MockTable.html      |   184 +
 .../core/client/mock/class-use/MockUser.html       |   126 +
 .../accumulo/core/client/mock/package-frame.html   |    33 +
 .../accumulo/core/client/mock/package-summary.html |   233 +
 .../accumulo/core/client/mock/package-tree.html    |   169 +
 .../accumulo/core/client/mock/package-use.html     |   193 +
 .../apache/accumulo/core/client/package-frame.html |    67 +
 .../accumulo/core/client/package-summary.html      |   392 +
 .../apache/accumulo/core/client/package-tree.html  |   244 +
 .../apache/accumulo/core/client/package-use.html   |   692 ++
 .../client/replication/PeerExistsException.html    |   293 +
 .../client/replication/PeerNotFoundException.html  |   309 +
 .../replication/class-use/PeerExistsException.html |   169 +
 .../class-use/PeerNotFoundException.html           |   168 +
 .../core/client/replication/package-frame.html     |    22 +
 .../core/client/replication/package-summary.html   |   152 +
 .../core/client/replication/package-tree.html      |   148 +
 .../core/client/replication/package-use.html       |   166 +
 .../core/client/rfile/RFile.InputArguments.html    |   266 +
 .../core/client/rfile/RFile.OutputArguments.html   |   258 +
 .../core/client/rfile/RFile.ScannerFSOptions.html  |   255 +
 .../core/client/rfile/RFile.ScannerOptions.html    |   416 +
 .../core/client/rfile/RFile.WriterFSOptions.html   |   255 +
 .../core/client/rfile/RFile.WriterOptions.html     |   334 +
 .../apache/accumulo/core/client/rfile/RFile.html   |   354 +
 .../accumulo/core/client/rfile/RFileSource.html    |   294 +
 .../accumulo/core/client/rfile/RFileWriter.html    |   491 +
 .../rfile/class-use/RFile.InputArguments.html      |   168 +
 .../rfile/class-use/RFile.OutputArguments.html     |   168 +
 .../rfile/class-use/RFile.ScannerFSOptions.html    |   168 +
 .../rfile/class-use/RFile.ScannerOptions.html      |   230 +
 .../rfile/class-use/RFile.WriterFSOptions.html     |   166 +
 .../rfile/class-use/RFile.WriterOptions.html       |   207 +
 .../core/client/rfile/class-use/RFile.html         |   126 +
 .../core/client/rfile/class-use/RFileSource.html   |   168 +
 .../core/client/rfile/class-use/RFileWriter.html   |   166 +
 .../accumulo/core/client/rfile/package-frame.html  |    32 +
 .../core/client/rfile/package-summary.html         |   205 +
 .../accumulo/core/client/rfile/package-tree.html   |   156 +
 .../accumulo/core/client/rfile/package-use.html    |   196 +
 .../core/client/sample/AbstractHashSampler.html    |   373 +
 .../core/client/sample/RowColumnSampler.html       |   387 +
 .../accumulo/core/client/sample/RowSampler.html    |   325 +
 .../accumulo/core/client/sample/Sampler.html       |   278 +
 .../core/client/sample/SamplerConfiguration.html   |   382 +
 .../sample/class-use/AbstractHashSampler.html      |   174 +
 .../client/sample/class-use/RowColumnSampler.html  |   126 +
 .../core/client/sample/class-use/RowSampler.html   |   126 +
 .../core/client/sample/class-use/Sampler.html      |   191 +
 .../sample/class-use/SamplerConfiguration.html     |   406 +
 .../accumulo/core/client/sample/package-frame.html |    28 +
 .../core/client/sample/package-summary.html        |   181 +
 .../accumulo/core/client/sample/package-tree.html  |   149 +
 .../accumulo/core/client/sample/package-use.html   |   276 +
 .../core/client/security/SecurityErrorCode.html    |   547 +
 .../security/class-use/SecurityErrorCode.html      |   259 +
 .../core/client/security/package-frame.html        |    21 +
 .../core/client/security/package-summary.html      |   144 +
 .../core/client/security/package-tree.html         |   143 +
 .../accumulo/core/client/security/package-use.html |   178 +
 ...icationToken.AuthenticationTokenSerializer.html |   356 +
 .../tokens/AuthenticationToken.Properties.html     |   562 +
 .../tokens/AuthenticationToken.TokenProperty.html  |   380 +
 .../security/tokens/AuthenticationToken.html       |   311 +
 .../security/tokens/CredentialProviderToken.html   |   445 +
 .../client/security/tokens/DelegationToken.html    |   238 +
 .../core/client/security/tokens/KerberosToken.html |   625 +
 .../core/client/security/tokens/NullToken.html     |   448 +
 .../core/client/security/tokens/PasswordToken.html |   547 +
 ...icationToken.AuthenticationTokenSerializer.html |   126 +
 .../class-use/AuthenticationToken.Properties.html  |   182 +
 .../AuthenticationToken.TokenProperty.html         |   195 +
 .../tokens/class-use/AuthenticationToken.html      |   483 +
 .../tokens/class-use/CredentialProviderToken.html  |   166 +
 .../security/tokens/class-use/DelegationToken.html |   168 +
 .../security/tokens/class-use/KerberosToken.html   |   166 +
 .../security/tokens/class-use/NullToken.html       |   166 +
 .../security/tokens/class-use/PasswordToken.html   |   213 +
 .../core/client/security/tokens/package-frame.html |    32 +
 .../client/security/tokens/package-summary.html    |   197 +
 .../core/client/security/tokens/package-tree.html  |   178 +
 .../core/client/security/tokens/package-use.html   |   305 +
 .../accumulo/core/data/ArrayByteSequence.html      |   608 +
 .../apache/accumulo/core/data/ByteSequence.html    |   500 +
 .../org/apache/accumulo/core/data/Column.html      |   599 +
 .../apache/accumulo/core/data/ColumnUpdate.html    |   477 +
 .../apache/accumulo/core/data/ComparableBytes.html |   378 +
 .../org/apache/accumulo/core/data/Condition.html   |   688 ++
 .../accumulo/core/data/ConditionalMutation.html    |   454 +
 .../core/data/ConstraintViolationSummary.html      |   460 +
 1.9/apidocs/org/apache/accumulo/core/data/Key.html |  1988 +++
 .../org/apache/accumulo/core/data/KeyExtent.html   |   971 ++
 .../org/apache/accumulo/core/data/KeyValue.html    |   321 +
 .../core/data/Mutation.SERIALIZED_FORMAT.html      |   349 +
 .../org/apache/accumulo/core/data/Mutation.html    |  1702 +++
 .../org/apache/accumulo/core/data/PartialKey.html  |   440 +
 .../org/apache/accumulo/core/data/Range.html       |  1586 +++
 .../org/apache/accumulo/core/data/TabletId.html    |   286 +
 .../accumulo/core/data/Value.Comparator.html       |   320 +
 .../org/apache/accumulo/core/data/Value.html       |   751 ++
 .../core/data/class-use/ArrayByteSequence.html     |   126 +
 .../accumulo/core/data/class-use/ByteSequence.html |   355 +
 .../accumulo/core/data/class-use/Column.html       |   203 +
 .../accumulo/core/data/class-use/ColumnUpdate.html |   187 +
 .../core/data/class-use/ComparableBytes.html       |   126 +
 .../accumulo/core/data/class-use/Condition.html    |   260 +
 .../core/data/class-use/ConditionalMutation.html   |   234 +
 .../data/class-use/ConstraintViolationSummary.html |   207 +
 .../apache/accumulo/core/data/class-use/Key.html   |   741 ++
 .../accumulo/core/data/class-use/KeyExtent.html    |   439 +
 .../accumulo/core/data/class-use/KeyValue.html     |   168 +
 .../data/class-use/Mutation.SERIALIZED_FORMAT.html |   181 +
 .../accumulo/core/data/class-use/Mutation.html     |   384 +
 .../accumulo/core/data/class-use/PartialKey.html   |   212 +
 .../apache/accumulo/core/data/class-use/Range.html |   870 ++
 .../accumulo/core/data/class-use/TabletId.html     |   247 +
 .../core/data/class-use/Value.Comparator.html      |   126 +
 .../apache/accumulo/core/data/class-use/Value.html |   615 +
 .../data/doc-files/mutation-serialization.html     |   196 +
 .../apache/accumulo/core/data/package-frame.html   |    44 +
 .../apache/accumulo/core/data/package-summary.html |   272 +
 .../apache/accumulo/core/data/package-tree.html    |   192 +
 .../org/apache/accumulo/core/data/package-use.html |   541 +
 .../core/security/AuthorizationContainer.html      |   237 +
 .../accumulo/core/security/Authorizations.html     |   703 ++
 .../core/security/ColumnVisibility.Node.html       |   398 +
 .../security/ColumnVisibility.NodeComparator.html  |   308 +
 .../core/security/ColumnVisibility.NodeType.html   |   372 +
 .../accumulo/core/security/ColumnVisibility.html   |   649 +
 .../core/security/NamespacePermission.html         |   518 +
 .../accumulo/core/security/SystemPermission.html   |   528 +
 .../accumulo/core/security/TablePermission.html    |   456 +
 .../core/security/VisibilityConstraint.html        |   280 +
 .../core/security/VisibilityEvaluator.html         |   341 +
 .../core/security/VisibilityParseException.html    |   328 +
 .../security/class-use/AuthorizationContainer.html |   181 +
 .../core/security/class-use/Authorizations.html    |   606 +
 .../security/class-use/ColumnVisibility.Node.html  |   242 +
 .../class-use/ColumnVisibility.NodeComparator.html |   168 +
 .../class-use/ColumnVisibility.NodeType.html       |   191 +
 .../core/security/class-use/ColumnVisibility.html  |   379 +
 .../security/class-use/NamespacePermission.html    |   231 +
 .../core/security/class-use/SystemPermission.html  |   233 +
 .../core/security/class-use/TablePermission.html   |   236 +
 .../security/class-use/VisibilityConstraint.html   |   126 +
 .../security/class-use/VisibilityEvaluator.html    |   126 +
 .../class-use/VisibilityParseException.html        |   168 +
 .../accumulo/core/security/package-frame.html      |    41 +
 .../accumulo/core/security/package-summary.html    |   245 +
 .../accumulo/core/security/package-tree.html       |   180 +
 .../apache/accumulo/core/security/package-use.html |   391 +
 .../apache/accumulo/minicluster/MemoryUnit.html    |   410 +
 .../accumulo/minicluster/MiniAccumuloCluster.html  |   460 +
 .../accumulo/minicluster/MiniAccumuloConfig.html   |   662 +
 .../accumulo/minicluster/MiniAccumuloInstance.html |   313 +
 .../minicluster/MiniAccumuloRunner.Opts.html       |   270 +
 .../MiniAccumuloRunner.PropertiesConverter.html    |   286 +
 .../accumulo/minicluster/MiniAccumuloRunner.html   |   333 +
 .../apache/accumulo/minicluster/ServerType.html    |   408 +
 .../accumulo/minicluster/class-use/MemoryUnit.html |   203 +
 .../minicluster/class-use/MiniAccumuloCluster.html |   126 +
 .../minicluster/class-use/MiniAccumuloConfig.html  |   232 +
 .../class-use/MiniAccumuloInstance.html            |   126 +
 .../class-use/MiniAccumuloRunner.Opts.html         |   126 +
 .../MiniAccumuloRunner.PropertiesConverter.html    |   126 +
 .../minicluster/class-use/MiniAccumuloRunner.html  |   126 +
 .../accumulo/minicluster/class-use/ServerType.html |   209 +
 .../apache/accumulo/minicluster/package-frame.html |    31 +
 .../accumulo/minicluster/package-summary.html      |   191 +
 .../apache/accumulo/minicluster/package-tree.html  |   165 +
 .../apache/accumulo/minicluster/package-use.html   |   167 +
 1.9/apidocs/overview-frame.html                    |    37 +
 1.9/apidocs/overview-summary.html                  |   202 +
 1.9/apidocs/overview-tree.html                     |   585 +
 1.9/apidocs/package-list                           |    16 +
 1.9/apidocs/script.js                              |    30 +
 1.9/apidocs/serialized-form.html                   |   473 +
 1.9/apidocs/stylesheet.css                         |   574 +
 1.9/examples/batch.html                            |   233 +
 1.9/examples/batch.md                              |    55 +
 1.9/examples/bloom.html                            |   407 +
 1.9/examples/bloom.md                              |   219 +
 1.9/examples/bulkIngest.html                       |   208 +
 1.9/examples/bulkIngest.md                         |    33 +
 1.9/examples/classpath.html                        |   246 +
 1.9/examples/classpath.md                          |    68 +
 1.9/examples/client.html                           |   258 +
 1.9/examples/client.md                             |    79 +
 1.9/examples/combiner.html                         |   247 +
 1.9/examples/combiner.md                           |    70 +
 1.9/examples/constraints.html                      |   231 +
 1.9/examples/constraints.md                        |    54 +
 1.9/examples/dirlist.html                          |   308 +
 1.9/examples/dirlist.md                            |   114 +
 1.9/examples/export.html                           |   268 +
 1.9/examples/export.md                             |    91 +
 1.9/examples/filedata.html                         |   237 +
 1.9/examples/filedata.md                           |    47 +
 1.9/examples/filter.html                           |   288 +
 1.9/examples/filter.md                             |   110 +
 1.9/examples/helloworld.html                       |   229 +
 1.9/examples/helloworld.md                         |    47 +
 1.9/examples/index.html                            |   256 +
 1.9/examples/isolation.html                        |   224 +
 1.9/examples/isolation.md                          |    50 +
 1.9/examples/mapred.html                           |   329 +
 1.9/examples/mapred.md                             |   154 +
 1.9/examples/maxmutation.html                      |   224 +
 1.9/examples/maxmutation.md                        |    49 +
 1.9/examples/regex.html                            |   233 +
 1.9/examples/regex.md                              |    57 +
 1.9/examples/reservations.html                     |   242 +
 1.9/examples/reservations.md                       |    66 +
 1.9/examples/rgbalancer.html                       |   339 +
 1.9/examples/rgbalancer.md                         |   159 +
 1.9/examples/rowhash.html                          |   236 +
 1.9/examples/rowhash.md                            |    59 +
 1.9/examples/sample.html                           |   376 +
 1.9/examples/sample.md                             |   192 +
 1.9/examples/shard.html                            |   248 +
 1.9/examples/shard.md                              |    67 +
 1.9/examples/tabletofile.html                      |   236 +
 1.9/examples/tabletofile.md                        |    59 +
 1.9/examples/terasort.html                         |   226 +
 1.9/examples/terasort.md                           |    50 +
 1.9/examples/visibility.html                       |   313 +
 1.9/examples/visibility.md                         |   131 +
 feed.xml                                           |     4 +-
 514 files changed, 177597 insertions(+), 2 deletions(-)

diff --git a/1.9/accumulo_user_manual.html b/1.9/accumulo_user_manual.html
new file mode 100644
index 0000000..2bdc374
--- /dev/null
+++ b/1.9/accumulo_user_manual.html
@@ -0,0 +1,12187 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="UTF-8">
+<!--[if IE]><meta http-equiv="X-UA-Compatible" content="IE=edge"><![endif]-->
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<meta name="generator" content="Asciidoctor 1.5.6.1">
+<meta name="author" content="Apache Accumulo Project">
+<title>Apache Accumulo® User Manual Version 1.9</title>
+<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Open+Sans:300,300italic,400,400italic,600,600italic%7CNoto+Serif:400,400italic,700,700italic%7CDroid+Sans+Mono:400,700">
+<style>
+/* Asciidoctor default stylesheet | MIT License | http://asciidoctor.org */
+/* Remove comment around @import statement below when using as a custom stylesheet */
+/*@import "https://fonts.googleapis.com/css?family=Open+Sans:300,300italic,400,400italic,600,600italic%7CNoto+Serif:400,400italic,700,700italic%7CDroid+Sans+Mono:400,700";*/
+article,aside,details,figcaption,figure,footer,header,hgroup,main,nav,section,summary{display:block}
+audio,canvas,video{display:inline-block}
+audio:not([controls]){display:none;height:0}
+[hidden],template{display:none}
+script{display:none!important}
+html{font-family:sans-serif;-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%}
+a{background:transparent}
+a:focus{outline:thin dotted}
+a:active,a:hover{outline:0}
+h1{font-size:2em;margin:.67em 0}
+abbr[title]{border-bottom:1px dotted}
+b,strong{font-weight:bold}
+dfn{font-style:italic}
+hr{-moz-box-sizing:content-box;box-sizing:content-box;height:0}
+mark{background:#ff0;color:#000}
+code,kbd,pre,samp{font-family:monospace;font-size:1em}
+pre{white-space:pre-wrap}
+q{quotes:"\201C" "\201D" "\2018" "\2019"}
+small{font-size:80%}
+sub,sup{font-size:75%;line-height:0;position:relative;vertical-align:baseline}
+sup{top:-.5em}
+sub{bottom:-.25em}
+img{border:0}
+svg:not(:root){overflow:hidden}
+figure{margin:0}
+fieldset{border:1px solid silver;margin:0 2px;padding:.35em .625em .75em}
+legend{border:0;padding:0}
+button,input,select,textarea{font-family:inherit;font-size:100%;margin:0}
+button,input{line-height:normal}
+button,select{text-transform:none}
+button,html input[type="button"],input[type="reset"],input[type="submit"]{-webkit-appearance:button;cursor:pointer}
+button[disabled],html input[disabled]{cursor:default}
+input[type="checkbox"],input[type="radio"]{box-sizing:border-box;padding:0}
+input[type="search"]{-webkit-appearance:textfield;-moz-box-sizing:content-box;-webkit-box-sizing:content-box;box-sizing:content-box}
+input[type="search"]::-webkit-search-cancel-button,input[type="search"]::-webkit-search-decoration{-webkit-appearance:none}
+button::-moz-focus-inner,input::-moz-focus-inner{border:0;padding:0}
+textarea{overflow:auto;vertical-align:top}
+table{border-collapse:collapse;border-spacing:0}
+*,*:before,*:after{-moz-box-sizing:border-box;-webkit-box-sizing:border-box;box-sizing:border-box}
+html,body{font-size:100%}
+body{background:#fff;color:rgba(0,0,0,.8);padding:0;margin:0;font-family:"Noto Serif","DejaVu Serif",serif;font-weight:400;font-style:normal;line-height:1;position:relative;cursor:auto;tab-size:4;-moz-osx-font-smoothing:grayscale;-webkit-font-smoothing:antialiased}
+a:hover{cursor:pointer}
+img,object,embed{max-width:100%;height:auto}
+object,embed{height:100%}
+img{-ms-interpolation-mode:bicubic}
+.left{float:left!important}
+.right{float:right!important}
+.text-left{text-align:left!important}
+.text-right{text-align:right!important}
+.text-center{text-align:center!important}
+.text-justify{text-align:justify!important}
+.hide{display:none}
+img,object,svg{display:inline-block;vertical-align:middle}
+textarea{height:auto;min-height:50px}
+select{width:100%}
+.center{margin-left:auto;margin-right:auto}
+.spread{width:100%}
+p.lead,.paragraph.lead>p,#preamble>.sectionbody>.paragraph:first-of-type p{font-size:1.21875em;line-height:1.6}
+.subheader,.admonitionblock td.content>.title,.audioblock>.title,.exampleblock>.title,.imageblock>.title,.listingblock>.title,.literalblock>.title,.stemblock>.title,.openblock>.title,.paragraph>.title,.quoteblock>.title,table.tableblock>.title,.verseblock>.title,.videoblock>.title,.dlist>.title,.olist>.title,.ulist>.title,.qlist>.title,.hdlist>.title{line-height:1.45;color:#7a2518;font-weight:400;margin-top:0;margin-bottom:.25em}
+div,dl,dt,dd,ul,ol,li,h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6,pre,form,p,blockquote,th,td{margin:0;padding:0;direction:ltr}
+a{color:#2156a5;text-decoration:underline;line-height:inherit}
+a:hover,a:focus{color:#1d4b8f}
+a img{border:none}
+p{font-family:inherit;font-weight:400;font-size:1em;line-height:1.6;margin-bottom:1.25em;text-rendering:optimizeLegibility}
+p aside{font-size:.875em;line-height:1.35;font-style:italic}
+h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6{font-family:"Open Sans","DejaVu Sans",sans-serif;font-weight:300;font-style:normal;color:#ba3925;text-rendering:optimizeLegibility;margin-top:1em;margin-bottom:.5em;line-height:1.0125em}
+h1 small,h2 small,h3 small,#toctitle small,.sidebarblock>.content>.title small,h4 small,h5 small,h6 small{font-size:60%;color:#e99b8f;line-height:0}
+h1{font-size:2.125em}
+h2{font-size:1.6875em}
+h3,#toctitle,.sidebarblock>.content>.title{font-size:1.375em}
+h4,h5{font-size:1.125em}
+h6{font-size:1em}
+hr{border:solid #ddddd8;border-width:1px 0 0;clear:both;margin:1.25em 0 1.1875em;height:0}
+em,i{font-style:italic;line-height:inherit}
+strong,b{font-weight:bold;line-height:inherit}
+small{font-size:60%;line-height:inherit}
+code{font-family:"Droid Sans Mono","DejaVu Sans Mono",monospace;font-weight:400;color:rgba(0,0,0,.9)}
+ul,ol,dl{font-size:1em;line-height:1.6;margin-bottom:1.25em;list-style-position:outside;font-family:inherit}
+ul,ol{margin-left:1.5em}
+ul li ul,ul li ol{margin-left:1.25em;margin-bottom:0;font-size:1em}
+ul.square li ul,ul.circle li ul,ul.disc li ul{list-style:inherit}
+ul.square{list-style-type:square}
+ul.circle{list-style-type:circle}
+ul.disc{list-style-type:disc}
+ol li ul,ol li ol{margin-left:1.25em;margin-bottom:0}
+dl dt{margin-bottom:.3125em;font-weight:bold}
+dl dd{margin-bottom:1.25em}
+abbr,acronym{text-transform:uppercase;font-size:90%;color:rgba(0,0,0,.8);border-bottom:1px dotted #ddd;cursor:help}
+abbr{text-transform:none}
+blockquote{margin:0 0 1.25em;padding:.5625em 1.25em 0 1.1875em;border-left:1px solid #ddd}
+blockquote cite{display:block;font-size:.9375em;color:rgba(0,0,0,.6)}
+blockquote cite:before{content:"\2014 \0020"}
+blockquote cite a,blockquote cite a:visited{color:rgba(0,0,0,.6)}
+blockquote,blockquote p{line-height:1.6;color:rgba(0,0,0,.85)}
+@media only screen and (min-width:768px){h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6{line-height:1.2}
+h1{font-size:2.75em}
+h2{font-size:2.3125em}
+h3,#toctitle,.sidebarblock>.content>.title{font-size:1.6875em}
+h4{font-size:1.4375em}}
+table{background:#fff;margin-bottom:1.25em;border:solid 1px #dedede}
+table thead,table tfoot{background:#f7f8f7;font-weight:bold}
+table thead tr th,table thead tr td,table tfoot tr th,table tfoot tr td{padding:.5em .625em .625em;font-size:inherit;color:rgba(0,0,0,.8);text-align:left}
+table tr th,table tr td{padding:.5625em .625em;font-size:inherit;color:rgba(0,0,0,.8)}
+table tr.even,table tr.alt,table tr:nth-of-type(even){background:#f8f8f7}
+table thead tr th,table tfoot tr th,table tbody tr td,table tr td,table tfoot tr td{display:table-cell;line-height:1.6}
+h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6{line-height:1.2;word-spacing:-.05em}
+h1 strong,h2 strong,h3 strong,#toctitle strong,.sidebarblock>.content>.title strong,h4 strong,h5 strong,h6 strong{font-weight:400}
+.clearfix:before,.clearfix:after,.float-group:before,.float-group:after{content:" ";display:table}
+.clearfix:after,.float-group:after{clear:both}
+*:not(pre)>code{font-size:.9375em;font-style:normal!important;letter-spacing:0;padding:.1em .5ex;word-spacing:-.15em;background-color:#f7f7f8;-webkit-border-radius:4px;border-radius:4px;line-height:1.45;text-rendering:optimizeSpeed;word-wrap:break-word}
+*:not(pre)>code.nobreak{word-wrap:normal}
+*:not(pre)>code.nowrap{white-space:nowrap}
+pre,pre>code{line-height:1.45;color:rgba(0,0,0,.9);font-family:"Droid Sans Mono","DejaVu Sans Mono",monospace;font-weight:400;text-rendering:optimizeSpeed}
+em em{font-style:normal}
+strong strong{font-weight:400}
+.keyseq{color:rgba(51,51,51,.8)}
+kbd{font-family:"Droid Sans Mono","DejaVu Sans Mono",monospace;display:inline-block;color:rgba(0,0,0,.8);font-size:.65em;line-height:1.45;background-color:#f7f7f7;border:1px solid #ccc;-webkit-border-radius:3px;border-radius:3px;-webkit-box-shadow:0 1px 0 rgba(0,0,0,.2),0 0 0 .1em white inset;box-shadow:0 1px 0 rgba(0,0,0,.2),0 0 0 .1em #fff inset;margin:0 .15em;padding:.2em .5em;vertical-align:middle;position:relative;top:-.1em;white-space:nowrap}
+.keyseq kbd:first-child{margin-left:0}
+.keyseq kbd:last-child{margin-right:0}
+.menuseq,.menuref{color:#000}
+.menuseq b:not(.caret),.menuref{font-weight:inherit}
+.menuseq{word-spacing:-.02em}
+.menuseq b.caret{font-size:1.25em;line-height:.8}
+.menuseq i.caret{font-weight:bold;text-align:center;width:.45em}
+b.button:before,b.button:after{position:relative;top:-1px;font-weight:400}
+b.button:before{content:"[";padding:0 3px 0 2px}
+b.button:after{content:"]";padding:0 2px 0 3px}
+p a>code:hover{color:rgba(0,0,0,.9)}
+#header,#content,#footnotes,#footer{width:100%;margin-left:auto;margin-right:auto;margin-top:0;margin-bottom:0;max-width:62.5em;*zoom:1;position:relative;padding-left:.9375em;padding-right:.9375em}
+#header:before,#header:after,#content:before,#content:after,#footnotes:before,#footnotes:after,#footer:before,#footer:after{content:" ";display:table}
+#header:after,#content:after,#footnotes:after,#footer:after{clear:both}
+#content{margin-top:1.25em}
+#content:before{content:none}
+#header>h1:first-child{color:rgba(0,0,0,.85);margin-top:2.25rem;margin-bottom:0}
+#header>h1:first-child+#toc{margin-top:8px;border-top:1px solid #ddddd8}
+#header>h1:only-child,body.toc2 #header>h1:nth-last-child(2){border-bottom:1px solid #ddddd8;padding-bottom:8px}
+#header .details{border-bottom:1px solid #ddddd8;line-height:1.45;padding-top:.25em;padding-bottom:.25em;padding-left:.25em;color:rgba(0,0,0,.6);display:-ms-flexbox;display:-webkit-flex;display:flex;-ms-flex-flow:row wrap;-webkit-flex-flow:row wrap;flex-flow:row wrap}
+#header .details span:first-child{margin-left:-.125em}
+#header .details span.email a{color:rgba(0,0,0,.85)}
+#header .details br{display:none}
+#header .details br+span:before{content:"\00a0\2013\00a0"}
+#header .details br+span.author:before{content:"\00a0\22c5\00a0";color:rgba(0,0,0,.85)}
+#header .details br+span#revremark:before{content:"\00a0|\00a0"}
+#header #revnumber{text-transform:capitalize}
+#header #revnumber:after{content:"\00a0"}
+#content>h1:first-child:not([class]){color:rgba(0,0,0,.85);border-bottom:1px solid #ddddd8;padding-bottom:8px;margin-top:0;padding-top:1rem;margin-bottom:1.25rem}
+#toc{border-bottom:1px solid #efefed;padding-bottom:.5em}
+#toc>ul{margin-left:.125em}
+#toc ul.sectlevel0>li>a{font-style:italic}
+#toc ul.sectlevel0 ul.sectlevel1{margin:.5em 0}
+#toc ul{font-family:"Open Sans","DejaVu Sans",sans-serif;list-style-type:none}
+#toc li{line-height:1.3334;margin-top:.3334em}
+#toc a{text-decoration:none}
+#toc a:active{text-decoration:underline}
+#toctitle{color:#7a2518;font-size:1.2em}
+@media only screen and (min-width:768px){#toctitle{font-size:1.375em}
+body.toc2{padding-left:15em;padding-right:0}
+#toc.toc2{margin-top:0!important;background-color:#f8f8f7;position:fixed;width:15em;left:0;top:0;border-right:1px solid #efefed;border-top-width:0!important;border-bottom-width:0!important;z-index:1000;padding:1.25em 1em;height:100%;overflow:auto}
+#toc.toc2 #toctitle{margin-top:0;margin-bottom:.8rem;font-size:1.2em}
+#toc.toc2>ul{font-size:.9em;margin-bottom:0}
+#toc.toc2 ul ul{margin-left:0;padding-left:1em}
+#toc.toc2 ul.sectlevel0 ul.sectlevel1{padding-left:0;margin-top:.5em;margin-bottom:.5em}
+body.toc2.toc-right{padding-left:0;padding-right:15em}
+body.toc2.toc-right #toc.toc2{border-right-width:0;border-left:1px solid #efefed;left:auto;right:0}}
+@media only screen and (min-width:1280px){body.toc2{padding-left:20em;padding-right:0}
+#toc.toc2{width:20em}
+#toc.toc2 #toctitle{font-size:1.375em}
+#toc.toc2>ul{font-size:.95em}
+#toc.toc2 ul ul{padding-left:1.25em}
+body.toc2.toc-right{padding-left:0;padding-right:20em}}
+#content #toc{border-style:solid;border-width:1px;border-color:#e0e0dc;margin-bottom:1.25em;padding:1.25em;background:#f8f8f7;-webkit-border-radius:4px;border-radius:4px}
+#content #toc>:first-child{margin-top:0}
+#content #toc>:last-child{margin-bottom:0}
+#footer{max-width:100%;background-color:rgba(0,0,0,.8);padding:1.25em}
+#footer-text{color:rgba(255,255,255,.8);line-height:1.44}
+.sect1{padding-bottom:.625em}
+@media only screen and (min-width:768px){.sect1{padding-bottom:1.25em}}
+.sect1+.sect1{border-top:1px solid #efefed}
+#content h1>a.anchor,h2>a.anchor,h3>a.anchor,#toctitle>a.anchor,.sidebarblock>.content>.title>a.anchor,h4>a.anchor,h5>a.anchor,h6>a.anchor{position:absolute;z-index:1001;width:1.5ex;margin-left:-1.5ex;display:block;text-decoration:none!important;visibility:hidden;text-align:center;font-weight:400}
+#content h1>a.anchor:before,h2>a.anchor:before,h3>a.anchor:before,#toctitle>a.anchor:before,.sidebarblock>.content>.title>a.anchor:before,h4>a.anchor:before,h5>a.anchor:before,h6>a.anchor:before{content:"\00A7";font-size:.85em;display:block;padding-top:.1em}
+#content h1:hover>a.anchor,#content h1>a.anchor:hover,h2:hover>a.anchor,h2>a.anchor:hover,h3:hover>a.anchor,#toctitle:hover>a.anchor,.sidebarblock>.content>.title:hover>a.anchor,h3>a.anchor:hover,#toctitle>a.anchor:hover,.sidebarblock>.content>.title>a.anchor:hover,h4:hover>a.anchor,h4>a.anchor:hover,h5:hover>a.anchor,h5>a.anchor:hover,h6:hover>a.anchor,h6>a.anchor:hover{visibility:visible}
+#content h1>a.link,h2>a.link,h3>a.link,#toctitle>a.link,.sidebarblock>.content>.title>a.link,h4>a.link,h5>a.link,h6>a.link{color:#ba3925;text-decoration:none}
+#content h1>a.link:hover,h2>a.link:hover,h3>a.link:hover,#toctitle>a.link:hover,.sidebarblock>.content>.title>a.link:hover,h4>a.link:hover,h5>a.link:hover,h6>a.link:hover{color:#a53221}
+.audioblock,.imageblock,.literalblock,.listingblock,.stemblock,.videoblock{margin-bottom:1.25em}
+.admonitionblock td.content>.title,.audioblock>.title,.exampleblock>.title,.imageblock>.title,.listingblock>.title,.literalblock>.title,.stemblock>.title,.openblock>.title,.paragraph>.title,.quoteblock>.title,table.tableblock>.title,.verseblock>.title,.videoblock>.title,.dlist>.title,.olist>.title,.ulist>.title,.qlist>.title,.hdlist>.title{text-rendering:optimizeLegibility;text-align:left;font-family:"Noto Serif","DejaVu Serif",serif;font-size:1rem;font-style:italic}
+table.tableblock>caption.title{white-space:nowrap;overflow:visible;max-width:0}
+.paragraph.lead>p,#preamble>.sectionbody>.paragraph:first-of-type p{color:rgba(0,0,0,.85)}
+table.tableblock #preamble>.sectionbody>.paragraph:first-of-type p{font-size:inherit}
+.admonitionblock>table{border-collapse:separate;border:0;background:none;width:100%}
+.admonitionblock>table td.icon{text-align:center;width:80px}
+.admonitionblock>table td.icon img{max-width:initial}
+.admonitionblock>table td.icon .title{font-weight:bold;font-family:"Open Sans","DejaVu Sans",sans-serif;text-transform:uppercase}
+.admonitionblock>table td.content{padding-left:1.125em;padding-right:1.25em;border-left:1px solid #ddddd8;color:rgba(0,0,0,.6)}
+.admonitionblock>table td.content>:last-child>:last-child{margin-bottom:0}
+.exampleblock>.content{border-style:solid;border-width:1px;border-color:#e6e6e6;margin-bottom:1.25em;padding:1.25em;background:#fff;-webkit-border-radius:4px;border-radius:4px}
+.exampleblock>.content>:first-child{margin-top:0}
+.exampleblock>.content>:last-child{margin-bottom:0}
+.sidebarblock{border-style:solid;border-width:1px;border-color:#e0e0dc;margin-bottom:1.25em;padding:1.25em;background:#f8f8f7;-webkit-border-radius:4px;border-radius:4px}
+.sidebarblock>:first-child{margin-top:0}
+.sidebarblock>:last-child{margin-bottom:0}
+.sidebarblock>.content>.title{color:#7a2518;margin-top:0;text-align:center}
+.exampleblock>.content>:last-child>:last-child,.exampleblock>.content .olist>ol>li:last-child>:last-child,.exampleblock>.content .ulist>ul>li:last-child>:last-child,.exampleblock>.content .qlist>ol>li:last-child>:last-child,.sidebarblock>.content>:last-child>:last-child,.sidebarblock>.content .olist>ol>li:last-child>:last-child,.sidebarblock>.content .ulist>ul>li:last-child>:last-child,.sidebarblock>.content .qlist>ol>li:last-child>:last-child{margin-bottom:0}
+.literalblock pre,.listingblock pre:not(.highlight),.listingblock pre[class="highlight"],.listingblock pre[class^="highlight "],.listingblock pre.CodeRay,.listingblock pre.prettyprint{background:#f7f7f8}
+.sidebarblock .literalblock pre,.sidebarblock .listingblock pre:not(.highlight),.sidebarblock .listingblock pre[class="highlight"],.sidebarblock .listingblock pre[class^="highlight "],.sidebarblock .listingblock pre.CodeRay,.sidebarblock .listingblock pre.prettyprint{background:#f2f1f1}
+.literalblock pre,.literalblock pre[class],.listingblock pre,.listingblock pre[class]{-webkit-border-radius:4px;border-radius:4px;word-wrap:break-word;padding:1em;font-size:.8125em}
+.literalblock pre.nowrap,.literalblock pre[class].nowrap,.listingblock pre.nowrap,.listingblock pre[class].nowrap{overflow-x:auto;white-space:pre;word-wrap:normal}
+@media only screen and (min-width:768px){.literalblock pre,.literalblock pre[class],.listingblock pre,.listingblock pre[class]{font-size:.90625em}}
+@media only screen and (min-width:1280px){.literalblock pre,.literalblock pre[class],.listingblock pre,.listingblock pre[class]{font-size:1em}}
+.literalblock.output pre{color:#f7f7f8;background-color:rgba(0,0,0,.9)}
+.listingblock pre.highlightjs{padding:0}
+.listingblock pre.highlightjs>code{padding:1em;-webkit-border-radius:4px;border-radius:4px}
+.listingblock pre.prettyprint{border-width:0}
+.listingblock>.content{position:relative}
+.listingblock code[data-lang]:before{display:none;content:attr(data-lang);position:absolute;font-size:.75em;top:.425rem;right:.5rem;line-height:1;text-transform:uppercase;color:#999}
+.listingblock:hover code[data-lang]:before{display:block}
+.listingblock.terminal pre .command:before{content:attr(data-prompt);padding-right:.5em;color:#999}
+.listingblock.terminal pre .command:not([data-prompt]):before{content:"$"}
+table.pyhltable{border-collapse:separate;border:0;margin-bottom:0;background:none}
+table.pyhltable td{vertical-align:top;padding-top:0;padding-bottom:0;line-height:1.45}
+table.pyhltable td.code{padding-left:.75em;padding-right:0}
+pre.pygments .lineno,table.pyhltable td:not(.code){color:#999;padding-left:0;padding-right:.5em;border-right:1px solid #ddddd8}
+pre.pygments .lineno{display:inline-block;margin-right:.25em}
+table.pyhltable .linenodiv{background:none!important;padding-right:0!important}
+.quoteblock{margin:0 1em 1.25em 1.5em;display:table}
+.quoteblock>.title{margin-left:-1.5em;margin-bottom:.75em}
+.quoteblock blockquote,.quoteblock blockquote p{color:rgba(0,0,0,.85);font-size:1.15rem;line-height:1.75;word-spacing:.1em;letter-spacing:0;font-style:italic;text-align:justify}
+.quoteblock blockquote{margin:0;padding:0;border:0}
+.quoteblock blockquote:before{content:"\201c";float:left;font-size:2.75em;font-weight:bold;line-height:.6em;margin-left:-.6em;color:#7a2518;text-shadow:0 1px 2px rgba(0,0,0,.1)}
+.quoteblock blockquote>.paragraph:last-child p{margin-bottom:0}
+.quoteblock .attribution{margin-top:.5em;margin-right:.5ex;text-align:right}
+.quoteblock .quoteblock{margin-left:0;margin-right:0;padding:.5em 0;border-left:3px solid rgba(0,0,0,.6)}
+.quoteblock .quoteblock blockquote{padding:0 0 0 .75em}
+.quoteblock .quoteblock blockquote:before{display:none}
+.verseblock{margin:0 1em 1.25em 1em}
+.verseblock pre{font-family:"Open Sans","DejaVu Sans",sans;font-size:1.15rem;color:rgba(0,0,0,.85);font-weight:300;text-rendering:optimizeLegibility}
+.verseblock pre strong{font-weight:400}
+.verseblock .attribution{margin-top:1.25rem;margin-left:.5ex}
+.quoteblock .attribution,.verseblock .attribution{font-size:.9375em;line-height:1.45;font-style:italic}
+.quoteblock .attribution br,.verseblock .attribution br{display:none}
+.quoteblock .attribution cite,.verseblock .attribution cite{display:block;letter-spacing:-.025em;color:rgba(0,0,0,.6)}
+.quoteblock.abstract{margin:0 0 1.25em 0;display:block}
+.quoteblock.abstract blockquote,.quoteblock.abstract blockquote p{text-align:left;word-spacing:0}
+.quoteblock.abstract blockquote:before,.quoteblock.abstract blockquote p:first-of-type:before{display:none}
+table.tableblock{max-width:100%;border-collapse:separate}
+table.tableblock td>.paragraph:last-child p>p:last-child,table.tableblock th>p:last-child,table.tableblock td>p:last-child{margin-bottom:0}
+table.tableblock,th.tableblock,td.tableblock{border:0 solid #dedede}
+table.grid-all>thead>tr>.tableblock,table.grid-all>tbody>tr>.tableblock{border-width:0 1px 1px 0}
+table.grid-all>tfoot>tr>.tableblock{border-width:1px 1px 0 0}
+table.grid-cols>*>tr>.tableblock{border-width:0 1px 0 0}
+table.grid-rows>thead>tr>.tableblock,table.grid-rows>tbody>tr>.tableblock{border-width:0 0 1px 0}
+table.grid-rows>tfoot>tr>.tableblock{border-width:1px 0 0 0}
+table.grid-all>*>tr>.tableblock:last-child,table.grid-cols>*>tr>.tableblock:last-child{border-right-width:0}
+table.grid-all>tbody>tr:last-child>.tableblock,table.grid-all>thead:last-child>tr>.tableblock,table.grid-rows>tbody>tr:last-child>.tableblock,table.grid-rows>thead:last-child>tr>.tableblock{border-bottom-width:0}
+table.frame-all{border-width:1px}
+table.frame-sides{border-width:0 1px}
+table.frame-topbot{border-width:1px 0}
+th.halign-left,td.halign-left{text-align:left}
+th.halign-right,td.halign-right{text-align:right}
+th.halign-center,td.halign-center{text-align:center}
+th.valign-top,td.valign-top{vertical-align:top}
+th.valign-bottom,td.valign-bottom{vertical-align:bottom}
+th.valign-middle,td.valign-middle{vertical-align:middle}
+table thead th,table tfoot th{font-weight:bold}
+tbody tr th{display:table-cell;line-height:1.6;background:#f7f8f7}
+tbody tr th,tbody tr th p,tfoot tr th,tfoot tr th p{color:rgba(0,0,0,.8);font-weight:bold}
+p.tableblock>code:only-child{background:none;padding:0}
+p.tableblock{font-size:1em}
+td>div.verse{white-space:pre}
+ol{margin-left:1.75em}
+ul li ol{margin-left:1.5em}
+dl dd{margin-left:1.125em}
+dl dd:last-child,dl dd:last-child>:last-child{margin-bottom:0}
+ol>li p,ul>li p,ul dd,ol dd,.olist .olist,.ulist .ulist,.ulist .olist,.olist .ulist{margin-bottom:.625em}
+ul.checklist,ul.none,ol.none,ul.no-bullet,ol.no-bullet,ol.unnumbered,ul.unstyled,ol.unstyled{list-style-type:none}
+ul.no-bullet,ol.no-bullet,ol.unnumbered{margin-left:.625em}
+ul.unstyled,ol.unstyled{margin-left:0}
+ul.checklist{margin-left:.625em}
+ul.checklist li>p:first-child>.fa-square-o:first-child,ul.checklist li>p:first-child>.fa-check-square-o:first-child{width:1.25em;font-size:.8em;position:relative;bottom:.125em}
+ul.checklist li>p:first-child>input[type="checkbox"]:first-child{margin-right:.25em}
+ul.inline{margin:0 auto .625em auto;margin-left:-1.375em;margin-right:0;padding:0;list-style:none;overflow:hidden}
+ul.inline>li{list-style:none;float:left;margin-left:1.375em;display:block}
+ul.inline>li>*{display:block}
+.unstyled dl dt{font-weight:400;font-style:normal}
+ol.arabic{list-style-type:decimal}
+ol.decimal{list-style-type:decimal-leading-zero}
+ol.loweralpha{list-style-type:lower-alpha}
+ol.upperalpha{list-style-type:upper-alpha}
+ol.lowerroman{list-style-type:lower-roman}
+ol.upperroman{list-style-type:upper-roman}
+ol.lowergreek{list-style-type:lower-greek}
+.hdlist>table,.colist>table{border:0;background:none}
+.hdlist>table>tbody>tr,.colist>table>tbody>tr{background:none}
+td.hdlist1,td.hdlist2{vertical-align:top;padding:0 .625em}
+td.hdlist1{font-weight:bold;padding-bottom:1.25em}
+.literalblock+.colist,.listingblock+.colist{margin-top:-.5em}
+.colist>table tr>td:first-of-type{padding:.4em .75em 0 .75em;line-height:1;vertical-align:top}
+.colist>table tr>td:first-of-type img{max-width:initial}
+.colist>table tr>td:last-of-type{padding:.25em 0}
+.thumb,.th{line-height:0;display:inline-block;border:solid 4px #fff;-webkit-box-shadow:0 0 0 1px #ddd;box-shadow:0 0 0 1px #ddd}
+.imageblock.left,.imageblock[style*="float: left"]{margin:.25em .625em 1.25em 0}
+.imageblock.right,.imageblock[style*="float: right"]{margin:.25em 0 1.25em .625em}
+.imageblock>.title{margin-bottom:0}
+.imageblock.thumb,.imageblock.th{border-width:6px}
+.imageblock.thumb>.title,.imageblock.th>.title{padding:0 .125em}
+.image.left,.image.right{margin-top:.25em;margin-bottom:.25em;display:inline-block;line-height:0}
+.image.left{margin-right:.625em}
+.image.right{margin-left:.625em}
+a.image{text-decoration:none;display:inline-block}
+a.image object{pointer-events:none}
+sup.footnote,sup.footnoteref{font-size:.875em;position:static;vertical-align:super}
+sup.footnote a,sup.footnoteref a{text-decoration:none}
+sup.footnote a:active,sup.footnoteref a:active{text-decoration:underline}
+#footnotes{padding-top:.75em;padding-bottom:.75em;margin-bottom:.625em}
+#footnotes hr{width:20%;min-width:6.25em;margin:-.25em 0 .75em 0;border-width:1px 0 0 0}
+#footnotes .footnote{padding:0 .375em 0 .225em;line-height:1.3334;font-size:.875em;margin-left:1.2em;text-indent:-1.05em;margin-bottom:.2em}
+#footnotes .footnote a:first-of-type{font-weight:bold;text-decoration:none}
+#footnotes .footnote:last-of-type{margin-bottom:0}
+#content #footnotes{margin-top:-.625em;margin-bottom:0;padding:.75em 0}
+.gist .file-data>table{border:0;background:#fff;width:100%;margin-bottom:0}
+.gist .file-data>table td.line-data{width:99%}
+div.unbreakable{page-break-inside:avoid}
+.big{font-size:larger}
+.small{font-size:smaller}
+.underline{text-decoration:underline}
+.overline{text-decoration:overline}
+.line-through{text-decoration:line-through}
+.aqua{color:#00bfbf}
+.aqua-background{background-color:#00fafa}
+.black{color:#000}
+.black-background{background-color:#000}
+.blue{color:#0000bf}
+.blue-background{background-color:#0000fa}
+.fuchsia{color:#bf00bf}
+.fuchsia-background{background-color:#fa00fa}
+.gray{color:#606060}
+.gray-background{background-color:#7d7d7d}
+.green{color:#006000}
+.green-background{background-color:#007d00}
+.lime{color:#00bf00}
+.lime-background{background-color:#00fa00}
+.maroon{color:#600000}
+.maroon-background{background-color:#7d0000}
+.navy{color:#000060}
+.navy-background{background-color:#00007d}
+.olive{color:#606000}
+.olive-background{background-color:#7d7d00}
+.purple{color:#600060}
+.purple-background{background-color:#7d007d}
+.red{color:#bf0000}
+.red-background{background-color:#fa0000}
+.silver{color:#909090}
+.silver-background{background-color:#bcbcbc}
+.teal{color:#006060}
+.teal-background{background-color:#007d7d}
+.white{color:#bfbfbf}
+.white-background{background-color:#fafafa}
+.yellow{color:#bfbf00}
+.yellow-background{background-color:#fafa00}
+span.icon>.fa{cursor:default}
+a span.icon>.fa{cursor:inherit}
+.admonitionblock td.icon [class^="fa icon-"]{font-size:2.5em;text-shadow:1px 1px 2px rgba(0,0,0,.5);cursor:default}
+.admonitionblock td.icon .icon-note:before{content:"\f05a";color:#19407c}
+.admonitionblock td.icon .icon-tip:before{content:"\f0eb";text-shadow:1px 1px 2px rgba(155,155,0,.8);color:#111}
+.admonitionblock td.icon .icon-warning:before{content:"\f071";color:#bf6900}
+.admonitionblock td.icon .icon-caution:before{content:"\f06d";color:#bf3400}
+.admonitionblock td.icon .icon-important:before{content:"\f06a";color:#bf0000}
+.conum[data-value]{display:inline-block;color:#fff!important;background-color:rgba(0,0,0,.8);-webkit-border-radius:100px;border-radius:100px;text-align:center;font-size:.75em;width:1.67em;height:1.67em;line-height:1.67em;font-family:"Open Sans","DejaVu Sans",sans-serif;font-style:normal;font-weight:bold}
+.conum[data-value] *{color:#fff!important}
+.conum[data-value]+b{display:none}
+.conum[data-value]:after{content:attr(data-value)}
+pre .conum[data-value]{position:relative;top:-.125em}
+b.conum *{color:inherit!important}
+.conum:not([data-value]):empty{display:none}
+dt,th.tableblock,td.content,div.footnote{text-rendering:optimizeLegibility}
+h1,h2,p,td.content,span.alt{letter-spacing:-.01em}
+p strong,td.content strong,div.footnote strong{letter-spacing:-.005em}
+p,blockquote,dt,td.content,span.alt{font-size:1.0625rem}
+p{margin-bottom:1.25rem}
+.sidebarblock p,.sidebarblock dt,.sidebarblock td.content,p.tableblock{font-size:1em}
+.exampleblock>.content{background-color:#fffef7;border-color:#e0e0dc;-webkit-box-shadow:0 1px 4px #e0e0dc;box-shadow:0 1px 4px #e0e0dc}
+.print-only{display:none!important}
+@media print{@page{margin:1.25cm .75cm}
+*{-webkit-box-shadow:none!important;box-shadow:none!important;text-shadow:none!important}
+a{color:inherit!important;text-decoration:underline!important}
+a.bare,a[href^="#"],a[href^="mailto:"]{text-decoration:none!important}
+a[href^="http:"]:not(.bare):after,a[href^="https:"]:not(.bare):after{content:"(" attr(href) ")";display:inline-block;font-size:.875em;padding-left:.25em}
+abbr[title]:after{content:" (" attr(title) ")"}
+pre,blockquote,tr,img,object,svg{page-break-inside:avoid}
+thead{display:table-header-group}
+svg{max-width:100%}
+p,blockquote,dt,td.content{font-size:1em;orphans:3;widows:3}
+h2,h3,#toctitle,.sidebarblock>.content>.title{page-break-after:avoid}
+#toc,.sidebarblock,.exampleblock>.content{background:none!important}
+#toc{border-bottom:1px solid #ddddd8!important;padding-bottom:0!important}
+.sect1{padding-bottom:0!important}
+.sect1+.sect1{border:0!important}
+#header>h1:first-child{margin-top:1.25rem}
+body.book #header{text-align:center}
+body.book #header>h1:first-child{border:0!important;margin:2.5em 0 1em 0}
+body.book #header .details{border:0!important;display:block;padding:0!important}
+body.book #header .details span:first-child{margin-left:0!important}
+body.book #header .details br{display:block}
+body.book #header .details br+span:before{content:none!important}
+body.book #toc{border:0!important;text-align:left!important;padding:0!important;margin:0!important}
+body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-break-before:always}
+.listingblock code[data-lang]:before{display:block}
+#footer{background:none!important;padding:0 .9375em}
+#footer-text{color:rgba(0,0,0,.6)!important;font-size:.9em}
+.hide-on-print{display:none!important}
+.print-only{display:block!important}
+.hide-for-print{display:none!important}
+.show-for-print{display:inherit!important}}
+</style>
+</head>
+<body class="book toc2 toc-left">
+<div id="header">
+<h1>Apache Accumulo® User Manual Version 1.9</h1>
+<div class="details">
+<span id="author" class="author">Apache Accumulo Project</span><br>
+<span id="email" class="email"><a href="mailto:dev@accumulo.apache.org">dev@accumulo.apache.org</a></span><br>
+</div>
+<div id="toc" class="toc2">
+<div id="toctitle">Apache Accumulo 1.9</div>
+<ul class="sectlevel1">
+<li><a href="#_introduction">1. Introduction</a></li>
+<li><a href="#_accumulo_design">2. Accumulo Design</a>
+<ul class="sectlevel2">
+<li><a href="#_data_model">2.1. Data Model</a></li>
+<li><a href="#_architecture">2.2. Architecture</a></li>
+<li><a href="#_components">2.3. Components</a>
+<ul class="sectlevel3">
+<li><a href="#_tablet_server">2.3.1. Tablet Server</a></li>
+<li><a href="#_garbage_collector">2.3.2. Garbage Collector</a></li>
+<li><a href="#_master">2.3.3. Master</a></li>
+<li><a href="#_tracer">2.3.4. Tracer</a></li>
+<li><a href="#_monitor">2.3.5. Monitor</a></li>
+<li><a href="#_client">2.3.6. Client</a></li>
+</ul>
+</li>
+<li><a href="#_data_management">2.4. Data Management</a></li>
+<li><a href="#_tablet_service">2.5. Tablet Service</a></li>
+<li><a href="#_compactions">2.6. Compactions</a></li>
+<li><a href="#_splitting">2.7. Splitting</a></li>
+<li><a href="#_fault_tolerance">2.8. Fault-Tolerance</a></li>
+</ul>
+</li>
+<li><a href="#_accumulo_shell">3. Accumulo Shell</a>
+<ul class="sectlevel2">
+<li><a href="#_basic_administration">3.1. Basic Administration</a></li>
+<li><a href="#_table_maintenance">3.2. Table Maintenance</a></li>
+<li><a href="#_user_administration">3.3. User Administration</a></li>
+<li><a href="#_jsr_223_support_in_the_shell">3.4. JSR-223 Support in the Shell</a></li>
+</ul>
+</li>
+<li><a href="#_writing_accumulo_clients">4. Writing Accumulo Clients</a>
+<ul class="sectlevel2">
+<li><a href="#_running_client_code">4.1. Running Client Code</a></li>
+<li><a href="#_connecting">4.2. Connecting</a></li>
+<li><a href="#_writing_data">4.3. Writing Data</a>
+<ul class="sectlevel3">
+<li><a href="#_batchwriter">4.3.1. BatchWriter</a></li>
+<li><a href="#_conditionalwriter">4.3.2. ConditionalWriter</a></li>
+<li><a href="#_durability">4.3.3. Durability</a></li>
+</ul>
+</li>
+<li><a href="#_reading_data">4.4. Reading Data</a>
+<ul class="sectlevel3">
+<li><a href="#_scanner">4.4.1. Scanner</a></li>
+<li><a href="#_isolated_scanner">4.4.2. Isolated Scanner</a></li>
+<li><a href="#_batchscanner">4.4.3. BatchScanner</a></li>
+</ul>
+</li>
+<li><a href="#_proxy">4.5. Proxy</a>
+<ul class="sectlevel3">
+<li><a href="#_prerequisites">4.5.1. Prerequisites</a></li>
+<li><a href="#_configuration">4.5.2. Configuration</a></li>
+<li><a href="#_running_the_proxy_server">4.5.3. Running the Proxy Server</a></li>
+<li><a href="#_creating_a_proxy_client">4.5.4. Creating a Proxy Client</a></li>
+<li><a href="#_using_a_proxy_client">4.5.5. Using a Proxy Client</a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a href="#_development_clients">5. Development Clients</a>
+<ul class="sectlevel2">
+<li><a href="#_mock_accumulo">5.1. Mock Accumulo</a></li>
+<li><a href="#_mini_accumulo_cluster">5.2. Mini Accumulo Cluster</a></li>
+</ul>
+</li>
+<li><a href="#_table_configuration">6. Table Configuration</a>
+<ul class="sectlevel2">
+<li><a href="#_locality_groups">6.1. Locality Groups</a>
+<ul class="sectlevel3">
+<li><a href="#_managing_locality_groups_via_the_shell">6.1.1. Managing Locality Groups via the Shell</a></li>
+<li><a href="#_managing_locality_groups_via_the_client_api">6.1.2. Managing Locality Groups via the Client API</a></li>
+</ul>
+</li>
+<li><a href="#_constraints">6.2. Constraints</a></li>
+<li><a href="#_bloom_filters">6.3. Bloom Filters</a></li>
+<li><a href="#_iterators">6.4. Iterators</a>
+<ul class="sectlevel3">
+<li><a href="#_setting_iterators_via_the_shell">6.4.1. Setting Iterators via the Shell</a></li>
+<li><a href="#_setting_iterators_programmatically">6.4.2. Setting Iterators Programmatically</a></li>
+<li><a href="#_versioning_iterators_and_timestamps">6.4.3. Versioning Iterators and Timestamps</a>
+<ul class="sectlevel4">
+<li><a href="#_logical_time">Logical Time</a></li>
+<li><a href="#_deletes">Deletes</a></li>
+</ul>
+</li>
+<li><a href="#_filters">6.4.4. Filters</a></li>
+<li><a href="#_combiners">6.4.5. Combiners</a></li>
+</ul>
+</li>
+<li><a href="#_block_cache">6.5. Block Cache</a></li>
+<li><a href="#_compaction">6.6. Compaction</a></li>
+<li><a href="#_pre_splitting_tables">6.7. Pre-splitting tables</a></li>
+<li><a href="#_merging_tablets">6.8. Merging tablets</a></li>
+<li><a href="#_delete_range">6.9. Delete Range</a></li>
+<li><a href="#_cloning_tables">6.10. Cloning Tables</a></li>
+<li><a href="#_exporting_tables">6.11. Exporting Tables</a>
+<ul class="sectlevel3">
+<li><a href="#_table_import_export_example">6.11.1. Table Import/Export Example</a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a href="#_iterator_design">7. Iterator Design</a>
+<ul class="sectlevel2">
+<li><a href="#_instantiation">7.1. Instantiation</a></li>
+<li><a href="#_interface">7.2. Interface</a>
+<ul class="sectlevel3">
+<li><a href="#_code_init_code">7.2.1. <code>init</code></a></li>
+<li><a href="#_code_seek_code">7.2.2. <code>seek</code></a></li>
+<li><a href="#_code_next_code">7.2.3. <code>next</code></a></li>
+<li><a href="#_code_hastop_code">7.2.4. <code>hasTop</code></a></li>
+<li><a href="#_code_gettopkey_code_and_code_gettopvalue_code">7.2.5. <code>getTopKey</code> and <code>getTopValue</code></a></li>
+<li><a href="#_code_deepcopy_code">7.2.6. <code>deepCopy</code></a></li>
+</ul>
+</li>
+<li><a href="#_tabletserver_invocation_of_iterators">7.3. TabletServer invocation of Iterators</a></li>
+<li><a href="#_isolation">7.4. Isolation</a></li>
+<li><a href="#_abstract_iterators">7.5. Abstract Iterators</a>
+<ul class="sectlevel3">
+<li><a href="#_filter">7.5.1. Filter</a></li>
+<li><a href="#_combiner">7.5.2. Combiner</a></li>
+</ul>
+</li>
+<li><a href="#_best_practices">7.6. Best practices</a>
+<ul class="sectlevel3">
+<li><a href="#_avoid_special_logic_encoded_in_ranges">7.6.1. Avoid special logic encoded in Ranges</a></li>
+<li><a href="#_code_seek_code_ing_backwards">7.6.2. <code>seek</code>'ing backwards</a></li>
+<li><a href="#_take_caution_in_constructing_new_data_in_an_iterator">7.6.3. Take caution in constructing new data in an Iterator</a></li>
+</ul>
+</li>
+<li><a href="#_final_things_to_remember">7.7. Final things to remember</a>
+<ul class="sectlevel3">
+<li><a href="#_method_call_order">7.7.1. Method call order</a></li>
+<li><a href="#_teardown">7.7.2. Teardown</a></li>
+</ul>
+</li>
+<li><a href="#_compaction_time_iterators">7.8. Compaction-time Iterators</a></li>
+</ul>
+</li>
+<li><a href="#_iterator_testing">8. Iterator Testing</a>
+<ul class="sectlevel2">
+<li><a href="#_framework_use">8.1. Framework Use</a></li>
+<li><a href="#_normal_test_outline">8.2. Normal Test Outline</a></li>
+<li><a href="#_limitations">8.3. Limitations</a></li>
+</ul>
+</li>
+<li><a href="#_table_design">9. Table Design</a>
+<ul class="sectlevel2">
+<li><a href="#_basic_table">9.1. Basic Table</a></li>
+<li><a href="#_rowid_design">9.2. RowID Design</a></li>
+<li><a href="#_lexicoders">9.3. Lexicoders</a></li>
+<li><a href="#_indexing">9.4. Indexing</a></li>
+<li><a href="#_entity_attribute_and_graph_tables">9.5. Entity-Attribute and Graph Tables</a></li>
+<li><a href="#_document_partitioned_indexing">9.6. Document-Partitioned Indexing</a></li>
+</ul>
+</li>
+<li><a href="#_high_speed_ingest">10. High-Speed Ingest</a>
+<ul class="sectlevel2">
+<li><a href="#_pre_splitting_new_tables">10.1. Pre-Splitting New Tables</a></li>
+<li><a href="#_multiple_ingester_clients">10.2. Multiple Ingester Clients</a></li>
+<li><a href="#_bulk_ingest">10.3. Bulk Ingest</a></li>
+<li><a href="#_logical_time_for_bulk_ingest">10.4. Logical Time for Bulk Ingest</a></li>
+<li><a href="#_mapreduce_ingest">10.5. MapReduce Ingest</a></li>
+</ul>
+</li>
+<li><a href="#_analytics">11. Analytics</a>
+<ul class="sectlevel2">
+<li><a href="#_mapreduce">11.1. MapReduce</a>
+<ul class="sectlevel3">
+<li><a href="#_mapper_and_reducer_classes">11.1.1. Mapper and Reducer classes</a></li>
+<li><a href="#_accumuloinputformat_options">11.1.2. AccumuloInputFormat options</a></li>
+<li><a href="#_accumulomultitableinputformat_options">11.1.3. AccumuloMultiTableInputFormat options</a></li>
+<li><a href="#_accumulooutputformat_options">11.1.4. AccumuloOutputFormat options</a></li>
+</ul>
+</li>
+<li><a href="#_combiners_2">11.2. Combiners</a>
+<ul class="sectlevel3">
+<li><a href="#_feature_vectors">11.2.1. Feature Vectors</a></li>
+</ul>
+</li>
+<li><a href="#_statistical_modeling">11.3. Statistical Modeling</a></li>
+</ul>
+</li>
+<li><a href="#_security">12. Security</a>
+<ul class="sectlevel2">
+<li><a href="#_security_label_expressions">12.1. Security Label Expressions</a></li>
+<li><a href="#_security_label_expression_syntax">12.2. Security Label Expression Syntax</a></li>
+<li><a href="#_authorization">12.3. Authorization</a></li>
+<li><a href="#_user_authorizations">12.4. User Authorizations</a></li>
+<li><a href="#_pluggable_security">12.5. Pluggable Security</a></li>
+<li><a href="#_secure_authorizations_handling">12.6. Secure Authorizations Handling</a></li>
+<li><a href="#_query_services_layer">12.7. Query Services Layer</a></li>
+</ul>
+</li>
+<li><a href="#_replication">13. Replication</a>
+<ul class="sectlevel2">
+<li><a href="#_overview">13.1. Overview</a></li>
+<li><a href="#_configuration_2">13.2. Configuration</a>
+<ul class="sectlevel3">
+<li><a href="#_site_configuration">13.2.1. Site Configuration</a></li>
+<li><a href="#_instance_configuration">13.2.2. Instance Configuration</a></li>
+<li><a href="#_table_configuration_2">13.2.3. Table Configuration</a></li>
+</ul>
+</li>
+<li><a href="#_monitoring">13.3. Monitoring</a></li>
+<li><a href="#_work_assignment">13.4. Work Assignment</a></li>
+<li><a href="#_replicasystems">13.5. ReplicaSystems</a>
+<ul class="sectlevel3">
+<li><a href="#_accumuloreplicasystem">13.5.1. AccumuloReplicaSystem</a></li>
+</ul>
+</li>
+<li><a href="#_other_configuration">13.6. Other Configuration</a></li>
+<li><a href="#_example_practical_configuration">13.7. Example Practical Configuration</a>
+<ul class="sectlevel3">
+<li><a href="#_conf_accumulo_site_xml">13.7.1. conf/accumulo-site.xml</a>
+<ul class="sectlevel4">
+<li><a href="#_primary">Primary</a></li>
+<li><a href="#_peer">Peer</a></li>
+</ul>
+</li>
+<li><a href="#_conf_masters_and_conf_slaves">13.7.2. conf/masters and conf/slaves</a></li>
+<li><a href="#_start_both_instances">13.7.3. Start both instances</a></li>
+<li><a href="#_peer_2">13.7.4. Peer</a></li>
+<li><a href="#_primary_2">13.7.5. Primary</a>
+<ul class="sectlevel4">
+<li><a href="#_set_up_the_table">Set up the table</a></li>
+<li><a href="#_define_the_peer_as_a_replication_peer_to_the_primary">Define the Peer as a replication peer to the Primary</a></li>
+<li><a href="#_set_the_authentication_credentials">Set the authentication credentials</a></li>
+<li><a href="#_enable_replication_on_the_table">Enable replication on the table</a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a href="#_extra_considerations_for_use">13.8. Extra considerations for use</a>
+<ul class="sectlevel3">
+<li><a href="#_latency">13.8.1. Latency</a></li>
+<li><a href="#_table_configured_iterators">13.8.2. Table-Configured Iterators</a></li>
+<li><a href="#_duplicate_keys">13.8.3. Duplicate Keys</a></li>
+<li><a href="#_bulk_imports">13.8.4. Bulk Imports</a></li>
+</ul>
+</li>
+<li><a href="#_table_schema">13.9. Table Schema</a>
+<ul class="sectlevel3">
+<li><a href="#_repl_section">13.9.1. Repl section</a></li>
+<li><a href="#_work_section">13.9.2. Work section</a></li>
+<li><a href="#_order_section">13.9.3. Order section</a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a href="#_implementation_details">14. Implementation Details</a>
+<ul class="sectlevel2">
+<li><a href="#_fault_tolerant_executor_fate">14.1. Fault-Tolerant Executor (FATE)</a></li>
+<li><a href="#_overview_2">14.2. Overview</a></li>
+<li><a href="#_administration">14.3. Administration</a>
+<ul class="sectlevel3">
+<li><a href="#_list_print">14.3.1. List/Print</a></li>
+<li><a href="#_fail">14.3.2. Fail</a></li>
+<li><a href="#_delete">14.3.3. Delete</a></li>
+<li><a href="#_dump">14.3.4. Dump</a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a href="#_ssl">15. SSL</a>
+<ul class="sectlevel2">
+<li><a href="#_server_configuration">15.1. Server configuration</a></li>
+<li><a href="#_client_configuration">15.2. Client configuration</a></li>
+<li><a href="#_generating_ssl_material_using_openssl">15.3. Generating SSL material using OpenSSL</a>
+<ul class="sectlevel3">
+<li><a href="#_generate_a_certificate_authority">15.3.1. Generate a certificate authority</a></li>
+<li><a href="#_generate_a_certificate_keystore_per_host">15.3.2. Generate a certificate/keystore per host</a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a href="#_kerberos">16. Kerberos</a>
+<ul class="sectlevel2">
+<li><a href="#_overview_3">16.1. Overview</a></li>
+<li><a href="#_within_hadoop">16.2. Within Hadoop</a></li>
+<li><a href="#_delegation_tokens">16.3. Delegation Tokens</a></li>
+<li><a href="#_configuring_accumulo">16.4. Configuring Accumulo</a>
+<ul class="sectlevel3">
+<li><a href="#_servers">16.4.1. Servers</a>
+<ul class="sectlevel4">
+<li><a href="#_generate_principal_and_keytab">Generate Principal and Keytab</a></li>
+<li><a href="#_server_configuration_2">Server Configuration</a></li>
+<li><a href="#_kerberosauthenticator">KerberosAuthenticator</a></li>
+<li><a href="#_administrative_user">Administrative User</a></li>
+<li><a href="#_verifying_secure_access">Verifying secure access</a></li>
+<li><a href="#_impersonation">Impersonation</a></li>
+<li><a href="#_delegation_tokens_2">Delegation Tokens</a></li>
+</ul>
+</li>
+<li><a href="#_clients">16.4.2. Clients</a>
+<ul class="sectlevel4">
+<li><a href="#_create_client_principal">Create client principal</a></li>
+<li><a href="#_configuration_3">Configuration</a></li>
+<li><a href="#_verifying_administrative_access">Verifying Administrative Access</a></li>
+<li><a href="#_delegationtokens_with_mapreduce">DelegationTokens with MapReduce</a></li>
+</ul>
+</li>
+<li><a href="#_debugging">16.4.3. Debugging</a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a href="#_sampling">17. Sampling</a>
+<ul class="sectlevel2">
+<li><a href="#_overview_4">17.1. Overview</a></li>
+<li><a href="#_configuring">17.2. Configuring</a></li>
+<li><a href="#_scanning_sample_data">17.3. Scanning sample data</a></li>
+<li><a href="#_bulk_import">17.4. Bulk import</a></li>
+</ul>
+</li>
+<li><a href="#_administration_2">18. Administration</a>
+<ul class="sectlevel2">
+<li><a href="#_hardware">18.1. Hardware</a></li>
+<li><a href="#_network">18.2. Network</a></li>
+<li><a href="#_installation">18.3. Installation</a></li>
+<li><a href="#_dependencies">18.4. Dependencies</a></li>
+<li><a href="#_configuration_4">18.5. Configuration</a>
+<ul class="sectlevel3">
+<li><a href="#_edit_conf_accumulo_env_sh">18.5.1. Edit conf/accumulo-env.sh</a></li>
+<li><a href="#_native_map">18.5.2. Native Map</a>
+<ul class="sectlevel4">
+<li><a href="#_building">Building</a></li>
+<li><a href="#_native_maps_configuration">Native Maps Configuration</a></li>
+</ul>
+</li>
+<li><a href="#_cluster_specification">18.5.3. Cluster Specification</a></li>
+<li><a href="#_accumulo_settings">18.5.4. Accumulo Settings</a></li>
+<li><a href="#_hostnames_in_configuration_files">18.5.5. Hostnames in configuration files</a></li>
+<li><a href="#_deploy_configuration">18.5.6. Deploy Configuration</a></li>
+<li><a href="#_sensitive_configuration_values">18.5.7. Sensitive Configuration Values</a></li>
+<li><a href="#_using_a_javakeystorecredentialprovider_for_storage">18.5.8. Using a JavaKeyStoreCredentialProvider for storage</a></li>
+<li><a href="#ClientConfiguration">18.5.9. Client Configuration</a></li>
+<li><a href="#_custom_table_tags">18.5.10. Custom Table Tags</a></li>
+<li><a href="#_configuring_the_classloader">18.5.11. Configuring the ClassLoader</a>
+<ul class="sectlevel4">
+<li><a href="#_classloader_contexts">ClassLoader Contexts</a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a href="#_initialization">18.6. Initialization</a></li>
+<li><a href="#_running">18.7. Running</a>
+<ul class="sectlevel3">
+<li><a href="#_starting_accumulo">18.7.1. Starting Accumulo</a></li>
+<li><a href="#_stopping_accumulo">18.7.2. Stopping Accumulo</a></li>
+<li><a href="#_adding_a_node">18.7.3. Adding a Node</a></li>
+<li><a href="#_decomissioning_a_node">18.7.4. Decommissioning a Node</a></li>
+<li><a href="#_restarting_process_on_a_node">18.7.5. Restarting process on a node</a>
+<ul class="sectlevel4">
+<li><a href="#_a_note_on_rolling_restarts">A note on rolling restarts</a></li>
+</ul>
+</li>
+<li><a href="#_running_multiple_tabletservers_on_a_single_node">18.7.6. Running multiple TabletServers on a single node</a></li>
+</ul>
+</li>
+<li><a href="#monitoring">18.8. Monitoring</a>
+<ul class="sectlevel3">
+<li><a href="#_accumulo_monitor">18.8.1. Accumulo Monitor</a></li>
+<li><a href="#_ssl_2">18.8.2. SSL</a></li>
+</ul>
+</li>
+<li><a href="#_metrics">18.9. Metrics</a>
+<ul class="sectlevel3">
+<li><a href="#_metrics2_configuration">18.9.1. Metrics2 Configuration</a></li>
+</ul>
+</li>
+<li><a href="#tracing">18.10. Tracing</a>
+<ul class="sectlevel3">
+<li><a href="#_tracers">18.10.1. Tracers</a></li>
+<li><a href="#_configuring_tracing">18.10.2. Configuring Tracing</a>
+<ul class="sectlevel4">
+<li><a href="#_adding_additional_spanreceivers">Adding additional SpanReceivers</a></li>
+</ul>
+</li>
+<li><a href="#_instrumenting_a_client">18.10.3. Instrumenting a Client</a></li>
+<li><a href="#_viewing_collected_traces">18.10.4. Viewing Collected Traces</a>
+<ul class="sectlevel4">
+<li><a href="#_trace_table_format">Trace Table Format</a></li>
+</ul>
+</li>
+<li><a href="#_tracing_from_the_shell">18.10.5. Tracing from the Shell</a></li>
+</ul>
+</li>
+<li><a href="#_logging">18.11. Logging</a></li>
+<li><a href="#watcher">18.12. Watcher</a></li>
+<li><a href="#_recovery">18.13. Recovery</a></li>
+<li><a href="#_migrating_accumulo_from_non_ha_namenode_to_ha_namenode">18.14. Migrating Accumulo from non-HA Namenode to HA Namenode</a></li>
+<li><a href="#_achieving_stability_in_a_vm_environment">18.15. Achieving Stability in a VM Environment</a>
+<ul class="sectlevel3">
+<li><a href="#_known_failure_modes_setup_and_troubleshooting">18.15.1. Known failure modes: Setup and Troubleshooting</a>
+<ul class="sectlevel4">
+<li><a href="#_physical_memory">Physical Memory</a></li>
+<li><a href="#_disk_space">Disk Space</a></li>
+<li><a href="#_zookeeper_interaction">ZooKeeper Interaction</a></li>
+</ul>
+</li>
+<li><a href="#_tested_versions">18.15.2. Tested Versions</a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a href="#_multi_volume_installations">19. Multi-Volume Installations</a></li>
+<li><a href="#_troubleshooting">20. Troubleshooting</a>
+<ul class="sectlevel2">
+<li><a href="#_logs">20.1. Logs</a></li>
+<li><a href="#_monitor_2">20.2. Monitor</a></li>
+<li><a href="#_hdfs">20.3. HDFS</a></li>
+<li><a href="#_zookeeper">20.4. ZooKeeper</a>
+<ul class="sectlevel3">
+<li><a href="#_keeping_the_tablet_server_lock">20.4.1. Keeping the tablet server lock</a></li>
+</ul>
+</li>
+<li><a href="#_tools">20.5. Tools</a></li>
+<li><a href="#metadata">20.6. System Metadata Tables</a></li>
+<li><a href="#_simple_system_recovery">20.7. Simple System Recovery</a></li>
+<li><a href="#_advanced_system_recovery">20.8. Advanced System Recovery</a>
+<ul class="sectlevel3">
+<li><a href="#_hdfs_failure">20.8.1. HDFS Failure</a></li>
+<li><a href="#zookeeper_failure">20.8.2. ZooKeeper Failure</a></li>
+</ul>
+</li>
+<li><a href="#_upgrade_issues">20.9. Upgrade Issues</a></li>
+<li><a href="#_file_naming_conventions">20.10. File Naming Conventions</a></li>
+<li><a href="#_hdfs_decommissioning_issues">20.11. HDFS Decommissioning Issues</a></li>
+</ul>
+</li>
+<li><a href="#configuration">Appendix A: Configuration Management</a>
+<ul class="sectlevel2">
+<li><a href="#_configuration_overview">A.1. Configuration Overview</a>
+<ul class="sectlevel3">
+<li><a href="#_zookeeper_table_properties">A.1.1. ZooKeeper table properties</a></li>
+<li><a href="#_zookeeper_system_properties">A.1.2. ZooKeeper system properties</a></li>
+<li><a href="#_accumulo_site_xml">A.1.3. accumulo-site.xml</a></li>
+<li><a href="#_default_values">A.1.4. Default Values</a></li>
+<li><a href="#_zookeeper_property_considerations">A.1.5. ZooKeeper Property Considerations</a></li>
+</ul>
+</li>
+<li><a href="#_configuration_in_the_shell">A.2. Configuration in the Shell</a></li>
+<li><a href="#_available_properties">A.3. Available Properties</a>
+<ul class="sectlevel3">
+<li><a href="#RPC_PREFIX">A.3.1. rpc.*</a>
+<ul class="sectlevel4">
+<li><a href="#_rpc_javax_net_ssl_keystore">rpc.javax.net.ssl.keyStore</a></li>
+<li><a href="#_rpc_javax_net_ssl_keystorepassword">rpc.javax.net.ssl.keyStorePassword</a></li>
+<li><a href="#_rpc_javax_net_ssl_keystoretype">rpc.javax.net.ssl.keyStoreType</a></li>
+<li><a href="#_rpc_javax_net_ssl_truststore">rpc.javax.net.ssl.trustStore</a></li>
+<li><a href="#_rpc_javax_net_ssl_truststorepassword">rpc.javax.net.ssl.trustStorePassword</a></li>
+<li><a href="#_rpc_javax_net_ssl_truststoretype">rpc.javax.net.ssl.trustStoreType</a></li>
+<li><a href="#_rpc_sasl_qop">rpc.sasl.qop</a></li>
+<li><a href="#_rpc_ssl_cipher_suites">rpc.ssl.cipher.suites</a></li>
+<li><a href="#_rpc_ssl_client_protocol">rpc.ssl.client.protocol</a></li>
+<li><a href="#_rpc_ssl_server_enabled_protocols">rpc.ssl.server.enabled.protocols</a></li>
+<li><a href="#_rpc_usejsse">rpc.useJsse</a></li>
+</ul>
+</li>
+<li><a href="#INSTANCE_PREFIX">A.3.2. instance.*</a>
+<ul class="sectlevel4">
+<li><a href="#_instance_dfs_dir">instance.dfs.dir</a></li>
+<li><a href="#_instance_dfs_uri">instance.dfs.uri</a></li>
+<li><a href="#_instance_rpc_sasl_allowed_host_impersonation">instance.rpc.sasl.allowed.host.impersonation</a></li>
+<li><a href="#_instance_rpc_sasl_allowed_user_impersonation">instance.rpc.sasl.allowed.user.impersonation</a></li>
+<li><a href="#_instance_rpc_sasl_enabled">instance.rpc.sasl.enabled</a></li>
+<li><a href="#_instance_rpc_ssl_clientauth">instance.rpc.ssl.clientAuth</a></li>
+<li><a href="#_instance_rpc_ssl_enabled">instance.rpc.ssl.enabled</a></li>
+<li><a href="#_instance_secret">instance.secret</a></li>
+<li><a href="#_instance_security_authenticator">instance.security.authenticator</a></li>
+<li><a href="#_instance_security_authorizor">instance.security.authorizor</a></li>
+<li><a href="#_instance_security_permissionhandler">instance.security.permissionHandler</a></li>
+<li><a href="#_instance_volumes">instance.volumes</a></li>
+<li><a href="#_instance_volumes_replacements">instance.volumes.replacements</a></li>
+<li><a href="#_instance_zookeeper_host">instance.zookeeper.host</a></li>
+<li><a href="#_instance_zookeeper_timeout">instance.zookeeper.timeout</a></li>
+</ul>
+</li>
+<li><a href="#INSTANCE_RPC_SASL_PROXYUSERS">A.3.3. instance.rpc.sasl.impersonation.* (Deprecated)</a></li>
+<li><a href="#GENERAL_PREFIX">A.3.4. general.*</a>
+<ul class="sectlevel4">
+<li><a href="#_general_classpaths">general.classpaths</a></li>
+<li><a href="#_general_delegation_token_lifetime">general.delegation.token.lifetime</a></li>
+<li><a href="#_general_delegation_token_update_interval">general.delegation.token.update.interval</a></li>
+<li><a href="#_general_dynamic_classpaths">general.dynamic.classpaths</a></li>
+<li><a href="#_general_kerberos_keytab">general.kerberos.keytab</a></li>
+<li><a href="#_general_kerberos_principal">general.kerberos.principal</a></li>
+<li><a href="#_general_kerberos_renewal_period">general.kerberos.renewal.period</a></li>
+<li><a href="#_general_legacy_metrics">general.legacy.metrics</a></li>
+<li><a href="#_general_max_scanner_retry_period">general.max.scanner.retry.period</a></li>
+<li><a href="#_general_rpc_timeout">general.rpc.timeout</a></li>
+<li><a href="#_general_security_credential_provider_paths">general.security.credential.provider.paths</a></li>
+<li><a href="#_general_server_message_size_max">general.server.message.size.max</a></li>
+<li><a href="#_general_server_simpletimer_threadpool_size">general.server.simpletimer.threadpool.size</a></li>
+<li><a href="#_general_vfs_cache_dir">general.vfs.cache.dir</a></li>
+<li><a href="#_general_vfs_classpaths">general.vfs.classpaths</a></li>
+</ul>
+</li>
+<li><a href="#MASTER_PREFIX">A.3.5. master.*</a>
+<ul class="sectlevel4">
+<li><a href="#_master_bulk_rename_threadpool_size">master.bulk.rename.threadpool.size</a></li>
+<li><a href="#_master_bulk_retries">master.bulk.retries</a></li>
+<li><a href="#_master_bulk_threadpool_size">master.bulk.threadpool.size</a></li>
+<li><a href="#_master_bulk_timeout">master.bulk.timeout</a></li>
+<li><a href="#_master_fate_threadpool_size">master.fate.threadpool.size</a></li>
+<li><a href="#_master_lease_recovery_interval">master.lease.recovery.interval</a></li>
+<li><a href="#_master_metadata_suspendable">master.metadata.suspendable</a></li>
+<li><a href="#_master_port_client">master.port.client</a></li>
+<li><a href="#_master_recovery_delay">master.recovery.delay</a></li>
+<li><a href="#_master_recovery_max_age">master.recovery.max.age</a></li>
+<li><a href="#_master_recovery_time_max">master.recovery.time.max</a></li>
+<li><a href="#_master_replication_coordinator_minthreads">master.replication.coordinator.minthreads</a></li>
+<li><a href="#_master_replication_coordinator_port">master.replication.coordinator.port</a></li>
+<li><a href="#_master_replication_coordinator_threadcheck_time">master.replication.coordinator.threadcheck.time</a></li>
+<li><a href="#_master_replication_status_scan_interval">master.replication.status.scan.interval</a></li>
+<li><a href="#_master_server_threadcheck_time">master.server.threadcheck.time</a></li>
+<li><a href="#_master_server_threads_minimum">master.server.threads.minimum</a></li>
+<li><a href="#_master_status_threadpool_size">master.status.threadpool.size</a></li>
+<li><a href="#_master_tablet_balancer">master.tablet.balancer</a></li>
+<li><a href="#_master_walog_closer_implementation">master.walog.closer.implementation</a></li>
+</ul>
+</li>
+<li><a href="#TSERV_PREFIX">A.3.6. tserver.*</a>
+<ul class="sectlevel4">
+<li><a href="#_tserver_archive_walogs">tserver.archive.walogs</a></li>
+<li><a href="#_tserver_assignment_concurrent_max">tserver.assignment.concurrent.max</a></li>
+<li><a href="#_tserver_assignment_duration_warning">tserver.assignment.duration.warning</a></li>
+<li><a href="#_tserver_bloom_load_concurrent_max">tserver.bloom.load.concurrent.max</a></li>
+<li><a href="#_tserver_bulk_assign_threads">tserver.bulk.assign.threads</a></li>
+<li><a href="#_tserver_bulk_process_threads">tserver.bulk.process.threads</a></li>
+<li><a href="#_tserver_bulk_retry_max">tserver.bulk.retry.max</a></li>
+<li><a href="#_tserver_bulk_timeout">tserver.bulk.timeout</a></li>
+<li><a href="#_tserver_cache_data_size">tserver.cache.data.size</a></li>
+<li><a href="#_tserver_cache_index_size">tserver.cache.index.size</a></li>
+<li><a href="#_tserver_client_timeout">tserver.client.timeout</a></li>
+<li><a href="#_tserver_compaction_major_concurrent_max">tserver.compaction.major.concurrent.max</a></li>
+<li><a href="#_tserver_compaction_major_delay">tserver.compaction.major.delay</a></li>
+<li><a href="#_tserver_compaction_major_thread_files_open_max">tserver.compaction.major.thread.files.open.max</a></li>
+<li><a href="#_tserver_compaction_major_throughput">tserver.compaction.major.throughput</a></li>
+<li><a href="#_tserver_compaction_major_trace_percent">tserver.compaction.major.trace.percent</a></li>
+<li><a href="#_tserver_compaction_minor_concurrent_max">tserver.compaction.minor.concurrent.max</a></li>
+<li><a href="#_tserver_compaction_minor_trace_percent">tserver.compaction.minor.trace.percent</a></li>
+<li><a href="#_tserver_compaction_warn_time">tserver.compaction.warn.time</a></li>
+<li><a href="#_tserver_default_blocksize">tserver.default.blocksize</a></li>
+<li><a href="#_tserver_dir_memdump">tserver.dir.memdump</a></li>
+<li><a href="#_tserver_files_open_idle">tserver.files.open.idle</a></li>
+<li><a href="#_tserver_hold_time_max">tserver.hold.time.max</a></li>
+<li><a href="#_tserver_memory_manager">tserver.memory.manager</a></li>
+<li><a href="#_tserver_memory_maps_max">tserver.memory.maps.max</a></li>
+<li><a href="#_tserver_memory_maps_native_enabled">tserver.memory.maps.native.enabled</a></li>
+<li><a href="#_tserver_metadata_readahead_concurrent_max">tserver.metadata.readahead.concurrent.max</a></li>
+<li><a href="#_tserver_migrations_concurrent_max">tserver.migrations.concurrent.max</a></li>
+<li><a href="#_tserver_monitor_fs">tserver.monitor.fs</a></li>
+<li><a href="#_tserver_mutation_queue_max">tserver.mutation.queue.max</a></li>
+<li><a href="#_tserver_port_client">tserver.port.client</a></li>
+<li><a href="#_tserver_port_search">tserver.port.search</a></li>
+<li><a href="#_tserver_readahead_concurrent_max">tserver.readahead.concurrent.max</a></li>
+<li><a href="#_tserver_recovery_concurrent_max">tserver.recovery.concurrent.max</a></li>
+<li><a href="#_tserver_replication_batchwriter_replayer_memory">tserver.replication.batchwriter.replayer.memory</a></li>
+<li><a href="#_tserver_replication_default_replayer">tserver.replication.default.replayer</a></li>
+<li><a href="#_tserver_scan_files_open_max">tserver.scan.files.open.max</a></li>
+<li><a href="#_tserver_server_message_size_max">tserver.server.message.size.max</a></li>
+<li><a href="#_tserver_server_threadcheck_time">tserver.server.threadcheck.time</a></li>
+<li><a href="#_tserver_server_threads_minimum">tserver.server.threads.minimum</a></li>
+<li><a href="#_tserver_session_idle_max">tserver.session.idle.max</a></li>
+<li><a href="#_tserver_session_update_idle_max">tserver.session.update.idle.max</a></li>
+<li><a href="#_tserver_slow_flush_time">tserver.slow.flush.time</a></li>
+<li><a href="#_tserver_sort_buffer_size">tserver.sort.buffer.size</a></li>
+<li><a href="#_tserver_tablet_split_midpoint_files_max">tserver.tablet.split.midpoint.files.max</a></li>
+<li><a href="#_tserver_total_mutation_queue_max">tserver.total.mutation.queue.max</a></li>
+<li><a href="#_tserver_wal_blocksize">tserver.wal.blocksize</a></li>
+<li><a href="#_tserver_wal_replication">tserver.wal.replication</a></li>
+<li><a href="#_tserver_wal_sync">tserver.wal.sync</a></li>
+<li><a href="#_tserver_wal_sync_method">tserver.wal.sync.method</a></li>
+<li><a href="#_tserver_walog_max_age">tserver.walog.max.age</a></li>
+<li><a href="#_tserver_walog_max_size">tserver.walog.max.size</a></li>
+<li><a href="#_tserver_walog_maximum_wait_duration">tserver.walog.maximum.wait.duration</a></li>
+<li><a href="#_tserver_walog_tolerated_creation_failures">tserver.walog.tolerated.creation.failures</a></li>
+<li><a href="#_tserver_walog_tolerated_wait_increment">tserver.walog.tolerated.wait.increment</a></li>
+<li><a href="#_tserver_workq_threads">tserver.workq.threads</a></li>
+</ul>
+</li>
+<li><a href="#TSERV_REPLICATION_REPLAYERS">A.3.7. tserver.replication.replayer.*</a></li>
+<li><a href="#GC_PREFIX">A.3.8. gc.*</a>
+<ul class="sectlevel4">
+<li><a href="#_gc_cycle_delay">gc.cycle.delay</a></li>
+<li><a href="#_gc_cycle_start">gc.cycle.start</a></li>
+<li><a href="#_gc_file_archive">gc.file.archive</a></li>
+<li><a href="#_gc_port_client">gc.port.client</a></li>
+<li><a href="#_gc_threads_delete">gc.threads.delete</a></li>
+<li><a href="#_gc_trace_percent">gc.trace.percent</a></li>
+<li><a href="#_gc_trash_ignore">gc.trash.ignore</a></li>
+</ul>
+</li>
+<li><a href="#MONITOR_PREFIX">A.3.9. monitor.*</a>
+<ul class="sectlevel4">
+<li><a href="#_monitor_banner_background">monitor.banner.background</a></li>
+<li><a href="#_monitor_banner_color">monitor.banner.color</a></li>
+<li><a href="#_monitor_banner_text">monitor.banner.text</a></li>
+<li><a href="#_monitor_lock_check_interval">monitor.lock.check.interval</a></li>
+<li><a href="#_monitor_log_date_format">monitor.log.date.format</a></li>
+<li><a href="#_monitor_port_client">monitor.port.client</a></li>
+<li><a href="#_monitor_port_log4j">monitor.port.log4j</a></li>
+<li><a href="#_monitor_ssl_exclude_ciphers">monitor.ssl.exclude.ciphers</a></li>
+<li><a href="#_monitor_ssl_include_ciphers">monitor.ssl.include.ciphers</a></li>
+<li><a href="#_monitor_ssl_include_protocols">monitor.ssl.include.protocols</a></li>
+<li><a href="#_monitor_ssl_keystore">monitor.ssl.keyStore</a></li>
+<li><a href="#_monitor_ssl_keystorepassword">monitor.ssl.keyStorePassword</a></li>
+<li><a href="#_monitor_ssl_keystoretype">monitor.ssl.keyStoreType</a></li>
+<li><a href="#_monitor_ssl_truststore">monitor.ssl.trustStore</a></li>
+<li><a href="#_monitor_ssl_truststorepassword">monitor.ssl.trustStorePassword</a></li>
+<li><a href="#_monitor_ssl_truststoretype">monitor.ssl.trustStoreType</a></li>
+</ul>
+</li>
+<li><a href="#TRACE_PREFIX">A.3.10. trace.*</a>
+<ul class="sectlevel4">
+<li><a href="#_trace_password">trace.password</a></li>
+<li><a href="#_trace_port_client">trace.port.client</a></li>
+<li><a href="#_trace_span_receivers">trace.span.receivers</a></li>
+<li><a href="#_trace_table">trace.table</a></li>
+<li><a href="#_trace_token_type">trace.token.type</a></li>
+<li><a href="#_trace_user">trace.user</a></li>
+<li><a href="#_trace_zookeeper_path">trace.zookeeper.path</a></li>
+</ul>
+</li>
+<li><a href="#TRACE_SPAN_RECEIVER_PREFIX">A.3.11. trace.span.receiver.*</a></li>
+<li><a href="#TRACE_TOKEN_PROPERTY_PREFIX">A.3.12. trace.token.property.*</a></li>
+<li><a href="#TABLE_PREFIX">A.3.13. table.*</a>
+<ul class="sectlevel4">
+<li><a href="#_table_balancer">table.balancer</a></li>
+<li><a href="#_table_bloom_enabled">table.bloom.enabled</a></li>
+<li><a href="#_table_bloom_error_rate">table.bloom.error.rate</a></li>
+<li><a href="#_table_bloom_hash_type">table.bloom.hash.type</a></li>
+<li><a href="#_table_bloom_key_functor">table.bloom.key.functor</a></li>
+<li><a href="#_table_bloom_load_threshold">table.bloom.load.threshold</a></li>
+<li><a href="#_table_bloom_size">table.bloom.size</a></li>
+<li><a href="#_table_cache_block_enable">table.cache.block.enable</a></li>
+<li><a href="#_table_cache_index_enable">table.cache.index.enable</a></li>
+<li><a href="#_table_classpath_context">table.classpath.context</a></li>
+<li><a href="#_table_compaction_major_everything_idle">table.compaction.major.everything.idle</a></li>
+<li><a href="#_table_compaction_major_ratio">table.compaction.major.ratio</a></li>
+<li><a href="#_table_compaction_minor_idle">table.compaction.minor.idle</a></li>
+<li><a href="#_table_compaction_minor_logs_threshold">table.compaction.minor.logs.threshold</a></li>
+<li><a href="#_table_compaction_minor_merge_file_size_max">table.compaction.minor.merge.file.size.max</a></li>
+<li><a href="#_table_durability">table.durability</a></li>
+<li><a href="#_table_failures_ignore">table.failures.ignore</a></li>
+<li><a href="#_table_file_blocksize">table.file.blocksize</a></li>
+<li><a href="#_table_file_compress_blocksize">table.file.compress.blocksize</a></li>
+<li><a href="#_table_file_compress_blocksize_index">table.file.compress.blocksize.index</a></li>
+<li><a href="#_table_file_compress_type">table.file.compress.type</a></li>
+<li><a href="#_table_file_max">table.file.max</a></li>
+<li><a href="#_table_file_replication">table.file.replication</a></li>
+<li><a href="#_table_file_type">table.file.type</a></li>
+<li><a href="#_table_formatter">table.formatter</a></li>
+<li><a href="#_table_groups_enabled">table.groups.enabled</a></li>
+<li><a href="#_table_interepreter">table.interepreter</a></li>
+<li><a href="#_table_majc_compaction_strategy">table.majc.compaction.strategy</a></li>
+<li><a href="#_table_replication">table.replication</a></li>
+<li><a href="#_table_sampler">table.sampler</a></li>
+<li><a href="#_table_scan_max_memory">table.scan.max.memory</a></li>
+<li><a href="#_table_security_scan_visibility_default">table.security.scan.visibility.default</a></li>
+<li><a href="#_table_split_endrow_size_max">table.split.endrow.size.max</a></li>
+<li><a href="#_table_split_threshold">table.split.threshold</a></li>
+<li><a href="#_table_suspend_duration">table.suspend.duration</a></li>
+<li><a href="#_table_walog_enabled">table.walog.enabled</a></li>
+</ul>
+</li>
+<li><a href="#TABLE_ARBITRARY_PROP_PREFIX">A.3.14. table.custom.*</a></li>
+<li><a href="#TABLE_CONSTRAINT_PREFIX">A.3.15. table.constraint.*</a></li>
+<li><a href="#TABLE_ITERATOR_PREFIX">A.3.16. table.iterator.*</a></li>
+<li><a href="#TABLE_ITERATOR_SCAN_PREFIX">A.3.17. table.iterator.scan.*</a></li>
+<li><a href="#TABLE_ITERATOR_MINC_PREFIX">A.3.18. table.iterator.minc.*</a></li>
+<li><a href="#TABLE_ITERATOR_MAJC_PREFIX">A.3.19. table.iterator.majc.*</a></li>
+<li><a href="#TABLE_LOCALITY_GROUP_PREFIX">A.3.20. table.group.*</a></li>
+<li><a href="#TABLE_COMPACTION_STRATEGY_PREFIX">A.3.21. table.majc.compaction.strategy.opts.*</a></li>
+<li><a href="#TABLE_REPLICATION_TARGET">A.3.22. table.replication.target.*</a></li>
+<li><a href="#TABLE_SAMPLER_OPTS">A.3.23. table.sampler.opt.*</a></li>
+<li><a href="#VFS_CONTEXT_CLASSPATH_PROPERTY">A.3.24. general.vfs.context.classpath.*</a></li>
+<li><a href="#REPLICATION_PREFIX">A.3.25. replication.*</a>
+<ul class="sectlevel4">
+<li><a href="#_replication_driver_delay">replication.driver.delay</a></li>
+<li><a href="#_replication_max_unit_size">replication.max.unit.size</a></li>
+<li><a href="#_replication_max_work_queue">replication.max.work.queue</a></li>
+<li><a href="#_replication_name">replication.name</a></li>
+<li><a href="#_replication_receipt_service_port">replication.receipt.service.port</a></li>
+<li><a href="#_replication_receiver_min_threads">replication.receiver.min.threads</a></li>
+<li><a href="#_replication_receiver_threadcheck_time">replication.receiver.threadcheck.time</a></li>
+<li><a href="#_replication_rpc_timeout">replication.rpc.timeout</a></li>
+<li><a href="#_replication_trace_percent">replication.trace.percent</a></li>
+<li><a href="#_replication_work_assigner">replication.work.assigner</a></li>
+<li><a href="#_replication_work_assignment_sleep">replication.work.assignment.sleep</a></li>
+<li><a href="#_replication_work_attempts">replication.work.attempts</a></li>
+<li><a href="#_replication_work_processor_delay">replication.work.processor.delay</a></li>
+<li><a href="#_replication_work_processor_period">replication.work.processor.period</a></li>
+<li><a href="#_replication_worker_threads">replication.worker.threads</a></li>
+</ul>
+</li>
+<li><a href="#REPLICATION_PEERS">A.3.26. replication.peer.*</a></li>
+<li><a href="#REPLICATION_PEER_USER">A.3.27. replication.peer.user.*</a></li>
+<li><a href="#REPLICATION_PEER_PASSWORD">A.3.28. replication.peer.password.*</a></li>
+<li><a href="#REPLICATION_PEER_KEYTAB">A.3.29. replication.peer.keytab.*</a></li>
+</ul>
+</li>
+<li><a href="#_property_types">A.4. Property Types</a>
+<ul class="sectlevel3">
+<li><a href="#_duration">A.4.1. duration</a></li>
+<li><a href="#_memory">A.4.2. memory</a></li>
+<li><a href="#_host_list">A.4.3. host list</a></li>
+<li><a href="#_port">A.4.4. port</a></li>
+<li><a href="#_count">A.4.5. count</a></li>
+<li><a href="#_fraction_percentage">A.4.6. fraction/percentage</a></li>
+<li><a href="#_path">A.4.7. path</a></li>
+<li><a href="#_absolute_path">A.4.8. absolute path</a></li>
+<li><a href="#_java_class">A.4.9. java class</a></li>
+<li><a href="#_java_class_list">A.4.10. java class list</a></li>
+<li><a href="#_durability_2">A.4.11. durability</a></li>
+<li><a href="#_string">A.4.12. string</a></li>
+<li><a href="#_boolean">A.4.13. boolean</a></li>
+<li><a href="#_uri">A.4.14. uri</a></li>
+</ul>
+</li>
+</ul>
+</li>
+</ul>
+</div>
+</div>
+<div id="content">
+<div id="preamble">
+<div class="sectionbody">
+<div class="imageblock">
+<div class="content">
+<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAWgAAABcCAYAAABDeS/aAAACzmlDQ1BJQ0MgUHJvZmlsZQAAeNqNk8trFFkUh7/q3KCQIAy0r14Ml1lIkCSUDzQiPtJJbKKxbcpEkyBIp/p2d5mb6ppb1XEUEcnGpc4we/GxcOEf4MKFK90oEXwhiHsVRRTcqLSL6nRX8HlWX/3Oub9zzi0udNrFINApCXN+ZJxcVk5OTcsVz0ixni4ydBXdMBgsFMYAikGg+SY+PsECeNj3/fxPo6sUunNgrYTU+5IKXej4DNQqk1PTIDSQPhkFEYhzQNrE+v9Aeibm60DajDtDIG4Bq9zARCDuAQNutViCTgH0VhI1Mwme03W3Oc8fQLfyJw4DGyB1VoUjTbYWSsXhA0A/WK9KangE6AXretnbNwr0AM/LZt9EzNZGLxodjzl1xNf5sSav82fyh5qeIoiyzpJ/ [...]
+</div>
+</div>
+<div class="paragraph">
+<p>Copyright © 2011-2017 The Apache Software Foundation, Licensed under the Apache
+License, Version 2.0.  Apache Accumulo, Accumulo, Apache, and the Apache
+Accumulo project logo are trademarks of the Apache Software Foundation.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_introduction">1. Introduction</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Apache Accumulo is a highly scalable structured store based on Google&#8217;s BigTable.
+Accumulo is written in Java and operates over the Hadoop Distributed File System
+(HDFS), which is part of the popular Apache Hadoop project. Accumulo supports
+efficient storage and retrieval of structured data, including queries for ranges, and
+provides support for using Accumulo tables as input and output for MapReduce
+jobs.</p>
+</div>
+<div class="paragraph">
+<p>Accumulo features automatic load-balancing and partitioning, data compression
+and fine-grained security labels.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_accumulo_design">2. Accumulo Design</h2>
+<div class="sectionbody">
+<div class="sect2">
+<h3 id="_data_model">2.1. Data Model</h3>
+<div class="paragraph">
+<p>Accumulo provides a richer data model than simple key-value stores, but is not a
+fully relational database. Data is represented as key-value pairs, where the key and
+value are comprised of the following elements:</p>
+</div>
+<table class="tableblock frame-all grid-all" style="width: 75%;">
+<colgroup>
+<col style="width: 16.6666%;">
+<col style="width: 16.6666%;">
+<col style="width: 16.6666%;">
+<col style="width: 16.6666%;">
+<col style="width: 16.6666%;">
+<col style="width: 16.667%;">
+</colgroup>
+<tbody>
+<tr>
+<td class="tableblock halign-center valign-top" colspan="5"><p class="tableblock">Key</p></td>
+<td class="tableblock halign-center valign-middle" rowspan="3"><p class="tableblock">Value</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-middle" rowspan="2"><p class="tableblock">Row ID</p></td>
+<td class="tableblock halign-center valign-top" colspan="3"><p class="tableblock">Column</p></td>
+<td class="tableblock halign-center valign-middle" rowspan="2"><p class="tableblock">Timestamp</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Family</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Qualifier</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Visibility</p></td>
+</tr>
+</tbody>
+</table>
+<div class="paragraph">
+<p>All elements of the Key and the Value are represented as byte arrays except for
+Timestamp, which is a Long. Accumulo sorts keys by element and lexicographically
+in ascending order. Timestamps are sorted in descending order so that later
+versions of the same Key appear first in a sequential scan. Tables consist of a set of
+sorted key-value pairs.</p>
+</div>
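The ordering rules above can be sketched in plain Java. The `SimpleKey` class below is a hypothetical stand-in for Accumulo's `Key` (which implements the same comparison): each byte-array element compares lexicographically as unsigned bytes in ascending order, and the timestamp compares in descending order so that newer versions sort first.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

// Hypothetical stand-in for org.apache.accumulo.core.data.Key, for illustration only.
class SimpleKey {
    final byte[] row, family, qualifier, visibility;
    final long timestamp;

    SimpleKey(String row, String family, String qualifier, String visibility, long timestamp) {
        this.row = row.getBytes(StandardCharsets.UTF_8);
        this.family = family.getBytes(StandardCharsets.UTF_8);
        this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
        this.visibility = visibility.getBytes(StandardCharsets.UTF_8);
        this.timestamp = timestamp;
    }

    // Unsigned lexicographic comparison, applied to each byte-array element.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Element by element ascending, then timestamp descending (note the reversed arguments).
    static final Comparator<SimpleKey> ORDER = (x, y) -> {
        int c = compareBytes(x.row, y.row);
        if (c == 0) c = compareBytes(x.family, y.family);
        if (c == 0) c = compareBytes(x.qualifier, y.qualifier);
        if (c == 0) c = compareBytes(x.visibility, y.visibility);
        if (c == 0) c = Long.compare(y.timestamp, x.timestamp); // descending
        return c;
    };
}
```

A sequential scan over keys sorted this way yields the latest version of each cell first, which is how Accumulo surfaces the most recent value by default.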
+</div>
+<div class="sect2">
+<h3 id="_architecture">2.2. Architecture</h3>
+<div class="paragraph">
+<p>Accumulo is a distributed data storage and retrieval system and as such consists of
+several architectural components, some of which run on many individual servers.
+Much of the work Accumulo does involves maintaining certain properties of the
+data, such as organization, availability, and integrity, across many commodity-class
+machines.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_components">2.3. Components</h3>
+<div class="paragraph">
+<p>An instance of Accumulo includes many TabletServers, one Garbage Collector process,
+one Master server and many Clients.</p>
+</div>
+<div class="sect3">
+<h4 id="_tablet_server">2.3.1. Tablet Server</h4>
+<div class="paragraph">
+<p>The TabletServer manages some subset of all the tablets (partitions of tables). This includes receiving writes from clients, persisting writes to a
+write-ahead log, sorting new key-value pairs in memory, periodically
+flushing sorted key-value pairs to new files in HDFS, and responding
+to reads from clients, forming a merge-sorted view of all keys and
+values from all the files it has created and the sorted in-memory
+store.</p>
+</div>
+<div class="paragraph">
+<p>TabletServers also perform recovery of a tablet
+that was previously on a server that failed, reapplying any writes
+found in the write-ahead log to the tablet.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_garbage_collector">2.3.2. Garbage Collector</h4>
+<div class="paragraph">
+<p>Accumulo processes will share files stored in HDFS. Periodically, the Garbage
+Collector will identify files that are no longer needed by any process, and
+delete them. Multiple garbage collectors can be run to provide hot-standby support.
+They will perform leader election among themselves to choose a single active instance.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_master">2.3.3. Master</h4>
+<div class="paragraph">
+<p>The Accumulo Master is responsible for detecting and responding to TabletServer
+failure. It tries to balance the load across TabletServers by assigning tablets carefully

+and instructing TabletServers to unload tablets when necessary. The Master ensures all
+tablets are assigned to one TabletServer each, and handles table creation, alteration,
+and deletion requests from clients. The Master also coordinates startup, graceful
+shutdown and recovery of changes in write-ahead logs when Tablet servers fail.</p>
+</div>
+<div class="paragraph">
+<p>Multiple masters may be run. The masters will choose among themselves a single master,
+and the others will become backups if the master should fail.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_tracer">2.3.4. Tracer</h4>
+<div class="paragraph">
+<p>The Accumulo Tracer process supports the distributed timing API provided by Accumulo.
+One or more of these processes can be run on a cluster; they will write the timing
+information to a given Accumulo table for future reference. See the section on
+Tracing for more information on this support.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_monitor">2.3.5. Monitor</h4>
+<div class="paragraph">
+<p>The Accumulo Monitor is a web application that provides a wealth of information about
+the state of an instance. The Monitor shows graphs and tables which contain information
+about read/write rates, cache hit/miss rates, and Accumulo table information such as scan
+rate and active/queued compactions. Additionally, the Monitor should always be the first
+point of entry when attempting to debug an Accumulo problem as it will show high-level problems
+in addition to aggregated errors from all nodes in the cluster. See the section on <a href="#monitoring">Monitoring</a>
+for more information.</p>
+</div>
+<div class="paragraph">
+<p>Multiple Monitors can be run to provide hot-standby support in the face of failure. Due to the
+forwarding of logs from remote hosts to the Monitor, only one Monitor process should be active
+at one time. Leader election will be performed internally to choose the active Monitor.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_client">2.3.6. Client</h4>
+<div class="paragraph">
+<p>Accumulo includes a client library that is linked to every application. The client
+library contains logic for finding servers managing a particular tablet, and
+communicating with TabletServers to write and retrieve key-value pairs.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_data_management">2.4. Data Management</h3>
+<div class="paragraph">
+<p>Accumulo stores data in tables, which are partitioned into tablets. Tablets are
+partitioned on row boundaries so that all of the columns and values for a particular
+row are found together within the same tablet. The Master assigns Tablets to one
+TabletServer at a time. This enables row-level transactions to take place without
+using distributed locking or some other complicated synchronization mechanism. As
+clients insert and query data, and as machines are added and removed from the
+cluster, the Master migrates tablets to ensure they remain available and that the
+ingest and query load is balanced across the cluster.</p>
+</div>
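Because tablets are partitioned on row boundaries, locating the tablet that holds a given row reduces to a lookup in a sorted map of tablet end rows. The sketch below is illustrative only (`TabletLocator` and its methods are hypothetical names, not Accumulo API); the real client performs the equivalent lookup against the metadata table, with end rows treated as inclusive upper bounds.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: maps each tablet's (inclusive) end row to a tablet id.
class TabletLocator {
    private final TreeMap<String, String> tabletsByEndRow = new TreeMap<>();
    private final String lastTablet; // holds all rows after the last split point

    TabletLocator(String lastTablet) {
        this.lastTablet = lastTablet;
    }

    void addSplit(String endRow, String tabletId) {
        tabletsByEndRow.put(endRow, tabletId);
    }

    // The owning tablet is the one whose end row is the smallest value >= the row.
    String tabletForRow(String row) {
        Map.Entry<String, String> e = tabletsByEndRow.ceilingEntry(row);
        return e != null ? e.getValue() : lastTablet;
    }
}
```

Because each row maps to exactly one tablet, and each tablet to exactly one TabletServer, all mutations for a row land on a single server, which is what makes row-level atomicity cheap.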
+<div class="imageblock">
+<div class="content">
+<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAygAAAJFCAYAAAA26dPBAAAC7mlDQ1BJQ0MgUHJvZmlsZQAAeAGFVM9rE0EU/jZuqdAiCFprDrJ4kCJJWatoRdQ2/RFiawzbH7ZFkGQzSdZuNuvuJrWliOTi0SreRe2hB/+AHnrwZC9KhVpFKN6rKGKhFy3xzW5MtqXqwM5+8943731vdt8ADXLSNPWABOQNx1KiEWlsfEJq/IgAjqIJQTQlVdvsTiQGQYNz+Xvn2HoPgVtWw3v7d7J3rZrStpoHhP1A4Eea2Sqw7xdxClkSAog836Epx3QI3+PY8uyPOU55eMG1Dys9xFkifEA1Lc5/TbhTzSXTQINIOJT1cVI+nNeLlNcdB2luZsbIEL1PkKa7zO6rYqGcTvYOkL2d9H5Os94+wiHCCxmtP0a4jZ71jNU/4mHhpObEhj0cGDX0+GAVtxqp+DXC [...]
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_tablet_service">2.5. Tablet Service</h3>
+<div class="paragraph">
+<p>When a write arrives at a TabletServer it is written to a Write-Ahead Log and
+then inserted into a sorted data structure in memory called a MemTable. When the
+MemTable reaches a certain size, the TabletServer writes out the sorted
+key-value pairs to a file in HDFS called a Relative Key File (RFile), which is a
+kind of Indexed Sequential Access Method (ISAM) file. This process is called a
+minor compaction. A new MemTable is then created and the fact of the compaction
+is recorded in the Write-Ahead Log.</p>
+</div>
+<div class="paragraph">
+<p>When a request to read data arrives at a TabletServer, the TabletServer does a
+binary search across the MemTable as well as the in-memory indexes associated
+with each RFile to find the relevant values. If clients are performing a scan,
+several key-value pairs are returned to the client in order from the MemTable
+and the set of RFiles by performing a merge-sort as they are read.</p>
+</div>
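The merge-sorted view over the MemTable and the RFiles can be sketched as a k-way merge driven by a priority queue. This plain-Java version merges sorted lists of strings standing in for sorted key-value iterators; the class and method names are illustrative, not Accumulo internals.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative k-way merge: each source is already sorted,
// like a MemTable or an RFile being scanned in key order.
class MergeScan {
    private static final class Head {
        String value;
        final Iterator<String> rest;
        Head(String value, Iterator<String> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    static List<String> merge(List<List<String>> sortedSources) {
        // Heap ordered by each source's current smallest element.
        PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> a.value.compareTo(b.value));
        for (List<String> src : sortedSources) {
            Iterator<String> it = src.iterator();
            if (it.hasNext()) heap.add(new Head(it.next(), it));
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head h = heap.poll();
            out.add(h.value);
            if (h.rest.hasNext()) {      // advance that source and re-insert it
                h.value = h.rest.next();
                heap.add(h);
            }
        }
        return out;
    }
}
```

Each step emits the globally smallest remaining key and advances only the source it came from, so the merged scan stays in sorted order without materializing all sources.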
+</div>
+<div class="sect2">
+<h3 id="_compactions">2.6. Compactions</h3>
+<div class="paragraph">
+<p>In order to manage the number of files per tablet, periodically the TabletServer
+performs Major Compactions of files within a tablet, in which some set of RFiles
+are combined into one file. The previous files will eventually be removed by the
+Garbage Collector. This also provides an opportunity to permanently remove
+deleted key-value pairs by omitting key-value pairs suppressed by a delete entry
+when the new file is created.</p>
+</div>
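The delete suppression described above can be sketched as a single pass over the merged, sorted entries: a delete marker masks any older versions of the same key and is itself dropped from the output. This is a simplification with illustrative names; in Accumulo, deletes can only be dropped safely when every file that might hold older versions participates in the compaction.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only. Input is pre-sorted so that, per key, the newest version
// (which may be a delete marker) comes first -- matching Accumulo's sort order.
class Compactor {
    static final class Entry {
        final String key;
        final String value;
        final boolean delete;
        Entry(String key, String value, boolean delete) {
            this.key = key;
            this.value = value;
            this.delete = delete;
        }
    }

    // One pass: a delete marker suppresses all older versions of its key
    // and is itself omitted from the compacted output.
    static List<Entry> compact(List<Entry> merged) {
        List<Entry> out = new ArrayList<>();
        String suppressed = null;
        for (Entry e : merged) {
            if (e.delete) {
                suppressed = e.key;
                continue;
            }
            if (e.key.equals(suppressed)) continue;
            out.add(e);
        }
        return out;
    }
}
```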
+</div>
+<div class="sect2">
+<h3 id="_splitting">2.7. Splitting</h3>
+<div class="paragraph">
+<p>When a table is created it has one tablet. As the table grows its initial
+tablet eventually splits into two tablets. It is likely that one of these
+tablets will migrate to another tablet server. As the table continues to grow,
+its tablets will continue to split and be migrated. The decision to
+automatically split a tablet is based on the size of a tablet&#8217;s files. The
+size threshold at which a tablet splits is configurable per table. In addition
+to automatic splitting, a user can manually add split points to a table to
+create new tablets. Manually splitting a new table can parallelize reads and
+writes giving better initial performance without waiting for automatic
+splitting.</p>
+</div>
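A simplified sketch of the split decision (illustrative names throughout): a tablet is a candidate for splitting once its files exceed the per-table size threshold, and a split point is chosen near the middle of its keyspace. Accumulo actually estimates the midpoint from its files' index keys; the median of sampled rows below is a stand-in for that.

```java
import java.util.List;

// Illustrative only: the real threshold is the per-table table.split.threshold setting,
// and the real midpoint is estimated from RFile index keys rather than sampled rows.
class SplitPolicy {
    static boolean shouldSplit(long tabletFileBytes, long splitThresholdBytes) {
        return tabletFileBytes > splitThresholdBytes;
    }

    // Approximate a midpoint split row as the median of sorted sample rows.
    static String midpointSplit(List<String> sortedSampleRows) {
        return sortedSampleRows.get(sortedSampleRows.size() / 2);
    }
}
```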
+<div class="paragraph">
+<p>As data is deleted from a table, tablets may shrink. Over time this can lead
+to small or empty tablets. To deal with this, merging of tablets was
+introduced in Accumulo 1.4. This is discussed in more detail later.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_fault_tolerance">2.8. Fault-Tolerance</h3>
+<div class="paragraph">
+<p>If a TabletServer fails, the Master detects it and automatically reassigns the tablets
+from the failed server to other servers. Any key-value pairs that were in
+memory at the time the TabletServer failed are automatically reapplied from the
+Write-Ahead Log (WAL) to prevent any loss of data.</p>
+</div>
+<div class="paragraph">
+<p>Tablet servers write their WALs directly to HDFS so the logs are available to all tablet
+servers for recovery. To make the recovery process efficient, the updates within a log are
+grouped by tablet.  TabletServers can quickly apply the mutations from the sorted logs
+that are destined for the tablets they have now been assigned.</p>
+</div>
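The recovery flow above can be sketched as filtering a log for the mutations that belong to a newly assigned tablet and reapplying them in order. The types and names below are illustrative, not Accumulo's actual log format.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative WAL replay: log entries carry the tablet they belong to, so a
// TabletServer taking over a tablet replays only that tablet's mutations.
class WalRecovery {
    static final class LogEntry {
        final String tablet;
        final String row;
        final String value;
        LogEntry(String tablet, String row, String value) {
            this.tablet = tablet;
            this.row = row;
            this.value = value;
        }
    }

    // Rebuild the in-memory state of one tablet from the (tablet-grouped) log.
    static Map<String, String> replay(List<LogEntry> log, String tablet) {
        Map<String, String> memTable = new TreeMap<>();
        for (LogEntry e : log) {
            if (e.tablet.equals(tablet)) {
                memTable.put(e.row, e.value); // later log entries overwrite earlier ones
            }
        }
        return memTable;
    }
}
```

Grouping (sorting) the log by tablet first is what lets each recovering TabletServer read only the slice of the log it needs, rather than scanning every entry for every tablet.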
+<div class="paragraph">
+<p>TabletServer failures are noted on the Master&#8217;s monitor page, accessible via
+<code><a href="http://master-address:9995/monitor" class="bare">http://master-address:9995/monitor</a></code>.</p>
+</div>
+<div class="imageblock">
+<div class="content">
+<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAyoAAAJuCAIAAADQFWFeAAAgAElEQVR4nOydB1gU19eHd6mCIjawGysERVQUrLH3joXEgoUWe41YMcYWQ6yxG4xd7MQudsVgxN41YFds8Y+KiqLCd9y7zjeZLQ7L7mzh9z734dm9U/aew87cd2fu3pVlAAAAAAAACZEZuwEAAAAAANkL6BcAAAAAgKRAvwAAAAAAJAX6BQAAAAAgKdAvAAAAAABJgX4BAAAAAEgK9AsAAAAAQFKgXwAAAAAAkgL9AgAAAACQFOgXAAAAAICkQL8AAAAAACQF+gUAAAAAICnQLwAAAAAASYF+AQAAAABICvQLAAAAAEBSoF8AAAAAAJKii35FRUVN/czPP//87t07vTcre3L58mUusStXrjTQqyxfvnyqOGbOnJmVF1q8eDG3q1u3bn2x3ixISEjgp+j9+/dqV4uL [...]
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_accumulo_shell">3. Accumulo Shell</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Accumulo provides a simple shell that can be used to examine the contents and
+configuration settings of tables, insert/update/delete values, and change
+configuration settings.</p>
+</div>
+<div class="paragraph">
+<p>The shell can be started by the following command:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ACCUMULO_HOME/bin/accumulo shell -u [username]</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The shell will prompt for the corresponding password to the username specified
+and then display the following prompt:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>Shell - Apache Accumulo Interactive Shell
+-
+- version 1.6
+- instance name: myinstance
+- instance id: 00000000-0000-0000-0000-000000000000
+-
+- type 'help' for a list of available commands
+-</pre>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_basic_administration">3.1. Basic Administration</h3>
+<div class="paragraph">
+<p>The Accumulo shell can be used to create and delete tables, as well as to configure
+table and instance specific options.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@myinstance&gt; tables
+accumulo.metadata
+accumulo.root
+
+root@myinstance&gt; createtable mytable
+
+root@myinstance mytable&gt;
+
+root@myinstance mytable&gt; tables
+accumulo.metadata
+accumulo.root
+mytable
+
+root@myinstance mytable&gt; createtable testtable
+
+root@myinstance testtable&gt;
+
+root@myinstance testtable&gt; deletetable testtable
+deletetable { testtable } (yes|no)? yes
+Table: [testtable] has been deleted.
+
+root@myinstance&gt;</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The Shell can also be used to insert updates and scan tables. This is useful for
+inspecting tables.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@myinstance mytable&gt; scan
+
+root@myinstance mytable&gt; insert row1 colf colq value1
+insert successful
+
+root@myinstance mytable&gt; scan
+row1 colf:colq [] value1</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The value in brackets &#8220;[]&#8221; would be the visibility labels. Since none were used, this is empty for this row.
+You can use the <code>-st</code> option to scan to see the timestamp for the cell, too.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_table_maintenance">3.2. Table Maintenance</h3>
+<div class="paragraph">
+<p>The <strong>compact</strong> command instructs Accumulo to schedule a compaction of the table during which
+files are consolidated and deleted entries are removed.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>root@myinstance mytable&gt; compact -t mytable
+07 16:13:53,201 [shell.Shell] INFO : Compaction of table mytable started for given range</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The <strong>flush</strong> command instructs Accumulo to write all entries currently in memory for a given table
+to disk.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>root@myinstance mytable&gt; flush -t mytable
+07 16:14:19,351 [shell.Shell] INFO : Flush of table mytable
+initiated...</pre>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_user_administration">3.3. User Administration</h3>
+<div class="paragraph">
+<p>The Shell can be used to add, remove, and grant privileges to users.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@myinstance mytable&gt; createuser bob
+Enter new password for 'bob': *********
+Please confirm new password for 'bob': *********
+
+root@myinstance mytable&gt; authenticate bob
+Enter current password for 'bob': *********
+Valid
+
+root@myinstance mytable&gt; grant System.CREATE_TABLE -s -u bob
+
+root@myinstance mytable&gt; user bob
+Enter current password for 'bob': *********
+
+bob@myinstance mytable&gt; userpermissions
+System permissions: System.CREATE_TABLE
+Table permissions (accumulo.metadata): Table.READ
+Table permissions (mytable): NONE
+
+bob@myinstance mytable&gt; createtable bobstable
+
+bob@myinstance bobstable&gt;
+
+bob@myinstance bobstable&gt; user root
+Enter current password for 'root': *********
+
+root@myinstance bobstable&gt; revoke System.CREATE_TABLE -s -u bob</pre>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_jsr_223_support_in_the_shell">3.4. JSR-223 Support in the Shell</h3>
+<div class="paragraph">
+<p>The script command can be used to invoke programs written in languages supported by installed JSR-223
+engines. You can get a list of installed engines with the -l argument. Below is an example of the output
+of the command when running the Shell with Java 7.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@fake&gt; script -l
+    Engine Alias: ECMAScript
+    Engine Alias: JavaScript
+    Engine Alias: ecmascript
+    Engine Alias: javascript
+    Engine Alias: js
+    Engine Alias: rhino
+    Language: ECMAScript (1.8)
+    Script Engine: Mozilla Rhino (1.7 release 3 PRERELEASE)
+ScriptEngineFactory Info</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>A list of compatible languages can be found at <a href="https://en.wikipedia.org/wiki/List_of_JVM_languages" class="bare">https://en.wikipedia.org/wiki/List_of_JVM_languages</a>. The
+rhino javascript engine is provided with the JVM. Typically, putting a jar on the classpath is all that is
+needed to install a new engine.</p>
+</div>
+<div class="paragraph">
+<p>When writing scripts to run in the shell, you will have a variable called connection already available
+to you. This variable is a reference to an Accumulo Connector object, the same connection that the Shell
+is using to communicate with the Accumulo servers. At this point you can use any of the public API methods
+within your script. Reference the script command help to see all of the execution options. Script and script
+invocation examples can be found in ACCUMULO-1399.</p>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_writing_accumulo_clients">4. Writing Accumulo Clients</h2>
+<div class="sectionbody">
+<div class="sect2">
+<h3 id="_running_client_code">4.1. Running Client Code</h3>
+<div class="paragraph">
+<p>There are multiple ways to run Java code that uses Accumulo. Below is a list
+of the different ways to execute client code.</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>using java executable</p>
+</li>
+<li>
+<p>using the accumulo script</p>
+</li>
+<li>
+<p>using the tool script</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>In order to run client code written to run against Accumulo, you will need to
+include the jars that Accumulo depends on in your classpath. Accumulo client
+code depends on Hadoop and Zookeeper. For Hadoop add the hadoop client jar, all
+of the jars in the Hadoop lib directory, and the conf directory to the
+classpath. For recent Zookeeper versions, you only need to add the Zookeeper jar, and not
+what is in the Zookeeper lib directory. You can run the following command on a
+configured Accumulo system to see what it is using for its classpath.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ACCUMULO_HOME/bin/accumulo classpath</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Another option for running your code is to put a jar file in
+<code>$ACCUMULO_HOME/lib/ext</code>. After doing this you can use the accumulo
+script to execute your code. For example if you create a jar containing the
+class <code>com.foo.Client</code> and placed that in <code>lib/ext</code>, then you could use the command
+<code>$ACCUMULO_HOME/bin/accumulo com.foo.Client</code> to execute your code.</p>
+</div>
+<div class="paragraph">
+<p>If you are writing a MapReduce job that accesses Accumulo, then you can use the
+bin/tool.sh script to run those jobs. See the MapReduce example.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_connecting">4.2. Connecting</h3>
+<div class="paragraph">
+<p>All clients must first identify the Accumulo instance to which they will be
+communicating. Code to do this is as follows:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">String instanceName = "myinstance";
+String zooServers = "zooserver-one,zooserver-two";
+Instance inst = new ZooKeeperInstance(instanceName, zooServers);
+
+Connector conn = inst.getConnector("user", new PasswordToken("passwd"));</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The PasswordToken is the most common implementation of an <code>AuthenticationToken</code>.
+This general interface allows authentication as an Accumulo user to come from
+a variety of sources or means. The CredentialProviderToken leverages the Hadoop
+CredentialProviders (new in Hadoop 2.6).</p>
+</div>
+<div class="paragraph">
+<p>For example, the CredentialProviderToken can be used in conjunction with a Java
+KeyStore to avoid storing passwords in cleartext. When stored in HDFS, a single
+KeyStore can be used across an entire instance. Be aware that KeyStores stored on
+the local filesystem must be made available to all nodes in the Accumulo cluster.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">KerberosToken token = new KerberosToken();
+Connector conn = inst.getConnector(token.getPrincipal(), token);</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The KerberosToken can be provided to use the authentication provided by Kerberos.
+Using Kerberos requires external setup and additional configuration, but provides
+a single point of authentication through HDFS, YARN and ZooKeeper, allowing
+for password-less authentication with Accumulo.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_writing_data">4.3. Writing Data</h3>
+<div class="paragraph">
+<p>Data are written to Accumulo by creating Mutation objects that represent all the
+changes to the columns of a single row. The changes are made atomically in the
+TabletServer. Clients then add Mutations to a BatchWriter which submits them to
+the appropriate TabletServers.</p>
+</div>
+<div class="paragraph">
+<p>Mutations can be created thus:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">Text rowID = new Text("row1");
+Text colFam = new Text("myColFam");
+Text colQual = new Text("myColQual");
+ColumnVisibility colVis = new ColumnVisibility("public");
+long timestamp = System.currentTimeMillis();
+
+Value value = new Value("myValue".getBytes());
+
+Mutation mutation = new Mutation(rowID);
+mutation.put(colFam, colQual, colVis, timestamp, value);</code></pre>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_batchwriter">4.3.1. BatchWriter</h4>
+<div class="paragraph">
+<p>The BatchWriter is highly optimized to send Mutations to multiple TabletServers
+and automatically batches Mutations destined for the same TabletServer to
+amortize network overhead. Care must be taken to avoid changing the contents of
+any Object passed to the BatchWriter since it keeps objects in memory while
+batching.</p>
+</div>
+<div class="paragraph">
+<p>Mutations are added to a BatchWriter thus:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">// BatchWriterConfig has reasonable defaults
+BatchWriterConfig config = new BatchWriterConfig();
+config.setMaxMemory(10000000L); // bytes available to batchwriter for buffering mutations
+
+BatchWriter writer = conn.createBatchWriter("table", config);
+
+writer.addMutation(mutation);
+
+writer.close();</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>An example of using the batch writer can be found at
+<code>accumulo/docs/examples/README.batch</code>.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_conditionalwriter">4.3.2. ConditionalWriter</h4>
+<div class="paragraph">
+<p>The ConditionalWriter enables efficient, atomic read-modify-write operations on
+rows.  The ConditionalWriter writes special Mutations which have a list of
+per-column conditions that must all be met before the mutation is applied.  The
+conditions are checked in the tablet server while a row lock is
+held (Mutations written by the BatchWriter will not obtain a row
+lock).  The conditions that can be checked for a column are equality and
+absence.  For example, a conditional mutation can require that column A is
+absent in order to be applied.  Iterators can be applied when checking
+conditions.  Using iterators, many other operations besides equality and
+absence can be checked.  For example, using an iterator that converts values
+less than 5 to 0 and everything else to 1, it&#8217;s possible to apply a
+mutation only when a column&#8217;s value is less than 5.</p>
+</div>
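+<div class="paragraph">
+<p>The check-then-write semantics can be illustrated with a minimal plain-Java
+sketch (a conceptual model only, not the ConditionalWriter API; the class and
+method names here are made up). The condition on a column is evaluated and the
+update applied while the row lock is held, so the read and the write form one
+atomic step:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">import java.util.HashMap;
+
+// Conceptual sketch of conditional-mutation semantics (hypothetical class).
+public class ConditionalPutSketch {
+  // row + "/" + column maps to a value; a stand-in for a tablet's in-memory map
+  private final HashMap cells = new HashMap();
+
+  private String key(String row, String col) { return row + "/" + col; }
+
+  // Apply the update only if the column currently equals `expected`;
+  // pass expected == null to require that the column is absent.
+  // `synchronized` plays the role of the row lock.
+  public synchronized boolean putIfEquals(String row, String col,
+                                          String expected, String newValue) {
+    Object current = cells.get(key(row, col));
+    boolean met = (expected == null) ? current == null : expected.equals(current);
+    if (met) {
+      cells.put(key(row, col), newValue);
+    }
+    return met; // false means the condition failed and nothing was written
+  }
+
+  public synchronized Object get(String row, String col) {
+    return cells.get(key(row, col));
+  }
+}</code></pre>
+</div>
+</div>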
+<div class="paragraph">
+<p>If a tablet server dies after a client has sent a conditional
+mutation, it&#8217;s not known whether the mutation was applied.  When this happens
+the ConditionalWriter reports a status of UNKNOWN for the ConditionalMutation.
+In many cases this situation can be dealt with by simply reading the row again
+and possibly sending another conditional mutation.  If this is not sufficient,
+then a higher level of abstraction can be built by storing transactional
+information within a row.</p>
+</div>
+<div class="paragraph">
+<p>An example of using the conditional writer can be found at
+<code>accumulo/docs/examples/README.reservations</code>.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_durability">4.3.3. Durability</h4>
+<div class="paragraph">
+<p>By default, Accumulo writes out any updates to the Write-Ahead Log (WAL). Every change
+goes into a file in HDFS and is sync&#8217;d to disk for maximum durability. In
+the event of a failure, writes held in memory are replayed from the WAL. Like
+all files in HDFS, this file is also replicated. Sending updates to the
+replicas, and waiting for a permanent sync to disk, can significantly reduce write speeds.</p>
+</div>
+<div class="paragraph">
+<p>Accumulo allows users to choose weaker durability guarantees when writing.
+These levels are:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>none: no durability guarantees are made; the WAL is not used</p>
+</li>
+<li>
+<p>log: the WAL is used, but not flushed; loss of the server probably means recent writes are lost</p>
+</li>
+<li>
+<p>flush: updates are written to the WAL, and flushed out to replicas; loss of a single server is unlikely to result in data loss.</p>
+</li>
+<li>
+<p>sync: updates are written to the WAL, and synced to disk on all replicas before the write is acknowledged. Data will not be lost even if the entire cluster suddenly loses power.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The user can set the default durability of a table in the shell.  When
+writing, the user can configure the BatchWriter or ConditionalWriter to use
+a different level of durability for the session. This will override the
+default durability setting.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">BatchWriterConfig cfg = new BatchWriterConfig();
+// We don't care about data loss with these writes:
+// This is DANGEROUS:
+cfg.setDurability(Durability.NONE);
+
+Connector conn = ... ;
+BatchWriter bw = conn.createBatchWriter(table, cfg);</code></pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_reading_data">4.4. Reading Data</h3>
+<div class="paragraph">
+<p>Accumulo is optimized to quickly retrieve the value associated with a given key, and
+to efficiently return ranges of consecutive keys and their associated values.</p>
+</div>
+<div class="sect3">
+<h4 id="_scanner">4.4.1. Scanner</h4>
+<div class="paragraph">
+<p>To retrieve data, Clients use a Scanner, which acts like an Iterator over
+keys and values. Scanners can be configured to start and stop at particular keys, and
+to return a subset of the columns available.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">// specify which visibilities we are allowed to see
+Authorizations auths = new Authorizations("public");
+
+Scanner scan =
+    conn.createScanner("table", auths);
+
+scan.setRange(new Range("harry","john"));
+scan.fetchColumnFamily(new Text("attributes"));
+
+for(Entry&lt;Key,Value&gt; entry : scan) {
+    Text row = entry.getKey().getRow();
+    Value value = entry.getValue();
+}</code></pre>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_isolated_scanner">4.4.2. Isolated Scanner</h4>
+<div class="paragraph">
+<p>Accumulo supports the ability to present an isolated view of rows when
+scanning. There are three possible ways that a row could change in Accumulo:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>a mutation applied to a table</p>
+</li>
+<li>
+<p>iterators executed as part of a minor or major compaction</p>
+</li>
+<li>
+<p>bulk import of new files</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Isolation guarantees that either all or none of the changes made by these
+operations on a row are seen. Use the IsolatedScanner to obtain an isolated
+view of an Accumulo table. When using the regular scanner it is possible to see
+a non-isolated view of a row. For example, if a mutation modifies three
+columns, it is possible that you will see only two of those modifications.
+With the isolated scanner, either all three of the changes are seen or none.</p>
+</div>
+<div class="paragraph">
+<p>The IsolatedScanner buffers rows on the client side so a large row will not
+crash a tablet server. By default rows are buffered in memory, but the user
+can easily supply their own buffer if they wish to buffer to disk when rows are
+large.</p>
+</div>
+<div class="paragraph">
+<p>For an example, look at the following</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>examples/simple/src/main/java/org/apache/accumulo/examples/simple/isolation/InterferenceTest.java</pre>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_batchscanner">4.4.3. BatchScanner</h4>
+<div class="paragraph">
+<p>For some types of access, it is more efficient to retrieve several ranges
+simultaneously. This arises, for example, when accessing a set of non-consecutive rows
+whose IDs have been retrieved from a secondary index.</p>
+</div>
+<div class="paragraph">
+<p>The BatchScanner is configured similarly to the Scanner; it can be configured to
+retrieve a subset of the columns available, but rather than passing a single Range,
+BatchScanners accept a set of Ranges. It is important to note that the keys returned
+by a BatchScanner are not in sorted order since the keys streamed are from multiple
+TabletServers in parallel.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">ArrayList&lt;Range&gt; ranges = new ArrayList&lt;Range&gt;();
+// populate list of ranges ...
+
+BatchScanner bscan =
+    conn.createBatchScanner("table", auths, 10);
+bscan.setRanges(ranges);
+bscan.fetchColumnFamily(new Text("attributes"));
+
+for(Entry&lt;Key,Value&gt; entry : bscan) {
+    System.out.println(entry.getValue());
+}</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>An example of the BatchScanner can be found at
+<code>accumulo/docs/examples/README.batch</code>.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_proxy">4.5. Proxy</h3>
+<div class="paragraph">
+<p>The proxy API allows interaction with Accumulo from languages other than Java.
+A proxy server is provided in the codebase, and a client can be generated from it.
+The proxy API can also be used instead of the traditional ZooKeeperInstance class to
+provide a single TCP port through which clients can be securely routed through a firewall,
+without requiring access to all tablet servers in the cluster.</p>
+</div>
+<div class="sect3">
+<h4 id="_prerequisites">4.5.1. Prerequisites</h4>
+<div class="paragraph">
+<p>The proxy server can live on any node on which the basic client API would work. That
+means it must be able to communicate with the Master, ZooKeepers, NameNode, and the
+DataNodes. A proxy client only needs the ability to communicate with the proxy server.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_configuration">4.5.2. Configuration</h4>
+<div class="paragraph">
+<p>The configuration options for the proxy server live inside of a properties file. At
+the very least, you need to supply the following properties:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>protocolFactory=org.apache.thrift.protocol.TCompactProtocol$Factory
+tokenClass=org.apache.accumulo.core.client.security.tokens.PasswordToken
+port=42424
+instance=test
+zookeepers=localhost:2181</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>You can find a sample configuration file in your distribution:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ACCUMULO_HOME/proxy/proxy.properties</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This sample configuration file further demonstrates the ability to back the proxy server
+by MockAccumulo or the MiniAccumuloCluster.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_running_the_proxy_server">4.5.3. Running the Proxy Server</h4>
+<div class="paragraph">
+<p>After the properties file holding the configuration is created, the proxy server
+can be started using the following command in the Accumulo distribution (assuming
+your properties file is named <code>config.properties</code>):</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ACCUMULO_HOME/bin/accumulo proxy -p config.properties</pre>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_creating_a_proxy_client">4.5.4. Creating a Proxy Client</h4>
+<div class="paragraph">
+<p>Aside from installing the Thrift compiler, you will also need the language-specific library
+for Thrift installed to generate client code in that language. Typically, your operating
+system&#8217;s package manager will be able to automatically install these for you in an expected
+location such as <code>/usr/lib/python/site-packages/thrift</code>.</p>
+</div>
+<div class="paragraph">
+<p>You can find the thrift file for generating the client:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ACCUMULO_HOME/proxy/proxy.thrift</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>After a client is generated, the port specified in the configuration properties above will be
+used to connect to the server.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_using_a_proxy_client">4.5.5. Using a Proxy Client</h4>
+<div class="paragraph">
+<p>The following examples have been written in Java, and the method signatures may be
+slightly different depending on the language specified when generating the client with
+the Thrift compiler. After initiating a connection to the Proxy (see Apache Thrift&#8217;s
+documentation for examples of connecting to a Thrift service), the methods on the
+proxy client will be available. The first thing to do is log in:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">Map&lt;String,String&gt; password = new HashMap&lt;String,String&gt;();
+password.put("password", "secret");
+ByteBuffer token = client.login("root", password);</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Once logged in, the token returned will be used for most subsequent calls to the client.
+Let&#8217;s create a table, add some data, scan the table, and delete it.</p>
+</div>
+<div class="paragraph">
+<p>First, create a table.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">client.createTable(token, "myTable", true, TimeType.MILLIS);</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Next, add some data:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">// first, create a writer on the server
+String writer = client.createWriter(token, "myTable", new WriterOptions());
+
+//rowid
+ByteBuffer rowid = ByteBuffer.wrap("UUID".getBytes());
+
+//mutation like class
+ColumnUpdate cu = new ColumnUpdate();
+cu.setColFamily("MyFamily".getBytes());
+cu.setColQualifier("MyQualifier".getBytes());
+cu.setColVisibility("VisLabel".getBytes());
+cu.setValue("Some Value.".getBytes());
+
+List&lt;ColumnUpdate&gt; updates = new ArrayList&lt;ColumnUpdate&gt;();
+updates.add(cu);
+
+// build column updates
+Map&lt;ByteBuffer, List&lt;ColumnUpdate&gt;&gt; cellsToUpdate = new HashMap&lt;ByteBuffer, List&lt;ColumnUpdate&gt;&gt;();
+cellsToUpdate.put(rowid, updates);
+
+// send updates to the server
+client.updateAndFlush(writer, "myTable", cellsToUpdate);
+
+client.closeWriter(writer);</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Scan for the data and batch the return of the results on the server:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">String scanner = client.createScanner(token, "myTable", new ScanOptions());
+ScanResult results = client.nextK(scanner, 100);
+
+for(KeyValue keyValue : results.getResultsIterator()) {
+  // do something with results
+}
+
+client.closeScanner(scanner);</code></pre>
+</div>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_development_clients">5. Development Clients</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Normally, Accumulo consists of lots of moving parts. Even a stand-alone version of
+Accumulo requires Hadoop, ZooKeeper, the Accumulo master, a tablet server, etc. If
+you want to write a unit test that uses Accumulo, you need a lot of infrastructure
+in place before your test can run.</p>
+</div>
+<div class="sect2">
+<h3 id="_mock_accumulo">5.1. Mock Accumulo</h3>
+<div class="paragraph">
+<p>Mock Accumulo supplies mock implementations for much of the client API. It presently
+does not enforce users, logins, permissions, etc. It does support Iterators and Combiners.
+Note that MockAccumulo holds all data in memory, and will not retain any data or
+settings between runs.</p>
+</div>
+<div class="paragraph">
+<p>While normal interaction with the Accumulo client looks like this:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">Instance instance = new ZooKeeperInstance(...);
+Connector conn = instance.getConnector(user, passwordToken);</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To interact with the MockAccumulo, just replace the ZooKeeperInstance with MockInstance:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">Instance instance = new MockInstance();</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>In fact, you can use the <code>--fake</code> option to the Accumulo shell and interact with
+MockAccumulo:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ ./bin/accumulo shell --fake -u root -p ''
+
+Shell - Apache Accumulo Interactive Shell
+-
+- version: 1.6
+- instance name: fake
+- instance id: mock-instance-id
+-
+- type 'help' for a list of available commands
+-
+
+root@fake&gt; createtable test
+
+root@fake test&gt; insert row1 cf cq value
+root@fake test&gt; insert row2 cf cq value2
+root@fake test&gt; insert row3 cf cq value3
+
+root@fake test&gt; scan
+row1 cf:cq []    value
+row2 cf:cq []    value2
+row3 cf:cq []    value3
+
+root@fake test&gt; scan -b row2 -e row2
+row2 cf:cq []    value2
+
+root@fake test&gt;</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>When testing MapReduce jobs, you can also set the mock instance on the AccumuloInputFormat
+and AccumuloOutputFormat classes:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">AccumuloInputFormat.setMockInstance(job, "mockInstance");
+AccumuloOutputFormat.setMockInstance(job, "mockInstance");</code></pre>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_mini_accumulo_cluster">5.2. Mini Accumulo Cluster</h3>
+<div class="paragraph">
+<p>While the Mock Accumulo provides a lightweight implementation of the client API for unit
+testing, it is often necessary to write more realistic end-to-end integration tests that
+take advantage of the entire ecosystem. The Mini Accumulo Cluster makes this possible by
+configuring and starting ZooKeeper, initializing Accumulo, and starting the Master as well
+as some Tablet Servers. It runs against the local filesystem instead of having to start
+up HDFS.</p>
+</div>
+<div class="paragraph">
+<p>To start it up, you will need to supply an empty directory and a root password as arguments:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">File tempDirectory = // JUnit and Guava supply mechanisms for creating temp directories
+MiniAccumuloCluster accumulo = new MiniAccumuloCluster(tempDirectory, "password");
+accumulo.start();</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Once we have our mini cluster running, we will want to interact with the Accumulo client API:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">Instance instance = new ZooKeeperInstance(accumulo.getInstanceName(), accumulo.getZooKeepers());
+Connector conn = instance.getConnector("root", new PasswordToken("password"));</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Upon completion of our development code, we will want to shut down our MiniAccumuloCluster:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">accumulo.stop();
+// delete your temporary folder</code></pre>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_table_configuration">6. Table Configuration</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Accumulo tables have a few options that can be configured to alter the default
+behavior of Accumulo as well as improve performance based on the data stored.
+These include locality groups, constraints, bloom filters, iterators, and block
+cache.  For a complete list of available configuration options, see <a href="#configuration">Configuration Management</a>.</p>
+</div>
+<div class="sect2">
+<h3 id="_locality_groups">6.1. Locality Groups</h3>
+<div class="paragraph">
+<p>Accumulo supports storing sets of column families separately on disk to allow
+clients to efficiently scan over columns that are frequently used together and to avoid
+scanning over column families that are not requested. After locality groups are set,
+Scanner and BatchScanner operations will automatically take advantage of them
+whenever the fetchColumnFamilies() method is used.</p>
+</div>
+<div class="paragraph">
+<p>By default, tables place all column families into the same &#8220;default&#8221; locality group.
+Additional locality groups can be configured at any time via the shell or
+programmatically as follows:</p>
+</div>
+<div class="sect3">
+<h4 id="_managing_locality_groups_via_the_shell">6.1.1. Managing Locality Groups via the Shell</h4>
+<div class="literalblock">
+<div class="content">
+<pre>usage: setgroups &lt;group&gt;=&lt;col fam&gt;{,&lt;col fam&gt;}{ &lt;group&gt;=&lt;col fam&gt;{,&lt;col fam&gt;}}
+    [-?] -t &lt;table&gt;</pre>
+</div>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>user@myinstance mytable&gt; setgroups group_one=colf1,colf2 -t mytable</pre>
+</div>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>user@myinstance mytable&gt; getgroups -t mytable</pre>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_managing_locality_groups_via_the_client_api">6.1.2. Managing Locality Groups via the Client API</h4>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">Connector conn;
+
+HashMap&lt;String,Set&lt;Text&gt;&gt; localityGroups = new HashMap&lt;String, Set&lt;Text&gt;&gt;();
+
+HashSet&lt;Text&gt; metadataColumns = new HashSet&lt;Text&gt;();
+metadataColumns.add(new Text("domain"));
+metadataColumns.add(new Text("link"));
+
+HashSet&lt;Text&gt; contentColumns = new HashSet&lt;Text&gt;();
+contentColumns.add(new Text("body"));
+contentColumns.add(new Text("images"));
+
+localityGroups.put("metadata", metadataColumns);
+localityGroups.put("content", contentColumns);
+
+conn.tableOperations().setLocalityGroups("mytable", localityGroups);
+
+// existing locality groups can be obtained as follows
+Map&lt;String, Set&lt;Text&gt;&gt; groups =
+    conn.tableOperations().getLocalityGroups("mytable");</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The assignment of Column Families to Locality Groups can be changed at any time. The
+physical movement of column families into their new locality groups takes place via
+the periodic Major Compaction process that takes place continuously in the
+background. Major Compaction can also be scheduled to take place immediately
+through the shell:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>user@myinstance mytable&gt; compact -t mytable</pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_constraints">6.2. Constraints</h3>
+<div class="paragraph">
+<p>Accumulo supports constraints applied on mutations at insert time. This can be
+used to disallow certain inserts according to a user defined policy. Any mutation
+that fails to meet the requirements of the constraint is rejected and sent back to the
+client.</p>
+</div>
+<div class="paragraph">
+<p>Constraints can be enabled by setting a table property as follows:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>user@myinstance mytable&gt; constraint -t mytable -a com.test.ExampleConstraint com.test.AnotherConstraint
+
+user@myinstance mytable&gt; constraint -l
+com.test.ExampleConstraint=1
+com.test.AnotherConstraint=2</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Currently there are no general-purpose constraints provided with the Accumulo
+distribution. New constraints can be created by writing a Java class that implements
+the following interface:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>org.apache.accumulo.core.constraints.Constraint</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To deploy a new constraint, create a jar file containing the class implementing the
+new constraint and place it in the lib directory of the Accumulo installation. New
+constraint jars can be added to Accumulo and enabled without restarting but any
+change to an existing constraint class requires Accumulo to be restarted.</p>
+</div>
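+<div class="paragraph">
+<p>As a sketch of what a constraint&#8217;s check might look like, here is a plain-Java
+analog (the class name and method shapes here are hypothetical; the real interface&#8217;s
+check method receives an environment and a Mutation and returns violation codes)
+that rejects non-numeric values, similar in spirit to the numeric-value constraint
+shipped with the examples:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">// Plain-Java analog of a constraint check (hypothetical class; the real
+// interface is org.apache.accumulo.core.constraints.Constraint).
+public class NumericValueCheckSketch {
+  public static final short NON_NUMERIC = 1;
+
+  // Returns 0 when the value is acceptable, otherwise a violation code.
+  public short check(String value) {
+    for (int i = 0; i != value.length(); i++) {
+      if (!Character.isDigit(value.charAt(i))) {
+        return NON_NUMERIC;
+      }
+    }
+    return 0;
+  }
+
+  public String getViolationDescription(short violationCode) {
+    return violationCode == NON_NUMERIC ? "value is not numeric" : "unknown violation";
+  }
+}</code></pre>
+</div>
+</div>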
+<div class="paragraph">
+<p>An example of constraints can be found in
+<code>accumulo/docs/examples/README.constraints</code> with corresponding code under
+<code>accumulo/examples/simple/src/main/java/accumulo/examples/simple/constraints</code>.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_bloom_filters">6.3. Bloom Filters</h3>
+<div class="paragraph">
+<p>As mutations are applied to an Accumulo table, several files are created per tablet. If
+bloom filters are enabled, Accumulo will create and load a small data structure into
+memory to determine whether a file contains a given key before opening the file.
+This can speed up lookups considerably.</p>
+</div>
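+<div class="paragraph">
+<p>The idea behind this file-skipping optimization can be sketched with a toy
+bloom filter (illustrative only, not Accumulo&#8217;s implementation): a membership
+test may report false positives, but never false negatives, so a file whose
+filter reports a miss never has to be opened at all.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">import java.util.BitSet;
+
+// Toy bloom filter: a miss is definitive, a hit may be a false positive.
+public class BloomSketch {
+  private final BitSet bits;
+  private final int size;
+
+  public BloomSketch(int size) {
+    this.size = size;
+    this.bits = new BitSet(size);
+  }
+
+  private int index(String key, int seed) {
+    return Math.abs((key.hashCode() * 31 + seed) % size);
+  }
+
+  public void add(String key) {
+    for (int seed = 0; seed != 3; seed++) {
+      bits.set(index(key, seed));
+    }
+  }
+
+  // false means the key is definitely not present
+  public boolean mightContain(String key) {
+    for (int seed = 0; seed != 3; seed++) {
+      if (!bits.get(index(key, seed))) {
+        return false;
+      }
+    }
+    return true;
+  }
+}</code></pre>
+</div>
+</div>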
+<div class="paragraph">
+<p>To enable bloom filters, enter the following command in the Shell:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>user@myinstance&gt; config -t mytable -s table.bloom.enabled=true</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>An extensive example of using Bloom Filters can be found at
+<code>accumulo/docs/examples/README.bloom</code>.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_iterators">6.4. Iterators</h3>
+<div class="paragraph">
+<p>Iterators provide a modular mechanism for adding functionality to be executed by
+TabletServers when scanning or compacting data. This allows users to efficiently
+summarize, filter, and aggregate data. In fact, the built-in features of cell-level
+security and column fetching are implemented using Iterators.
+Some useful Iterators are provided with Accumulo and can be found in the
+<strong><code>org.apache.accumulo.core.iterators.user</code></strong> package.
+In each case, any custom Iterators must be included in Accumulo&#8217;s classpath,
+typically by including a jar in <code>$ACCUMULO_HOME/lib</code> or
+<code>$ACCUMULO_HOME/lib/ext</code>, although the VFS classloader allows for
+classpath manipulation using a variety of schemes including URLs and HDFS URIs.</p>
+</div>
+<div class="sect3">
+<h4 id="_setting_iterators_via_the_shell">6.4.1. Setting Iterators via the Shell</h4>
+<div class="paragraph">
+<p>Iterators can be configured on a table at scan, minor compaction and/or major
+compaction scopes. If the Iterator implements the OptionDescriber interface, the
+setiter command can be used, which will interactively prompt the user to provide
+values for the necessary options.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>usage: setiter [-?] -ageoff | -agg | -class &lt;name&gt; | -regex |
+    -reqvis | -vers   [-majc] [-minc] [-n &lt;itername&gt;] -p &lt;pri&gt;
+    [-scan] [-t &lt;table&gt;]</pre>
+</div>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>user@myinstance mytable&gt; setiter -t mytable -scan -p 15 -n myiter -class com.company.MyIterator</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The config command can always be used to manually configure iterators, which is useful
+in cases where the Iterator does not implement the OptionDescriber interface.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>config -t mytable -s table.iterator.scan.myiter=15,com.company.MyIterator
+config -t mytable -s table.iterator.minc.myiter=15,com.company.MyIterator
+config -t mytable -s table.iterator.majc.myiter=15,com.company.MyIterator
+config -t mytable -s table.iterator.scan.myiter.opt.myoptionname=myoptionvalue
+config -t mytable -s table.iterator.minc.myiter.opt.myoptionname=myoptionvalue
+config -t mytable -s table.iterator.majc.myiter.opt.myoptionname=myoptionvalue</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Typically, a table will have multiple iterators. Accumulo configures a set of
+system level iterators for each table. These iterators provide core
+functionality like visibility label filtering and may not be removed by
+users. User level iterators are applied in the order of their priority.
+Priority is a user-configured integer; iterators with lower numbers go first,
+passing the results of their iteration on to the other iterators up the
+stack.</p>
+</div>
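+<div class="paragraph">
+<p>The priority ordering can be sketched in plain Java (a conceptual model only,
+not the real iterator machinery; here each &#8220;iterator&#8221; is just a string transform):
+steps are sorted by ascending priority, and each step&#8217;s output feeds the next one
+up the stack.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">import java.util.Arrays;
+
+// Conceptual sketch of iterator priority ordering (hypothetical class).
+public class IteratorStackSketch {
+  interface Op { String apply(String in); }
+
+  // Apply ops to input in ascending priority order.
+  public static String run(int[] priorities, Op[] ops, String input) {
+    Integer[] order = new Integer[priorities.length];
+    for (int i = 0; i != order.length; i++) { order[i] = i; }
+    Arrays.sort(order, (a, b) -&gt; priorities[a] - priorities[b]);
+    String result = input;
+    for (Integer i : order) {
+      result = ops[i].apply(result); // lower priority number runs first
+    }
+    return result;
+  }
+}</code></pre>
+</div>
+</div>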
+</div>
+<div class="sect3">
+<h4 id="_setting_iterators_programmatically">6.4.2. Setting Iterators Programmatically</h4>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">scanner.addIterator(new IteratorSetting(
+    15, // priority
+    "myiter", // name this iterator
+    "com.company.MyIterator" // class name
+));</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Some iterators take additional parameters from client code, as in the following
+example:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">IteratorSetting iter = new IteratorSetting(...);
+iter.addOption("myoptionname", "myoptionvalue");
+scanner.addIterator(iter);</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Tables support separate Iterator settings to be applied at scan time, upon minor
+compaction and upon major compaction. For most uses, tables will have identical
+iterator settings for all three to avoid inconsistent results.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_versioning_iterators_and_timestamps">6.4.3. Versioning Iterators and Timestamps</h4>
+<div class="paragraph">
+<p>Accumulo provides the capability to manage versioned data through the use of
+timestamps within the Key. If a timestamp is not specified in the key created by the
+client then the system will set the timestamp to the current time. Two keys with
+identical rowIDs and columns but different timestamps are considered two versions
+of the same key. If two inserts are made into Accumulo with the same rowID,
+column, and timestamp, then the behavior is non-deterministic.</p>
+</div>
+<div class="paragraph">
+<p>Timestamps are sorted in descending order, so the most recent data comes first.
+Accumulo can be configured to return the top k versions, or versions later than a
+given date. The default is to return the one most recent version.</p>
+</div>
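+<div class="paragraph">
+<p>The trimming performed by the versioning policy can be sketched as follows
+(illustrative plain Java with a hypothetical class name, not the iterator itself):
+the versions of a key are ordered newest first and only the first k survive.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">import java.util.Arrays;
+
+// Sketch of keeping the k most recent versions of a key.
+public class TopKVersionsSketch {
+  // Given the timestamps of all versions, return the k newest, newest first.
+  public static long[] keepNewest(long[] timestamps, int k) {
+    long[] sorted = timestamps.clone();
+    Arrays.sort(sorted); // ascending
+    int keep = Math.min(k, sorted.length);
+    long[] out = new long[keep];
+    for (int i = 0; i != keep; i++) {
+      out[i] = sorted[sorted.length - 1 - i]; // walk back from the newest
+    }
+    return out;
+  }
+}</code></pre>
+</div>
+</div>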
+<div class="paragraph">
+<p>The version policy can be changed by changing the VersioningIterator options for a
+table as follows:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>user@myinstance mytable&gt; config -t mytable -s table.iterator.scan.vers.opt.maxVersions=3
+
+user@myinstance mytable&gt; config -t mytable -s table.iterator.minc.vers.opt.maxVersions=3
+
+user@myinstance mytable&gt; config -t mytable -s table.iterator.majc.vers.opt.maxVersions=3</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>When a table is created, by default it is configured to use the
+VersioningIterator and keep one version. A table can be created without the
+VersioningIterator with the -ndi option in the shell. Alternatively, the Java API
+provides the following method:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">connector.tableOperations().create(String tableName, boolean limitVersion);</code></pre>
+</div>
+</div>
+<div class="sect4">
+<h5 id="_logical_time">Logical Time</h5>
+<div class="paragraph">
+<p>Accumulo 1.2 introduced the concept of logical time. This ensures that timestamps
+set by Accumulo always move forward, which helps avoid problems caused by
+TabletServers that have different clock settings. With logical time, a per-tablet
+counter assigns unique, one-up timestamps on a per-mutation basis. When using time
+in milliseconds, if two mutations arrive within the same millisecond then both
+receive the same timestamp; even then, Accumulo-set times will still always move
+forward and never backwards.</p>
+</div>
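+<div class="paragraph">
+<p>The forward-only guarantee can be sketched with a small stand-alone model (plain Java;
+<code>TabletTimeModel</code> and its methods are hypothetical names, not Accumulo internals).
+In milliseconds mode, the wall clock is clamped so assigned times never move backwards; in
+logical mode, the clock is ignored and a per-tablet counter hands out one-up values:</p>
+</div>
+<div class="listingblock">
+<div class="content">

```java
// Hypothetical model of per-tablet time assignment. Milliseconds mode clamps
// the wall clock so assigned times never move backwards; logical mode ignores
// the clock entirely and hands out a unique one-up counter value per mutation.
public class TabletTimeModel {
    private long lastMillis = 0;   // last milliseconds-mode timestamp handed out
    private long counter = 0;      // logical-mode one-up counter

    public long nextMillis(long wallClock) {
        lastMillis = Math.max(lastMillis, wallClock);   // never move backwards
        return lastMillis;
    }

    public long nextLogical() {
        return ++counter;          // unique per mutation
    }
}
```

+</div>
+</div>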
+<div class="paragraph">
+<p>A table can be configured to use logical timestamps at creation time as follows:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>user@myinstance&gt; createtable -tl logical</pre>
+</div>
+</div>
+</div>
+<div class="sect4">
+<h5 id="_deletes">Deletes</h5>
+<div class="paragraph">
+<p>Deletes are special keys in Accumulo that get sorted along with all the other data.
+When a delete key is inserted, Accumulo will not show anything that has a
+timestamp less than or equal to the delete key. During major compaction, any keys
+older than a delete key are omitted from the new file created, and the omitted keys
+are removed from disk as part of the regular garbage collection process.</p>
+</div>
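+<div class="paragraph">
+<p>The visibility rule a delete imposes can be sketched as a single comparison (plain
+Java; the names are illustrative):</p>
+</div>
+<div class="listingblock">
+<div class="content">

```java
// Illustrative rule: a delete key with timestamp d hides every version of the
// same row/column whose timestamp is less than or equal to d.
public class DeleteModel {
    public static boolean isVisible(long entryTimestamp, long deleteTimestamp) {
        return entryTimestamp > deleteTimestamp;
    }
}
```

+</div>
+</div>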
+</div>
+</div>
+<div class="sect3">
+<h4 id="_filters">6.4.4. Filters</h4>
+<div class="paragraph">
+<p>When scanning over a set of key-value pairs it is possible to apply an arbitrary
+filtering policy through the use of a Filter. Filters are types of iterators that return
+only key-value pairs that satisfy the filter logic. Accumulo has a few built-in filters
+that can be configured on any table: AgeOff, ColumnAgeOff, Timestamp, NoVis, and RegEx. More can be added
+by writing a Java class that extends the
+<code>org.apache.accumulo.core.iterators.Filter</code> class.</p>
+</div>
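+<div class="paragraph">
+<p>Custom filters extend <code>org.apache.accumulo.core.iterators.Filter</code> and
+override its <code>accept</code> method. The heart of an age-off style filter can be
+sketched in plain Java; the Accumulo dependency is omitted so the sketch stands alone,
+and the names are illustrative:</p>
+</div>
+<div class="listingblock">
+<div class="content">

```java
// Stand-alone sketch of age-off accept logic: keep an entry only if its
// timestamp is within ttlMillis of the current time. In a real Accumulo
// filter this predicate would live in Filter.accept(Key, Value).
public class AgeOffSketch {
    public static boolean accept(long entryTimestamp, long currentTime, long ttlMillis) {
        return currentTime - entryTimestamp <= ttlMillis;
    }
}
```

+</div>
+</div>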
+<div class="paragraph">
+<p>The AgeOff filter can be configured to remove data older than a certain date or a fixed
+amount of time from the present. The following example sets a table to delete
+everything inserted over 30 seconds ago:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>user@myinstance&gt; createtable filtertest
+
+user@myinstance filtertest&gt; setiter -t filtertest -scan -minc -majc -p 10 -n myfilter -ageoff
+AgeOffFilter removes entries with timestamps more than &lt;ttl&gt; milliseconds old
+----------&gt; set org.apache.accumulo.core.iterators.user.AgeOffFilter parameter negate, default false
+                keeps k/v that pass accept method, true rejects k/v that pass accept method:
+----------&gt; set org.apache.accumulo.core.iterators.user.AgeOffFilter parameter ttl, time to
+                live (milliseconds): 30000
+----------&gt; set org.apache.accumulo.core.iterators.user.AgeOffFilter parameter currentTime, if set,
+                use the given value as the absolute time in milliseconds as the current time of day:
+
+user@myinstance filtertest&gt;
+
+user@myinstance filtertest&gt; scan
+
+user@myinstance filtertest&gt; insert foo a b c
+
+user@myinstance filtertest&gt; scan
+foo a:b [] c
+
+user@myinstance filtertest&gt; sleep 31
+
+user@myinstance filtertest&gt; scan
+
+user@myinstance filtertest&gt;</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To see the iterator settings for a table, use:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>user@example filtertest&gt; config -t filtertest -f iterator
+---------+---------------------------------------------+------------------
+SCOPE    | NAME                                        | VALUE
+---------+---------------------------------------------+------------------
+table    | table.iterator.majc.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
+table    | table.iterator.majc.myfilter.opt.ttl ...... | 30000
+table    | table.iterator.majc.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
+table    | table.iterator.majc.vers.opt.maxVersions .. | 1
+table    | table.iterator.minc.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
+table    | table.iterator.minc.myfilter.opt.ttl ...... | 30000
+table    | table.iterator.minc.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
+table    | table.iterator.minc.vers.opt.maxVersions .. | 1
+table    | table.iterator.scan.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
+table    | table.iterator.scan.myfilter.opt.ttl ...... | 30000
+table    | table.iterator.scan.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
+table    | table.iterator.scan.vers.opt.maxVersions .. | 1
+---------+---------------------------------------------+------------------</pre>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_combiners">6.4.5. Combiners</h4>
+<div class="paragraph">
+<p>Accumulo allows Combiners to be configured on tables and column
+families. When a Combiner is set it is applied across the values
+associated with any keys that share rowID, column family, and column qualifier.
+This is similar to the reduce step in MapReduce, which applies some function to all
+the values associated with a particular key.</p>
+</div>
+<div class="paragraph">
+<p>For example, if a summing combiner were configured on a table and the following
+mutations were inserted:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>Row     Family Qualifier Timestamp  Value
+rowID1  colfA  colqA     20100101   1
+rowID1  colfA  colqA     20100102   1</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The table would reflect only one aggregate value:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>rowID1  colfA  colqA     -          2</pre>
+</div>
+</div>
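+<div class="paragraph">
+<p>The aggregation shown above can be modeled in plain Java by grouping entries on row,
+family, and qualifier and summing their values (an illustrative model; a real combiner
+extends <code>org.apache.accumulo.core.iterators.Combiner</code>):</p>
+</div>
+<div class="listingblock">
+<div class="content">

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model of a summing combiner: values for entries that share
// row, column family, and column qualifier are reduced to a single sum.
public class SummingModel {
    public static Map<String, Long> combine(List<String[]> entries) {
        Map<String, Long> sums = new LinkedHashMap<>();
        for (String[] e : entries) {
            String key = e[0] + "/" + e[1] + "/" + e[2];  // row/family/qualifier
            sums.merge(key, Long.parseLong(e[3]), Long::sum);
        }
        return sums;
    }
}
```

+</div>
+</div>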
+<div class="paragraph">
+<p>Combiners can be enabled for a table using the setiter command in the shell. Below is an example.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@a14 perDayCounts&gt; setiter -t perDayCounts -p 10 -scan -minc -majc -n daycount
+                       -class org.apache.accumulo.core.iterators.user.SummingCombiner
+TypedValueCombiner can interpret Values as a variety of number encodings
+  (VLong, Long, or String) before combining
+----------&gt; set SummingCombiner parameter columns,
+            &lt;col fam&gt;[:&lt;col qual&gt;]{,&lt;col fam&gt;[:&lt;col qual&gt;]} : day
+----------&gt; set SummingCombiner parameter type, &lt;VARNUM|LONG|STRING&gt;: STRING
+
+root@a14 perDayCounts&gt; insert foo day 20080101 1
+root@a14 perDayCounts&gt; insert foo day 20080101 1
+root@a14 perDayCounts&gt; insert foo day 20080103 1
+root@a14 perDayCounts&gt; insert bar day 20080101 1
+root@a14 perDayCounts&gt; insert bar day 20080101 1
+
+root@a14 perDayCounts&gt; scan
+bar day:20080101 []    2
+foo day:20080101 []    2
+foo day:20080103 []    1</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Accumulo includes some useful Combiners out of the box. To find these look in
+the <strong><code>org.apache.accumulo.core.iterators.user</code></strong> package.</p>
+</div>
+<div class="paragraph">
+<p>Additional Combiners can be added by creating a Java class that extends
+<code>org.apache.accumulo.core.iterators.Combiner</code> and adding a jar containing that
+class to Accumulo&#8217;s lib/ext directory.</p>
+</div>
+<div class="paragraph">
+<p>An example of a Combiner can be found under</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>accumulo/examples/simple/src/main/java/org/apache/accumulo/examples/simple/combiner/StatsCombiner.java</pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_block_cache">6.5. Block Cache</h3>
+<div class="paragraph">
+<p>In order to increase throughput of commonly accessed entries, Accumulo employs a block cache.
+This block cache buffers data in memory so that it doesn&#8217;t have to be read from disk.
+The RFile format that Accumulo prefers is a mix of index blocks and data blocks, where the index blocks are used to find the appropriate data blocks.
+Typical queries to Accumulo result in a binary search over several index blocks followed by a linear scan of one or more data blocks.</p>
+</div>
+<div class="paragraph">
+<p>The block cache can be configured on a per-table basis, and all tablets hosted on a tablet server share a single resource pool.
+To configure the size of the tablet server&#8217;s block cache, set the following properties:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>tserver.cache.data.size: Specifies the size of the cache for file data blocks.
+tserver.cache.index.size: Specifies the size of the cache for file indices.</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To enable the block cache for your table, set the following properties:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>table.cache.block.enable: Determines whether file (data) block cache is enabled.
+table.cache.index.enable: Determines whether index cache is enabled.</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The block cache can have a significant effect on alleviating hot spots, as well as reducing query latency.
+It is enabled by default for the metadata tables.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_compaction">6.6. Compaction</h3>
+<div class="paragraph">
+<p>As data is written to Accumulo it is buffered in memory. The data buffered in
+memory is eventually written to HDFS on a per tablet basis. Files can also be
+added to tablets directly by bulk import. In the background tablet servers run
+major compactions to merge multiple files into one. The tablet server has to
+decide which tablets to compact and which files within a tablet to compact.
+This decision is made using the compaction ratio, which is configurable on a
+per table basis. To configure this ratio modify the following property:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>table.compaction.major.ratio</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Increasing this ratio will result in more files per tablet and less compaction
+work. More files per tablet means higher query latency, so adjusting
+this ratio is a trade-off between ingest and query performance. The ratio
+defaults to 3.</p>
+</div>
+<div class="paragraph">
+<p>The way the ratio works is that a set of files is compacted into one file if the
+sum of the sizes of the files in the set is larger than the ratio multiplied by
+the size of the largest file in the set. If this is not true for the set of all
+files in a tablet, the largest file is removed from consideration, and the
+remaining files are considered for compaction. This is repeated until a
+compaction is triggered or there are no files left to consider.</p>
+</div>
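+<div class="paragraph">
+<p>The selection rule above can be sketched directly (plain Java; file sizes are in
+arbitrary units and the names are illustrative):</p>
+</div>
+<div class="listingblock">
+<div class="content">

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the compaction-ratio rule: starting from all files, compact the
// current set if its total size exceeds ratio * (largest file in the set);
// otherwise drop the largest file and retry with the remainder.
public class CompactionModel {
    public static List<Long> filesToCompact(List<Long> sizes, double ratio) {
        List<Long> candidates = new ArrayList<>(sizes);
        Collections.sort(candidates);                    // largest is last
        while (candidates.size() > 1) {
            long largest = candidates.get(candidates.size() - 1);
            long total = candidates.stream().mapToLong(Long::longValue).sum();
            if (total > ratio * largest) {
                return candidates;                       // compaction triggered
            }
            candidates.remove(candidates.size() - 1);    // drop largest, retry
        }
        return Collections.emptyList();                  // nothing to compact
    }
}
```

+</div>
+</div>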
+<div class="paragraph">
+<p>The number of background threads tablet servers use to run major compactions is
+configurable. To configure this modify the following property:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>tserver.compaction.major.concurrent.max</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Also, the number of threads tablet servers use for minor compactions is
+configurable. To configure this modify the following property:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>tserver.compaction.minor.concurrent.max</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The numbers of minor and major compactions running and queued are visible on the
+Accumulo monitor page. This allows you to see if compactions are backing up
+and whether adjustments to the above settings are needed. When adjusting the number of
+threads available for compactions, consider the number of cores and other tasks
+running on the nodes, such as map and reduce tasks.</p>
+</div>
+<div class="paragraph">
+<p>If major compactions are not keeping up, then the number of files per tablet
+will grow to a point such that query performance starts to suffer. One way to
+handle this situation is to increase the compaction ratio. For example, if the
+compaction ratio were set to 1, then every new file added to a tablet by minor
+compaction would immediately queue the tablet for major compaction. So if a
+tablet has a 200M file and minor compaction writes a 1M file, then the major
+compaction will attempt to merge the 200M and 1M file. If the tablet server
+has lots of tablets trying to do this sort of thing, then major compactions
+will back up and the number of files per tablet will start to grow, assuming
+data is being continuously written. Increasing the compaction ratio will
+alleviate backups by lowering the amount of major compaction work that needs to
+be done.</p>
+</div>
+<div class="paragraph">
+<p>Another option to deal with the files per tablet growing too large is to adjust
+the following property:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>table.file.max</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>When a tablet reaches this number of files and needs to flush its in-memory
+data to disk, it will choose to do a merging minor compaction. A merging minor
+compaction will merge the tablet&#8217;s smallest file with the data in memory at
+minor compaction time. Therefore the number of files will not grow beyond this
+limit. Merging minor compactions take longer, however, which slows ingest until
+major compactions have enough time to catch up. When adjusting this property, also
+consider adjusting the compaction ratio. Ideally, merging minor compactions
+never need to occur and major compactions will keep up. It is possible to
+configure the file max and compaction ratio such that only merging minor
+compactions occur and major compactions never occur. This should be avoided
+because doing only merging minor compactions causes O(<em>N</em><sup>2</sup>) work to be done.
+The amount of work done by major compactions is O(<em>N</em>*log<sub><em>R</em></sub>(<em>N</em>)) where
+<em>R</em> is the compaction ratio.</p>
+</div>
+<div class="paragraph">
+<p>Compactions can be initiated manually for a table. To initiate a minor
+compaction, use the flush command in the shell. To initiate a major compaction,
+use the compact command in the shell. The compact command will compact all
+tablets in a table to one file. Even tablets with one file are compacted. This
+is useful for the case where a major compaction filter is configured for a
+table. In 1.4 the ability to compact a range of a table was added. To use this
+feature specify start and stop rows for the compact command. This will only
+compact tablets that overlap the given row range.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_pre_splitting_tables">6.7. Pre-splitting tables</h3>
+<div class="paragraph">
+<p>Accumulo will balance and distribute tables across servers. Before a
+table gets large, it will be maintained as a single tablet on a single
+server. This limits the speed at which data can be added or queried
+to the speed of a single node. To improve performance when a table
+is new or small, you can add split points and generate new tablets.</p>
+</div>
+<div class="paragraph">
+<p>In the shell:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>root@myinstance&gt; createtable newTable
+root@myinstance&gt; addsplits -t newTable g n t</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This will create a new table with 4 tablets. The table will be split
+on the letters &#8220;g&#8221;, &#8220;n&#8221;, and &#8220;t&#8221;, which will work nicely if the
+row data starts with lower-case alphabetic characters. If your row
+data includes binary or numeric information, or if the
+distribution of the row information is not flat, then you would pick
+different split points. Now ingest and queries can proceed on 4 nodes,
+which can improve performance.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_merging_tablets">6.8. Merging tablets</h3>
+<div class="paragraph">
+<p>Over time, a table can get very large, so large that it has hundreds
+of thousands of split points. Once there are enough tablets to spread
+a table across the entire cluster, additional splits may not improve
+performance, and may create unnecessary bookkeeping. The distribution
+of data may change over time. For example, if row data contains date
+information, and data is continually added and removed to maintain a
+window of current information, tablets for older rows may be empty.</p>
+</div>
+<div class="paragraph">
+<p>Accumulo supports tablet merging, which can be used to reduce
+the number of split points. The following command will merge all rows
+from &#8220;A&#8221; to &#8220;Z&#8221; into a single tablet:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>root@myinstance&gt; merge -t myTable -s A -e Z</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>If the result of a merge produces a tablet that is larger than the
+configured split size, the tablet may be split by the tablet server.
+Be sure to increase your tablet size prior to any merges if the goal
+is to have larger tablets:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>root@myinstance&gt; config -t myTable -s table.split.threshold=2G</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>In order to merge small tablets, you can ask Accumulo to merge
+sections of a table smaller than a given size.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>root@myinstance&gt; merge -t myTable -s 100M</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>By default, small tablets will not be merged into tablets that are
+already larger than the given size. This can leave isolated small
+tablets. To force small tablets to be merged into larger tablets use
+the <code>--force</code> option:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>root@myinstance&gt; merge -t myTable -s 100M --force</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Merging away small tablets works on one section at a time. If your
+table contains many sections of small split points, or you are
+attempting to change the split size of the entire table, it will be
+faster to set the split point and merge the entire table:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>root@myinstance&gt; config -t myTable -s table.split.threshold=256M
+root@myinstance&gt; merge -t myTable</pre>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_delete_range">6.9. Delete Range</h3>
+<div class="paragraph">
+<p>Consider an indexing scheme that uses date information in each row.
+For example &#8220;20110823-15:20:25.013&#8221; might be a row that specifies a
+date and time. In some cases, we might like to delete rows based on
+this date, say to remove all the data older than the current year.
+Accumulo supports a delete range operation which efficiently
+removes data between two rows. For example:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>root@myinstance&gt; deleterange -t myTable -s 2010 -e 2011</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This will delete all rows starting with &#8220;2010&#8221;, and it will stop at
+any row starting with &#8220;2011&#8221;. You can delete all data prior to 2011
+with:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>root@myinstance&gt; deleterange -t myTable -e 2011 --force</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The shell will not allow you to delete an unbounded range (no start)
+unless you provide the <code>--force</code> option.</p>
+</div>
+<div class="paragraph">
+<p>Range deletion is implemented using splits at the given start/end
+positions, and will affect the number of splits in the table.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_cloning_tables">6.10. Cloning Tables</h3>
+<div class="paragraph">
+<p>A new table can be created that points to an existing table&#8217;s data. This is a
+very quick metadata operation; no data is actually copied. The cloned table
+and the source table can change independently after the clone operation. One
+use case for this feature is testing. For example, to test a new filtering
+iterator, clone the table, add the filter to the clone, and force a major
+compaction. To perform a test on less data, clone a table and then use delete
+range to efficiently remove a lot of data from the clone. Another use case is
+generating a snapshot to guard against human error. To create a snapshot,
+clone a table and then disable write permissions on the clone.</p>
+</div>
+<div class="paragraph">
+<p>The clone operation will point to the source table&#8217;s files. This is why the
+flush option is present and is enabled by default in the shell. If the flush
+option is not enabled, then any data the source table currently has in memory
+will not exist in the clone.</p>
+</div>
+<div class="paragraph">
+<p>A cloned table copies the configuration of the source table. However, the
+permissions of the source table are not copied to the clone. After a clone is
+created, only the user that created the clone can read and write to it.</p>
+</div>
+<div class="paragraph">
+<p>In the following example we see that data inserted after the clone operation is
+not visible in the clone.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@a14&gt; createtable people
+
+root@a14 people&gt; insert 890435 name last Doe
+root@a14 people&gt; insert 890435 name first John
+
+root@a14 people&gt; clonetable people test
+
+root@a14 people&gt; insert 890436 name first Jane
+root@a14 people&gt; insert 890436 name last Doe
+
+root@a14 people&gt; scan
+890435 name:first []    John
+890435 name:last []    Doe
+890436 name:first []    Jane
+890436 name:last []    Doe
+
+root@a14 people&gt; table test
+
+root@a14 test&gt; scan
+890435 name:first []    John
+890435 name:last []    Doe
+
+root@a14 test&gt;</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The du command in the shell shows how much space a table is using in HDFS.
+This command can also show how much overlapping space two cloned tables have in
+HDFS. In the example below, du shows table ci is using 428M. Then ci is cloned
+to cic, and du shows that both tables share 428M. After three entries are
+inserted into cic and it is flushed, du shows the two tables still share 428M but
+cic has 226 bytes to itself. Finally, table cic is compacted, and then du shows
+that each table uses 428M.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@a14&gt; du ci
+             428,482,573 [ci]
+
+root@a14&gt; clonetable ci cic
+
+root@a14&gt; du ci cic
+             428,482,573 [ci, cic]
+
+root@a14&gt; table cic
+
+root@a14 cic&gt; insert r1 cf1 cq1 v1
+root@a14 cic&gt; insert r1 cf1 cq2 v2
+root@a14 cic&gt; insert r1 cf1 cq3 v3
+
+root@a14 cic&gt; flush -t cic -w
+27 15:00:13,908 [shell.Shell] INFO : Flush of table cic completed.
+
+root@a14 cic&gt; du ci cic
+             428,482,573 [ci, cic]
+                     226 [cic]
+
+root@a14 cic&gt; compact -t cic -w
+27 15:00:35,871 [shell.Shell] INFO : Compacting table ...
+27 15:03:03,303 [shell.Shell] INFO : Compaction of table cic completed for given range
+
+root@a14 cic&gt; du ci cic
+             428,482,573 [ci]
+             428,482,612 [cic]
+
+root@a14 cic&gt;</pre>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_exporting_tables">6.11. Exporting Tables</h3>
+<div class="paragraph">
+<p>Accumulo supports exporting tables for the purpose of copying tables to another
+cluster. Exporting and importing tables preserves the table&#8217;s configuration,
+splits, and logical time. Tables are exported and then copied via the hadoop
+<code>distcp</code> command. To export a table, it must be offline and stay offline while
+<code>distcp</code> runs. Staying offline prevents files from being deleted during the process.
+An easy way to take a table offline without interrupting access is to clone it
+and take the clone offline.</p>
+</div>
+<div class="sect3">
+<h4 id="_table_import_export_example">6.11.1. Table Import/Export Example</h4>
+<div class="paragraph">
+<p>The following example demonstrates Accumulo&#8217;s mechanism for exporting and
+importing tables.</p>
+</div>
+<div class="paragraph">
+<p>The shell session below illustrates creating a table, inserting data, and
+exporting the table.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>    root@test15&gt; createtable table1
+    root@test15 table1&gt; insert a cf1 cq1 v1
+    root@test15 table1&gt; insert h cf1 cq1 v2
+    root@test15 table1&gt; insert z cf1 cq1 v3
+    root@test15 table1&gt; insert z cf1 cq2 v4
+    root@test15 table1&gt; addsplits -t table1 b r
+    root@test15 table1&gt; scan
+    a cf1:cq1 []    v1
+    h cf1:cq1 []    v2
+    z cf1:cq1 []    v3
+    z cf1:cq2 []    v4
+    root@test15&gt; config -t table1 -s table.split.threshold=100M
+    root@test15 table1&gt; clonetable table1 table1_exp
+    root@test15 table1&gt; offline table1_exp
+    root@test15 table1&gt; exporttable -t table1_exp /tmp/table1_export
+    root@test15 table1&gt; quit</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>After executing the export command, a few files are created in the HDFS directory.
+One of these files is a list of files to distcp, as shown below.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>    $ hadoop fs -ls /tmp/table1_export
+    Found 2 items
+    -rw-r--r--   3 user supergroup        162 2012-07-25 09:56 /tmp/table1_export/distcp.txt
+    -rw-r--r--   3 user supergroup        821 2012-07-25 09:56 /tmp/table1_export/exportMetadata.zip
+    $ hadoop fs -cat /tmp/table1_export/distcp.txt
+    hdfs://n1.example.com:6093/accumulo/tables/3/default_tablet/F0000000.rf
+    hdfs://n1.example.com:6093/tmp/table1_export/exportMetadata.zip</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Before the table can be imported, it must be copied using <code>distcp</code>. After the
+<code>distcp</code> completes, the cloned table may be deleted.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>    $ hadoop distcp -f /tmp/table1_export/distcp.txt /tmp/table1_export_dest</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The Accumulo shell session below shows importing the table and inspecting it.
+The data, splits, config, and logical time information for the table were
+preserved.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>    root@test15&gt; importtable table1_copy /tmp/table1_export_dest
+    root@test15&gt; table table1_copy
+    root@test15 table1_copy&gt; scan
+    a cf1:cq1 []    v1
+    h cf1:cq1 []    v2
+    z cf1:cq1 []    v3
+    z cf1:cq2 []    v4
+    root@test15 table1_copy&gt; getsplits -t table1_copy
+    b
+    r
+    root@test15&gt; config -t table1_copy -f split
+    ---------+--------------------------+-------------------------------------------
+    SCOPE    | NAME                     | VALUE
+    ---------+--------------------------+-------------------------------------------
+    default  | table.split.threshold .. | 1G
+    table    |    @override ........... | 100M
+    ---------+--------------------------+-------------------------------------------
+    root@test15&gt; tables -l
+    accumulo.metadata    =&gt;        !0
+    accumulo.root        =&gt;        +r
+    table1_copy          =&gt;         5
+    trace                =&gt;         1
+    root@test15 table1_copy&gt; scan -t accumulo.metadata -b 5 -c srv:time
+    5;b srv:time []    M1343224500467
+    5;r srv:time []    M1343224500467
+    5&lt; srv:time []    M1343224500467</pre>
+</div>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_iterator_design">7. Iterator Design</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Accumulo SortedKeyValueIterators, commonly referred to as Iterators for short, are server-side programming constructs
+that allow users to implement custom retrieval or computation within Accumulo TabletServers.  The name rightly
+brings forward similarities to the Java Iterator interface; however, Accumulo Iterators are more complex than Java
+Iterators. Notably, in addition to the expected methods to retrieve the current element and advance to the next element
+in the iteration, Accumulo Iterators must also support the ability to "move" (<code>seek</code>) to a specified point in the
+iteration (the Accumulo table). Accumulo Iterators are designed to be concatenated together, similar to applying a
+series of transformations to a list of elements. Accumulo Iterators can duplicate their underlying source to create
+multiple "pointers" over the same underlying data (which is extremely powerful since each stream is sorted) or they can
+merge multiple Iterators into a single view. In this sense, a collection of Iterators operating in tandem is closer to
+a tree structure than a list, but there is always a sense of a flow of Key-Value pairs through some Iterators. Iterators
+are not designed to act as triggers, nor are they designed to operate outside of the purview of a single table.</p>
+</div>
+<div class="paragraph">
+<p>Understanding how TabletServers invoke the methods on a SortedKeyValueIterator can be difficult, as the actual code is
+buried within the implementation of the TabletServer; however, it is generally unnecessary to have a strong
+understanding of this because the interface provides clear definitions of what action each method should take. This
+chapter aims to provide a more detailed description of how Iterators are invoked, some best practices, and some common
+pitfalls.</p>
+</div>
+<div class="sect2">
+<h3 id="_instantiation">7.1. Instantiation</h3>
+<div class="paragraph">
+<p>To invoke an Accumulo Iterator inside of the TabletServer, the Iterator class must be on the classpath of every
+TabletServer. For production environments, it is common to place a JAR file which contains the Iterator in
+<code>$ACCUMULO_HOME/lib</code>.  In development environments, it is convenient to instead place the JAR file in
+<code>$ACCUMULO_HOME/lib/ext</code>, as JAR files in this directory are dynamically reloaded by the TabletServers, alleviating the
+need to restart Accumulo while testing an Iterator. Advanced classloader features enable loading Iterators from other types of
+filesystems and support per-table classpath configurations (as opposed to process-wide classpaths). These features
+are not covered here, but elsewhere in the user manual.</p>
+</div>
+<div class="paragraph">
+<p>Accumulo references the Iterator class by name and uses Java reflection to instantiate the Iterator. This means that
+Iterators must have a public no-args constructor.</p>
+</div>
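+<div class="paragraph">
+<p>A minimal stand-alone illustration of this reflection-based instantiation (the class
+<code>ReflectionDemo</code> is ours, not Accumulo&#8217;s): looking a class up by name and invoking
+its no-args constructor fails unless that constructor is public.</p>
+</div>
+<div class="listingblock">
+<div class="content">

```java
// Instantiating a class by name, as Accumulo does for iterators. This only
// works when the named class has a public no-args constructor.
public class ReflectionDemo {
    public static Object instantiate(String className) throws Exception {
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }
}
```

+</div>
+</div>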
+</div>
+<div class="sect2">
+<h3 id="_interface">7.2. Interface</h3>
+<div class="paragraph">
+<p>A normal implementation of the SortedKeyValueIterator defines functionality for the following methods:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">void init(SortedKeyValueIterator&lt;Key,Value&gt; source, Map&lt;String,String&gt; options, IteratorEnvironment env) throws IOException;
+
+boolean hasTop();
+
+void next() throws IOException;
+
+void seek(Range range, Collection&lt;ByteSequence&gt; columnFamilies, boolean inclusive) throws IOException;
+
+Key getTopKey();
+
+Value getTopValue();
+
+SortedKeyValueIterator&lt;Key,Value&gt; deepCopy(IteratorEnvironment env);</code></pre>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_code_init_code">7.2.1. <code>init</code></h4>
+<div class="paragraph">
+<p>The <code>init</code> method is called by the TabletServer after it constructs an instance of the Iterator.  This method should
+clear/reset any internal state in the Iterator and prepare it to process data.  The first argument, the <code>source</code>, is the
+Iterator "below" this Iterator (where the client is at "top" and the Iterator for files in HDFS are at the "bottom").
+The "source" Iterator provides the Key-Value pairs which this Iterator will operate upon.</p>
+</div>
+<div class="paragraph">
+<p>The second argument, a Map of options, is made up of options provided by the user, options set in the table&#8217;s
+configuration, and/or options set in the containing namespace&#8217;s configuration.
+These options allow for Iterators to dynamically configure themselves on the fly. If no options are used in the current context
+(a Scan or Compaction), the Map will be empty. An example of a configuration item for an Iterator could be a pattern used to filter
+Key-Value pairs in a regular expression Iterator.</p>
+</div>
+<div class="paragraph">
+<p>The third argument, the <code>IteratorEnvironment</code>, is a special object which provides information to this Iterator about the
+context in which it was invoked. Commonly, this information is not necessary to inspect. For example, if an Iterator
+knows that it is running in the context of a full-major compaction (reading all of the data) as opposed to a user scan
+(which may strongly limit the number of columns), the Iterator might make different algorithmic decisions in an attempt to
+optimize itself.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_code_seek_code">7.2.2. <code>seek</code></h4>
+<div class="paragraph">
+<p>The <code>seek</code> method is likely the most confusing method on the Iterator interface. The purpose of this method is to
+advance the stream of Key-Value pairs to a certain point in the iteration (the Accumulo table). It is common that, before
+the implementation of this method returns, some additional processing is performed which may further advance the current
+position past the <code>startKey</code> of the <code>Range</code>. This, however, is dependent on the functionality the iterator provides. For
+example, a filtering iterator would consume a number of Key-Value pairs which do not meet its criteria before <code>seek</code>
+returns. The important condition for <code>seek</code> to meet is that this Iterator should be ready to return the first Key-Value
+pair, or none if no such pair is available, when the method returns. The Key and Value would be returned by <code>getTopKey</code>
+and <code>getTopValue</code>, respectively, and <code>hasTop</code> should return a boolean denoting whether or not there is
+a Key-Value pair to return.</p>
+</div>
+<div class="paragraph">
+<p>The arguments passed to seek are as follows:</p>
+</div>
+<div class="paragraph">
+<p>The TabletServer first provides a <code>Range</code>, an object which defines some collection of Accumulo <code>Key</code>s, which defines the
+Key-Value pairs that this Iterator should return. Each <code>Range</code> has a <code>startKey</code> and <code>endKey</code> with an inclusive flag for
+both. While this Range is often similar to the Range(s) set by the client on a Scanner or BatchScanner, it is not
+guaranteed to be a Range that the client set. Accumulo will split up larger ranges and group them together based on
+Tablet boundaries per TabletServer. Iterators should not attempt to implement any custom logic based on the Range(s)
+provided to <code>seek</code> and Iterators should not return any Keys that fall outside of the provided Range.</p>
+</div>
+<div class="paragraph">
+<p>The second argument, a <code>Collection&lt;ByteSequence&gt;</code>, is the set of column families which should be retained or
+excluded by this Iterator. The third argument, a boolean, defines whether the collection of column families
+should be treated as an inclusion collection (true) or an exclusion collection (false).</p>
+</div>
+<div class="paragraph">
+<p>It is likely that all implementations of <code>seek</code> will first make a call to the <code>seek</code> method on the
+"source" Iterator that was provided in the <code>init</code> method. The collection of column families and
+the boolean <code>include</code> argument should be passed down as well as the <code>Range</code>. Somewhat commonly, the Iterator will
+also implement some sort of additional logic to find or compute the first Key-Value pair in the provided
+Range. For example, a regular expression Iterator would consume all records which do not match the given
+pattern before returning from <code>seek</code>.</p>
+</div>
+<div class="paragraph">
+<p>It is important to retain the original Range passed to this method to know when this Iterator should stop
+reading more Key-Value pairs. Ignoring this typically does not affect scans from a Scanner, but it
+will result in duplicate keys emitting from a BatchScan if the scanned table has more than one tablet.
+Best practice is to never emit entries outside the seek range.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_code_next_code">7.2.3. <code>next</code></h4>
+<div class="paragraph">
+<p>The <code>next</code> method is analogous to the <code>next</code> method on a Java Iterator: this method should advance
+the Iterator to the next Key-Value pair. For implementations that perform some filtering or complex
+logic, this may result in more than one Key-Value pair being inspected. This method alters
+some internal state that is exposed via the <code>hasTop</code>, <code>getTopKey</code>, and <code>getTopValue</code> methods.</p>
+</div>
+<div class="paragraph">
+<p>This method commonly caches a Key-Value pair which <code>getTopKey</code> and <code>getTopValue</code>
+can later return. As long as there is another Key-Value pair to return, <code>hasTop</code> should return true.
+If there are no more Key-Value pairs to return from this Iterator since the last call to
+<code>seek</code>, <code>hasTop</code> should return false.</p>
+</div>
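The caching behavior described above can be sketched with a small self-contained class (plain integers stand in for Key-Value pairs; this is not the real `SortedKeyValueIterator` interface): `next` may inspect several source entries to produce one result, caching it for the accessors to return:

```java
import java.util.Iterator;
import java.util.List;

// Self-contained sketch of the next()/hasTop()/getTop* caching pattern.
// next() advances past entries a filter rejects and caches the first
// accepted entry; hasTop() reports whether such an entry exists.
public class EvenFilterIterator {

    private final Iterator<Integer> source;
    private Integer top; // cached "top" entry; null means exhausted

    public EvenFilterIterator(List<Integer> data) {
        this.source = data.iterator();
        next(); // position on the first accepted entry, as seek() would
    }

    public void next() {
        top = null;
        // May inspect several source entries to produce one result
        while (source.hasNext()) {
            int candidate = source.next();
            if (candidate % 2 == 0) { // the "filter": keep evens only
                top = candidate;
                return;
            }
        }
    }

    public boolean hasTop() { return top != null; }

    public int getTop() { return top; } // valid only while hasTop() is true

    public static void main(String[] args) {
        EvenFilterIterator it = new EvenFilterIterator(List.of(1, 2, 3, 4, 5));
        while (it.hasTop()) {
            System.out.println(it.getTop()); // prints 2, then 4
            it.next();
        }
    }
}
```

Note that repeated calls to `hasTop` or `getTop` do not change state; only `next` does, matching the contract described in the following sections.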
+</div>
+<div class="sect3">
+<h4 id="_code_hastop_code">7.2.4. <code>hasTop</code></h4>
+<div class="paragraph">
+<p>The <code>hasTop</code> method is similar to the <code>hasNext</code> method on a Java Iterator in that it informs
+the caller if there is a Key-Value pair to be returned. If there is no pair to return, this method
+should return false. Like a Java Iterator, multiple calls to <code>hasTop</code> (without calling <code>next</code>) should not
+alter the internal state of the Iterator.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_code_gettopkey_code_and_code_gettopvalue_code">7.2.5. <code>getTopKey</code> and <code>getTopValue</code></h4>
+<div class="paragraph">
+<p>These methods simply return the current Key-Value pair for this iterator. If <code>hasTop</code> returns true,
+both of these methods should return non-null objects. If <code>hasTop</code> returns false, it is undefined
+what these methods should return. Like <code>hasTop</code>, multiple calls to these methods should not alter
+the state of the Iterator.</p>
+</div>
+<div class="paragraph">
+<p>Users should take caution when either:</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>caching the Key/Value from <code>getTopKey</code>/<code>getTopValue</code>, for use after calling <code>next</code> on the source iterator.
+In this case, the cached Key/Value object is aliased to the reference returned by the source iterator.
+Iterators may reuse the same Key/Value object in a <code>next</code> call for performance reasons, changing the data
+that the cached Key/Value object references and resulting in a logic bug.</p>
+</li>
+<li>
+<p>modifying the Key/Value from <code>getTopKey</code>/<code>getTopValue</code>. If the source iterator reuses data stored in the Key/Value,
+then the source iterator may use the modified data that the Key/Value references. This may/may not result in a logic bug.</p>
+</li>
+</ol>
+</div>
+<div class="paragraph">
+<p>In both cases, copying the Key/Value&#8217;s data into a new object ensures iterator correctness. If neither case applies,
+it is safe to not copy the Key/Value.  The general guideline is to be aware of who else may use Key/Value objects
+returned from <code>getTopKey</code>/<code>getTopValue</code>.</p>
+</div>
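Case 1 can be demonstrated with a self-contained sketch (the `ReusingSource` class is hypothetical, standing in for an iterator that reuses its objects): a cached reference changes out from under the caller when the source overwrites its backing array, while a defensive copy stays stable:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of the aliasing hazard: a source that reuses one backing array
// for every "value" it returns. Caching the returned reference across
// the next call silently changes the data; copying does not.
public class AliasingDemo {

    static class ReusingSource {
        private final byte[] buffer = new byte[1];
        byte[] value(byte b) {
            buffer[0] = b;   // overwrite in place, as an iterator may do
            return buffer;   // same array returned on every call
        }
    }

    public static void main(String[] args) {
        ReusingSource src = new ReusingSource();

        byte[] aliased = src.value((byte) 'a');                  // cached reference
        byte[] copied  = Arrays.copyOf(aliased, aliased.length); // defensive copy

        src.value((byte) 'b');  // the "next" call reuses the buffer

        System.out.println(new String(aliased, StandardCharsets.UTF_8)); // now "b": surprise
        System.out.println(new String(copied, StandardCharsets.UTF_8));  // still "a": safe
    }
}
```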
+</div>
+<div class="sect3">
+<h4 id="_code_deepcopy_code">7.2.6. <code>deepCopy</code></h4>
+<div class="paragraph">
+<p>The <code>deepCopy</code> method is similar to the <code>clone</code> method from the Java <code>Cloneable</code> interface.
+Implementations of this method should return a new object of the same type as the Accumulo Iterator
+instance it was called on. Any internal state from the instance <code>deepCopy</code> was called
+on should be carried over to the returned copy. The returned copy should be ready to have
+<code>seek</code> called on it. The SortedKeyValueIterator interface guarantees that <code>init</code> will be called on
+an iterator before <code>deepCopy</code> and that <code>init</code> will not be called on the iterator returned by
+<code>deepCopy</code>.</p>
+</div>
+<div class="paragraph">
+<p>Typically, implementations of <code>deepCopy</code> call a copy-constructor which will initialize
+internal data structures. As with <code>seek</code>, it is common for the <code>IteratorEnvironment</code>
+argument to be ignored as most Iterator implementations can be written without the explicit
+information the environment provides.</p>
+</div>
+<div class="paragraph">
+<p>In the analogy of a series of Iterators representing a tree, <code>deepCopy</code> works much like the recursive copy routine
+from early programming assignments which implement their own tree data structures: <code>deepCopy</code> calls
+<code>deepCopy</code> on its sources (the children), copies itself, attaches the copies of the children, and
+then returns the copy.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_tabletserver_invocation_of_iterators">7.3. TabletServer invocation of Iterators</h3>
+<div class="paragraph">
+<p>The following code is a general outline for how TabletServers invoke Iterators.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java"> List&lt;KeyValue&gt; batch = new ArrayList&lt;&gt;();
+ Range range = getRangeFromClient();
+ while(!overSizeLimit(batch)){
+   SortedKeyValueIterator source = getSystemIterator();
+
+   for(String clzName : getUserIterators()){
+    Class&lt;?&gt; clz = Class.forName(clzName);
+    SortedKeyValueIterator iter = (SortedKeyValueIterator) clz.newInstance();
+    iter.init(source, opts, env);
+    source = iter;
+   }
+
+   // read a batch of data to return to client
+   // the last iterator, the "top"
+   SortedKeyValueIterator topIter = source;
+   topIter.seek(range, ...)
+
+   while(topIter.hasTop() &amp;&amp; !overSizeLimit(batch)){
+     key = topIter.getTopKey();
+     val = topIter.getTopValue();
+     batch.add(new KeyValue(key, val));
+     if(systemDataSourcesChanged()){
+       // code does not show isolation case, which will
+       // keep using same data sources until a row boundary is hit
+       range = new Range(key, false, range.endKey(), range.endKeyInclusive());
+       break;
+     }
+     topIter.next();
+   }
+ }
+ //return batch of key values to client</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Additionally, the obtuse "re-seek" case can be outlined as the following:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">  // Given the above
+  List&lt;KeyValue&gt; batch = getNextBatch();
+
+  // Store off lastKeyReturned for this client
+  lastKeyReturned = batch.get(batch.size() - 1).getKey();
+
+  // thread goes away (client stops asking for the next batch).
+
+  // Eventually client comes back
+  // Setup as before...
+
+  Range userRange = getRangeFromUser();
+  Range actualRange = new Range(lastKeyReturned, false,
+      userRange.getEndKey(), userRange.isEndKeyInclusive());
+
+  // Use the actualRange, not the user provided one
+  topIter.seek(actualRange);</code></pre>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_isolation">7.4. Isolation</h3>
+<div class="paragraph">
+<p>Accumulo provides a feature which clients can enable to prevent the viewing of partially
+applied mutations within the context of rows. If a client is submitting multiple column
+updates to rows at a time, isolation ensures that the client sees either all of the
+updates made to a row or none of them (until they are all applied).</p>
+</div>
+<div class="paragraph">
+<p>When using Isolation, there are additional concerns in iterator design. A scan-time iterator in Accumulo
+reads from a set of data sources. While an iterator is reading data, it has an isolated view. However, after it returns a
+key/value, it is possible that Accumulo may switch data sources and re-seek the iterator. This is done so that resources
+may be reclaimed. When the user does not request isolation, this can occur after any key is returned. When a user enables
+Isolation, this will only occur after a new row is returned, in which case it will re-seek to the very beginning of the
+next possible row.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_abstract_iterators">7.5. Abstract Iterators</h3>
+<div class="paragraph">
+<p>A number of abstract implementations of Iterators are provided to allow for faster creation
+of common patterns. The most commonly used abstract implementations are the <code>Filter</code> and
+<code>Combiner</code> classes. When possible, these classes should be used instead, as they have been
+thoroughly tested inside Accumulo itself.</p>
+</div>
+<div class="sect3">
+<h4 id="_filter">7.5.1. Filter</h4>
+<div class="paragraph">
+<p>The <code>Filter</code> abstract Iterator provides a very simple implementation which allows implementations
+to define whether or not a Key-Value pair should be returned via an <code>accept(Key, Value)</code> method.</p>
+</div>
+<div class="paragraph">
+<p>Filters are extremely simple to implement; however, when the implementation is filtering a
+large percentage of Key-Value pairs with respect to the total number of pairs examined,
+it can be very inefficient. For example, if a Filter implementation can determine after examining
+part of the row that no other pairs in this row will be accepted, there is no mechanism to
+efficiently skip the remaining Key-Value pairs. Concretely, take a row which is comprised of
+1000 Key-Value pairs. After examining the first 10 Key-Value pairs, it is determined
+that no other Key-Value pairs in this row will be accepted. The Filter must still examine each of the
+remaining 990 Key-Value pairs in this row. Another way to express this deficiency is that
+Filters have no means to leverage the <code>seek</code> method to efficiently skip large portions
+of Key-Value pairs.</p>
+</div>
+<div class="paragraph">
+<p>As such, the <code>Filter</code> class functions well for filtering small amounts of data, but is
+inefficient for filtering large amounts of data. The decision to use a <code>Filter</code> strongly
+depends on the use case and distribution of data being filtered.</p>
+</div>
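A minimal `Filter` subclass only needs to implement `accept` (returning true keeps the pair). The class below is a hypothetical sketch: its name and the rule it applies are illustrative, not part of Accumulo:

```java
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;

// Hypothetical example: drop Key-Value pairs whose Value is empty.
// The abstract Filter class handles the rest of the
// SortedKeyValueIterator contract (seek, next, hasTop, deepCopy).
public class NonEmptyValueFilter extends Filter {

  @Override
  public boolean accept(Key k, Value v) {
    return v.getSize() > 0; // true: keep the pair; false: skip it
  }
}
```

Because `accept` is consulted for every pair the source produces, the inefficiency described above applies: rejected pairs are still read and deserialized before being discarded.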
+</div>
+<div class="sect3">
+<h4 id="_combiner">7.5.2. Combiner</h4>
+<div class="paragraph">
+<p>The <code>Combiner</code> class is another common abstract Iterator. Similar to the <code>Combiner</code> interface
+defined in Hadoop&#8217;s MapReduce framework, implementations of this abstract class reduce
+multiple Values for different versions of a Key (Keys which only differ by timestamps) into one Key-Value pair.
+Combiners provide a simple way to implement common operations like summation and
+aggregation without the need to implement the entire Accumulo Iterator interface.</p>
+</div>
+<div class="paragraph">
+<p>One important consideration when choosing to design a Combiner is that the "reduction" operation
+is often best represented when it is associative and commutative. Operations which do not meet
+these criteria can be implemented; however, the implementation can be difficult.</p>
+</div>
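The reason for this preference can be shown with plain arithmetic (a self-contained sketch, not the Accumulo `Combiner` API): combining a partial result with the remaining values must give the same answer as combining everything at once, which holds for an associative and commutative operation like sum:

```java
import java.util.List;

// Why Combiners want associative+commutative operations: a partial
// combine over a subset, later combined with the rest, must equal a
// single combine over everything.
public class CombineOrderDemo {

    static long sum(List<Long> values) {
        long result = 0;
        for (long v : values) result += v;
        return result;
    }

    public static void main(String[] args) {
        List<Long> all = List.of(1L, 2L, 3L, 4L, 5L);

        // One pass over everything (e.g. a full major compaction)
        long full = sum(all);

        // Two partial passes (e.g. only some versions seen per invocation):
        // combine a subset, then combine that partial result with the rest
        long partial = sum(List.of(sum(all.subList(0, 2)), sum(List.of(3L, 4L, 5L))));

        System.out.println(full == partial); // grouping does not change the result
    }
}
```

An operation like subtraction fails this property: `(1 - 2) - 3` differs from `1 - (2 - 3)`, so a subtraction-based Combiner would produce different answers depending on which pairs each invocation happened to see.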
+<div class="paragraph">
+<p>A second consideration is that a Combiner is not guaranteed to see all of the Key-Value pairs
+which differ only by timestamp every time it is invoked. For example, if there are 5 Key-Value
+pairs in a table which only differ by the timestamps 1, 2, 3, 4, and 5, it is not guaranteed that
+every invocation of the Combiner will see 5 timestamps. One invocation might see the Values for
+Keys with timestamp 1 and 4, while another invocation might see the Values for Keys with the
+timestamps 1, 2, 4 and 5.</p>
+</div>
+<div class="paragraph">
+<p>Finally, when configuring an Accumulo table to use a Combiner, be sure to disable the Versioning Iterator or set the
+Combiner at a priority less than the Versioning Iterator (the Versioning Iterator is added at a priority of 20 by default). The
+Versioning Iterator will filter out multiple Key-Value pairs that differ only by timestamp and return only the Key-Value
+pair that has the largest timestamp.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_best_practices">7.6. Best practices</h3>
+<div class="paragraph">
+<p>Because of the flexibility that the <code>SortedKeyValueIterator</code> interface provides, it doesn&#8217;t directly disallow
+many implementations which are poor design decisions. The following are some common recommendations to
+follow and pitfalls to avoid in Iterator implementations.</p>
+</div>
+<div class="sect3">
+<h4 id="_avoid_special_logic_encoded_in_ranges">7.6.1. Avoid special logic encoded in Ranges</h4>
+<div class="paragraph">
+<p>Commonly, granular Ranges that a client passes to an Iterator from a <code>Scanner</code> or <code>BatchScanner</code> are unmodified.
+If a <code>Range</code> falls within the boundaries of a Tablet, an Iterator will often see that same Range in the
+<code>seek</code> method. However, there is no guarantee that the <code>Range</code> will remain unaltered from client to server. As such, Iterators
+should <strong>never</strong> make assumptions about the current state/context based on the <code>Range</code>.</p>
+</div>
+<div class="paragraph">
+<p>The common failure condition is referred to as a "re-seek". In the context of a Scan, TabletServers construct the
+"stack" of Iterators and batch up Key-Value pairs to send back to the client. When a sufficient number of Key-Value
+pairs are collected, it is common for the Iterators to be "torn down" until the client asks for the next batch of
+Key-Value pairs. This is done by the TabletServer to add fairness in ensuring one Scan does not monopolize the available
+resources. When the client asks for the next batch, the implementation modifies the original Range so that servers know
+the point to resume the iteration (to avoid returning duplicate Key-Value pairs). Specifically, the new Range is created
+from the original but is shortened by setting the startKey of the original Range to the Key last returned by the Scan,
+non-inclusive.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_code_seek_code_ing_backwards">7.6.2. <code>seek</code>'ing backwards</h4>
+<div class="paragraph">
+<p>The ability for an Iterator to "skip over" large blocks of Key-Value pairs is a major tenet behind Iterators.
+<code>seek</code>'ing past a collection of Key-Value pairs which is known to be ignorable can
+greatly increase the speed of a scan, as many Key-Value pairs do not have to be deserialized and processed.</p>
+</div>
+<div class="paragraph">
+<p>While the <code>seek</code> method provides the <code>Range</code> that should be used to <code>seek</code> the underlying source Iterator,
+there is no guarantee that the implementing Iterator uses that <code>Range</code> to perform the <code>seek</code> on its
+"source" Iterator. As such, it is possible to seek to any <code>Range</code> and the interface has no assertions
+to prevent this from happening.</p>
+</div>
+<div class="paragraph">
+<p>Since Iterators are allowed to <code>seek</code> to arbitrary Keys, it is also possible for Iterators to create infinite loops
+inside Scans that will repeatedly read the same data without end. If an arbitrary <code>Range</code> is needed for a <code>seek</code>, a
+completely new <code>Range</code> object should be constructed, as reusing or mutating an existing <code>Range</code> allows for bugs to be introduced which will break Accumulo.</p>
+</div>
+<div class="paragraph">
+<p>Thus, calls to <code>seek</code> should always be thought of as making "forward progress" in the view of the total iteration. The
+<code>startKey</code> of a <code>Range</code> should always be greater than the current Key seen by the Iterator while the <code>endKey</code> of the
+<code>Range</code> should always retain the original <code>endKey</code> (and <code>endKey</code> inclusivity) of the last <code>Range</code> seen by your
+Iterator&#8217;s implementation of seek.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_take_caution_in_constructing_new_data_in_an_iterator">7.6.3. Take caution in constructing new data in an Iterator</h4>
+<div class="paragraph">
+<p>Implementations of Iterator might be tempted to open BatchWriters inside of an Iterator as a means
+to implement triggers for writing additional data outside of their client application. The lifecycle of an Iterator
+is <strong>not</strong> managed in such a way that guarantees that this is safe nor efficient. Specifically, there
+is no way to guarantee that the internal ThreadPool inside of the BatchWriter is closed (and the thread(s)
+are reaped) without calling the close() method. <code>close</code>'ing and recreating a <code>BatchWriter</code> after every
+Key-Value pair is also too performance-limiting to be considered an option.</p>
+</div>
+<div class="paragraph">
+<p>The only safe way to generate additional data in an Iterator is to alter the current Key-Value pair.
+For example, the <code>WholeRowIterator</code> serializes all of the Key-Value pairs that fall within each
+row. A safe way to generate more data in an Iterator would be to construct an Iterator that is
+"higher" (at a larger priority) than the <code>WholeRowIterator</code>, that is, the Iterator receives the Key-Value pairs which are
+a serialization of many Key-Value pairs. The custom Iterator could deserialize the pairs, compute
+some function, and add a new Key-Value pair to the original collection, re-serializing the collection
+of Key-Value pairs back into a single Key-Value pair.</p>
+</div>
+<div class="paragraph">
+<p>Any other situation is likely not guaranteed to ensure that the caller (a Scan or a Compaction) will
+always see all intended data that is generated.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_final_things_to_remember">7.7. Final things to remember</h3>
+<div class="paragraph">
+<p>Some simple recommendations/points to keep in mind:</p>
+</div>
+<div class="sect3">
+<h4 id="_method_call_order">7.7.1. Method call order</h4>
+<div class="paragraph">
+<p>On an instance of an Iterator: <code>init</code> is always called before <code>seek</code>, <code>seek</code> is always called before <code>hasTop</code>,
+<code>getTopKey</code> and <code>getTopValue</code> will not be called if <code>hasTop</code> returns false.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_teardown">7.7.2. Teardown</h4>
+<div class="paragraph">
+<p>As mentioned, instances of Iterators may be torn down inside of the server transparently. When a complex
+collection of iterators is performing some advanced functionality, they will not be torn down until a Key-Value
+pair is returned out of the "stack" of Iterators (and added into the batch of Key-Values to be returned
+to the caller). Being torn down is equivalent to a new instance of the Iterator being created and <code>deepCopy</code>
+being called on the new instance with the old instance provided as the argument to <code>deepCopy</code>. References
+to the old instance are removed and the object is lazily garbage collected by the JVM.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_compaction_time_iterators">7.8. Compaction-time Iterators</h3>
+<div class="paragraph">
+<p>When Iterators are configured to run during compactions, at the <code>minc</code> or <code>majc</code> scope, these Iterators sometimes need
+to make different assertions than those which only operate at scan time. Iterators won&#8217;t see the delete entries; however,
+Iterators will not necessarily see all of the Key-Value pairs in every invocation. Because compactions often do not rewrite
+all files (only a subset of them), the iterator&#8217;s logic must take this into consideration.</p>
+</div>
+<div class="paragraph">
+<p>For example, a Combiner that runs over data during compactions might not see all of the values for a given Key. The
+Combiner must recognize this and not perform any function that would be incorrect due
+to the missing values.</p>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_iterator_testing">8. Iterator Testing</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Iterators, while extremely powerful, are notoriously difficult to test. While the API defines
+the methods an Iterator must implement and each method&#8217;s functionality, the actual invocation
+of these methods by Accumulo TabletServers can be surprisingly difficult to mimic in unit tests.</p>
+</div>
+<div class="paragraph">
+<p>The Apache Accumulo "Iterator Test Harness" is designed to provide a generalized testing framework
+for all Accumulo Iterators to leverage to identify common pitfalls in user-created Iterators.</p>
+</div>
+<div class="sect2">
+<h3 id="_framework_use">8.1. Framework Use</h3>
+<div class="paragraph">
+<p>The harness provides an abstract class for use with JUnit4. Users must define the following for this
+abstract class:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>A <code>SortedMap</code> of input data (<code>Key</code>-<code>Value</code> pairs)</p>
+</li>
+<li>
+<p>A <code>Range</code> to use in tests</p>
+</li>
+<li>
+<p>A <code>Map</code> of options (<code>String</code> to <code>String</code> pairs)</p>
+</li>
+<li>
+<p>A <code>SortedMap</code> of output data (<code>Key</code>-<code>Value</code> pairs)</p>
+</li>
+<li>
+<p>A list of <code>IteratorTestCase</code>s (these can be automatically discovered)</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The majority of effort a user must make is in creating the input dataset and the expected
+output dataset for the iterator being tested.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_normal_test_outline">8.2. Normal Test Outline</h3>
+<div class="paragraph">
+<p>Most iterator tests will follow the given outline:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">import java.util.List;
+import java.util.Map;
+import java.util.SortedMap;
+
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Range;
+import org.apache.accumulo.core.data.Value;
+import org.apache.accumulo.iteratortest.IteratorTestCaseFinder;
+import org.apache.accumulo.iteratortest.IteratorTestInput;
+import org.apache.accumulo.iteratortest.IteratorTestOutput;
+import org.apache.accumulo.iteratortest.junit4.BaseJUnit4IteratorTest;
+import org.apache.accumulo.iteratortest.testcases.IteratorTestCase;
+import org.junit.runners.Parameterized.Parameters;
+
+public class MyIteratorTest extends BaseJUnit4IteratorTest {
+
+  @Parameters
+  public static Object[][] parameters() {
+    final IteratorTestInput input = createIteratorInput();
+    final IteratorTestOutput output = createIteratorOutput();
+    final List&lt;IteratorTestCase&gt; testCases = IteratorTestCaseFinder.findAllTestCases();
+    return BaseJUnit4IteratorTest.createParameters(input, output, testCases);
+  }
+
+  private static SortedMap&lt;Key,Value&gt; INPUT_DATA = createInputData();
+  private static SortedMap&lt;Key,Value&gt; OUTPUT_DATA = createOutputData();
+
+  private static SortedMap&lt;Key,Value&gt; createInputData() {
+    // TODO -- implement this method
+  }
+
+  private static SortedMap&lt;Key,Value&gt; createOutputData() {
+    // TODO -- implement this method
+  }
+
+  private static IteratorTestInput createIteratorInput() {
+    final Map&lt;String,String&gt; options = createIteratorOptions();
+    final Range range = createRange();
+    return new IteratorTestInput(MyIterator.class, options, range, INPUT_DATA);
+  }
+
+  private static Map&lt;String,String&gt; createIteratorOptions() {
+    // TODO -- implement this method
+    // Tip: Use INPUT_DATA if helpful in generating output
+  }
+
+  private static Range createRange() {
+    // TODO -- implement this method
+  }
+
+  private static IteratorTestOutput createIteratorOutput() {
+    return new IteratorTestOutput(OUTPUT_DATA);
+  }
+
+}</code></pre>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_limitations">8.3. Limitations</h3>
+<div class="paragraph">
+<p>While the provided <code>IteratorTestCase</code>s should exercise common edge cases in user iterators,
+there are still many limitations to the existing test harness. Some of them are:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Can only specify a single iterator, not many (a "stack")</p>
+</li>
+<li>
+<p>No control over provided IteratorEnvironment for tests</p>
+</li>
+<li>
+<p>No ability to exercise delete keys (especially with major compactions that do not include all files)</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>These are left as future improvements to the harness.</p>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_table_design">9. Table Design</h2>
+<div class="sectionbody">
+<div class="sect2">
+<h3 id="_basic_table">9.1. Basic Table</h3>
+<div class="paragraph">
+<p>Since Accumulo tables are sorted by row ID, each table can be thought of as being
+indexed by the row ID. Lookups performed by row ID can be executed quickly, by doing
+a binary search, first across the tablets, and then within a tablet. Clients should
+choose a row ID carefully in order to support their desired application. A simple rule
+is to select a unique identifier as the row ID for each entity to be stored and assign
+all the other attributes to be tracked to be columns under this row ID. For example,
+if we have the following data in a comma-separated file:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>userid,age,address,account-balance</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>We might choose to store this data using the userid as the rowID, the column
+name in the column family, and a blank column qualifier:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">Mutation m = new Mutation(userid);
+final String column_qualifier = "";
+m.put("age", column_qualifier, age);
+m.put("address", column_qualifier, address);
+m.put("balance", column_qualifier, account_balance);
+
+writer.add(m);</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>We could then retrieve any of the columns for a specific userid by specifying the
+userid as the range of a scanner and fetching specific columns:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">Range r = new Range(userid, userid); // single row
+Scanner s = conn.createScanner("userdata", auths);
+s.setRange(r);
+s.fetchColumnFamily(new Text("age"));
+
+for(Entry&lt;Key,Value&gt; entry : s) {
+  System.out.println(entry.getValue().toString());
+}</code></pre>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_rowid_design">9.2. RowID Design</h3>
+<div class="paragraph">
+<p>Often it is necessary to transform the rowID in order to have rows ordered in a way
+that is optimal for anticipated access patterns. A good example of this is reversing
+the order of components of internet domain names in order to group rows of the
+same parent domain together:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>com.google.code
+com.google.labs
+com.google.mail
+com.yahoo.mail
+com.yahoo.research</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Some data may result in the creation of very large rows, i.e. rows with many columns.
+In this case the table designer may wish to split up these rows for better load
+balancing while keeping them sorted together for scanning purposes. This can be
+done by appending a short random suffix to the row:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>com.google.code_00
+com.google.code_01
+com.google.code_02
+com.google.labs_00
+com.google.mail_00
+com.google.mail_01</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>It could also be done by appending a string representation of some time period,
+such as the date rounded to the week or month:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>com.google.code_201003
+com.google.code_201004
+com.google.code_201005
+com.google.labs_201003
+com.google.mail_201003
+com.google.mail_201004</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Appending dates provides the additional capability of restricting a scan to a given
+date range.</p>
+</div>
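+<div class="paragraph">
+<p>A sketch of building such rowIDs in Java (the domain-reversal helper and month suffix are illustrative, not part of the Accumulo API):</p>
+</div>

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RowIdDesign {
  // Reverse the components of a domain name: "code.google.com" -> "com.google.code"
  static String reverseDomain(String domain) {
    List<String> parts = Arrays.asList(domain.split("\\."));
    Collections.reverse(parts);
    return String.join(".", parts);
  }

  // Append a month suffix so large rows are split up while staying grouped by domain
  static String rowId(String domain, String yearMonth) {
    return reverseDomain(domain) + "_" + yearMonth;
  }
}
```

+<div class="paragraph">
+<p>A scanner over the range from <code>com.google.code_201003</code> to <code>com.google.code_201005</code> would then cover exactly that date window for a single domain.</p>
+</div>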
+</div>
+<div class="sect2">
+<h3 id="_lexicoders">9.3. Lexicoders</h3>
+<div class="paragraph">
+<p>Since Keys in Accumulo are sorted lexicographically by default, it&#8217;s often useful to encode
+common data types into a byte format whose sort order matches the natural order of the
+original type. An example of this is encoding dates and numeric values so that range
+scans and seeks over them behave as expected.</p>
+</div>
+<div class="paragraph">
+<p>The lexicoders are a standard and extensible way of encoding Java types. Here&#8217;s an example
+of a lexicoder that encodes a java Date object so that it sorts lexicographically:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">// create new date lexicoder
+DateLexicoder dateEncoder = new DateLexicoder();
+
+// truncate time to hours
+long epoch = System.currentTimeMillis();
+Date hour = new Date(epoch - (epoch % 3600000));
+
+// encode the rowId so that it is sorted lexicographically
+Mutation mutation = new Mutation(dateEncoder.encode(hour));
+mutation.put(new Text("colf"), new Text("colq"), new Value(new byte[]{}));</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>If we want to return the most recent date first, we can reverse the sort order
+with the reverse lexicoder:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">// create new date lexicoder and reverse lexicoder
+DateLexicoder dateEncoder = new DateLexicoder();
+ReverseLexicoder&lt;Date&gt; reverseEncoder = new ReverseLexicoder&lt;&gt;(dateEncoder);
+
+// truncate date to hours
+long epoch = System.currentTimeMillis();
+Date hour = new Date(epoch - (epoch % 3600000));
+
+// encode the rowId so that it sorts in reverse lexicographic order
+Mutation mutation = new Mutation(reverseEncoder.encode(hour));
+mutation.put(new Text("colf"), new Text("colq"), new Value(new byte[]{}));</code></pre>
+</div>
+</div>
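+<div class="paragraph">
+<p>What makes this work is that the encoded bytes compare, unsigned and lexicographically, in the same order as the original values. A minimal illustration of the idea for non-negative longs (this sketch is not the Lexicoder implementation itself):</p>
+</div>

```java
public class OrderPreserving {
  // Fixed-width big-endian encoding: for non-negative longs, unsigned
  // lexicographic byte order matches numeric order
  static byte[] encode(long v) {
    byte[] b = new byte[8];
    for (int i = 7; i >= 0; i--) {
      b[i] = (byte) v;
      v >>>= 8;
    }
    return b;
  }

  // Compare byte arrays the way Accumulo compares keys: unsigned, left to right
  static int compareUnsigned(byte[] a, byte[] b) {
    for (int i = 0; i < a.length; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return 0;
  }
}
```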
+</div>
+<div class="sect2">
+<h3 id="_indexing">9.4. Indexing</h3>
+<div class="paragraph">
+<p>In order to support lookups via more than one attribute of an entity, additional
+indexes can be built. However, because Accumulo tables can support any number of
+columns without specifying them beforehand, a single additional index will often
+suffice for supporting lookups of records in the main table. In such an index, the
+rowID is the Value or Term from the main table, the column family is the field name,
+and the column qualifier contains the rowID from the main table.</p>
+</div>
+<table class="tableblock frame-all grid-rows" style="width: 75%;">
+<colgroup>
+<col style="width: 25%;">
+<col style="width: 25%;">
+<col style="width: 25%;">
+<col style="width: 25%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-center valign-top">RowID</th>
+<th class="tableblock halign-center valign-top">Column Family</th>
+<th class="tableblock halign-center valign-top">Column Qualifier</th>
+<th class="tableblock halign-center valign-top">Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Term</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Field Name</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">MainRowID</p></td>
+<td class="tableblock halign-center valign-top"></td>
+</tr>
+</tbody>
+</table>
+<div class="paragraph">
+<p>Note: We store rowIDs in the column qualifier rather than the Value so that we can
+have more than one rowID associated with a particular term within the index. If we
+stored this in the Value we would only see one of the rows in which the value
+appears since Accumulo is configured by default to return the one most recent
+value associated with a key.</p>
+</div>
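+<div class="paragraph">
+<p>Writing the index entry alongside the main record then looks like the following sketch (the writer, field, and row values here are illustrative):</p>
+</div>

```java
// Hypothetical values: "bob" is the indexed term, "name" the field,
// and "E001" the rowID of the record in the main table
Mutation indexMutation = new Mutation("bob");

// term -> row, field -> column family, main rowID -> column qualifier
indexMutation.put("name", "E001", "");

indexWriter.addMutation(indexMutation);
```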
+<div class="paragraph">
+<p>Lookups can then be done by scanning the index table first for occurrences of the
+desired values in the columns specified, which returns a list of rowIDs from the main
+table. These can then be used to retrieve each matching record, in its entirety or as a
+subset of its columns, from the main table.</p>
+</div>
+<div class="paragraph">
+<p>To support efficient lookups of multiple rowIDs from the same table, the Accumulo
+client library provides a BatchScanner. Users specify a set of Ranges to the
+BatchScanner, which performs the lookups in multiple threads to multiple servers
+and returns an Iterator over all the rows retrieved. The rows returned are NOT in
+sorted order, as is the case with the basic Scanner interface.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">// first we scan the index for IDs of rows matching our query
+Text term = new Text("mySearchTerm");
+
+HashSet&lt;Range&gt; matchingRows = new HashSet&lt;Range&gt;();
+
+Scanner indexScanner = conn.createScanner("index", auths);
+indexScanner.setRange(new Range(term, term));
+
+// we retrieve the matching rowIDs and create a set of ranges
+for(Entry&lt;Key,Value&gt; entry : indexScanner) {
+    matchingRows.add(new Range(entry.getKey().getColumnQualifier()));
+}
+
+// now we pass the set of rowIDs to the batch scanner to retrieve them
+BatchScanner bscan = conn.createBatchScanner("table", auths, 10);
+bscan.setRanges(matchingRows);
+bscan.fetchColumnFamily(new Text("attributes"));
+
+for(Entry&lt;Key,Value&gt; entry : bscan) {
+    System.out.println(entry.getValue());
+}</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>One advantage of the dynamic schema capabilities of Accumulo is that different
+fields may be indexed into the same physical table. However, it may be necessary to
+create different index tables if the terms must be formatted differently in order to
+maintain proper sort order. For example, real numbers must be formatted
+differently than their usual notation in order to be sorted correctly. In these cases,
+usually one index per unique data type will suffice.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_entity_attribute_and_graph_tables">9.5. Entity-Attribute and Graph Tables</h3>
+<div class="paragraph">
+<p>Accumulo is ideal for storing entities and their attributes, especially if the
+attributes are sparse. It is often useful to join several datasets together on common
+entities within the same table. This can allow for the representation of graphs,
+including nodes, their attributes, and connections to other nodes.</p>
+</div>
+<div class="paragraph">
+<p>Rather than storing individual events, Entity-Attribute or Graph tables store
+aggregate information about the entities involved in the events and the
+relationships between entities. This is often preferable when single events aren&#8217;t
+very useful and when a continuously updated summarization is desired.</p>
+</div>
+<div class="paragraph">
+<p>The physical schema for an entity-attribute or graph table is as follows:</p>
+</div>
+<table class="tableblock frame-all grid-rows" style="width: 75%;">
+<colgroup>
+<col style="width: 25%;">
+<col style="width: 25%;">
+<col style="width: 25%;">
+<col style="width: 25%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-center valign-top">RowID</th>
+<th class="tableblock halign-center valign-top">Column Family</th>
+<th class="tableblock halign-center valign-top">Column Qualifier</th>
+<th class="tableblock halign-center valign-top">Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">EntityID</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Attribute Name</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Attribute Value</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Weight</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">EntityID</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Edge Type</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Related EntityID</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Weight</p></td>
+</tr>
+</tbody>
+</table>
+<div class="paragraph">
+<p>For example, to keep track of employees, managers and products the following
+entity-attribute table could be used. Note that the weights are not always necessary
+and are set to 0 when not used.</p>
+</div>
+<table class="tableblock frame-all grid-rows" style="width: 75%;">
+<colgroup>
+<col style="width: 25%;">
+<col style="width: 25%;">
+<col style="width: 25%;">
+<col style="width: 25%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-center valign-top">RowID</th>
+<th class="tableblock halign-center valign-top">Column Family</th>
+<th class="tableblock halign-center valign-top">Column Qualifier</th>
+<th class="tableblock halign-center valign-top">Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">E001</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">name</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">bob</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">E001</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">department</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">sales</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">E001</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">hire_date</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">20030102</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">E001</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">units_sold</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">P001</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">780</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">E002</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">name</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">george</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">E002</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">department</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">sales</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">E002</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">manager_of</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">E001</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">E002</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">manager_of</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">E003</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">E003</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">name</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">harry</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">E003</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">department</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">accounts_recv</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">E003</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">hire_date</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">20000405</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">E003</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">units_sold</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">P002</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">566</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">E003</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">units_sold</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">P001</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">232</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">P001</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">product_name</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">nike_airs</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">P001</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">product_type</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">shoe</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">P001</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">in_stock</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">germany</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">900</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">P001</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">in_stock</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">brazil</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">200</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">P002</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">product_name</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">basic_jacket</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">P002</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">product_type</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">clothing</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">P002</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">in_stock</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">usa</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">3454</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">P002</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">in_stock</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">germany</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">700</p></td>
+</tr>
+</tbody>
+</table>
+<div class="paragraph">
+<p>To allow efficient updating of edge weights, an aggregating iterator can be
+configured to add the value of all mutations applied with the same key. These types
+of tables can easily be created from raw events by simply extracting the entities,
+attributes, and relationships from individual events and inserting the keys into
+Accumulo each with a count of 1. The aggregating iterator will take care of
+maintaining the edge weights.</p>
+</div>
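+<div class="paragraph">
+<p>One way to configure this is with the SummingCombiner (a sketch; the table name and column family are illustrative):</p>
+</div>

```java
// Attach a summing combiner so values written to the same key are added together
IteratorSetting setting = new IteratorSetting(10, "edgeSum", SummingCombiner.class);
LongCombiner.setEncodingType(setting, LongCombiner.Type.STRING);
Combiner.setColumns(setting,
    Collections.singletonList(new IteratorSetting.Column("manager_of")));
conn.tableOperations().attachIterator("graphtable", setting);

// Each raw event contributes a count of 1; the combiner maintains the edge weight
Mutation m = new Mutation("E002");
m.put("manager_of", "E001", "1");
writer.addMutation(m);
```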
+</div>
+<div class="sect2">
+<h3 id="_document_partitioned_indexing">9.6. Document-Partitioned Indexing</h3>
+<div class="paragraph">
+<p>Using a simple index as described above works well when looking for records that
+match one of a set of given criteria. When looking for records that match more than
+one criterion simultaneously, such as when looking for documents that contain all of
+the words &#8216;the&#8217; and &#8216;white&#8217; and &#8216;house&#8217;, there are several issues.</p>
+</div>
+<div class="paragraph">
+<p>First is that the set of all records matching any one of the search terms must be sent
+to the client, which incurs a lot of network traffic. The second problem is that the
+client is responsible for performing set intersection on the sets of records returned
+to eliminate all but the records matching all search terms. The memory of the client
+may easily be overwhelmed during this operation.</p>
+</div>
+<div class="paragraph">
+<p>For these reasons Accumulo includes support for a scheme known as sharded
+indexing, in which these set operations can be performed at the TabletServers and
+decisions about which records to include in the result set can be made without
+incurring network traffic.</p>
+</div>
+<div class="paragraph">
+<p>This is accomplished via partitioning records into bins that each reside on at most
+one TabletServer, and then creating an index of terms per record within each bin as
+follows:</p>
+</div>
+<table class="tableblock frame-all grid-rows" style="width: 75%;">
+<colgroup>
+<col style="width: 25%;">
+<col style="width: 25%;">
+<col style="width: 25%;">
+<col style="width: 25%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-center valign-top">RowID</th>
+<th class="tableblock halign-center valign-top">Column Family</th>
+<th class="tableblock halign-center valign-top">Column Qualifier</th>
+<th class="tableblock halign-center valign-top">Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">BinID</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Term</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">DocID</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Weight</p></td>
+</tr>
+</tbody>
+</table>
+<div class="paragraph">
+<p>Documents or records are mapped into bins by a user-defined ingest application. By
+storing the BinID as the RowID we ensure that all the information for a particular
+bin is contained in a single tablet and hosted on a single TabletServer since
+Accumulo never splits rows across tablets. Storing the Terms as column families
+serves to enable fast lookups of all the documents within this bin that contain the
+given term.</p>
+</div>
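+<div class="paragraph">
+<p>The bin assignment itself can be any stable mapping from document ID to bin; a simple hashed sketch (the zero-padded width and bin count are illustrative):</p>
+</div>

```java
public class Binner {
  // Stable assignment of a document ID to one of binCount shards;
  // zero-padding keeps bin IDs lexicographically ordered
  static String binFor(String docId, int binCount) {
    return String.format("%04d", Math.floorMod(docId.hashCode(), binCount));
  }
}
```

+<div class="paragraph">
+<p>Each term in a document is then written with <code>binFor(docId, binCount)</code> as the rowID, the term as the column family, and the document ID as the column qualifier.</p>
+</div>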
+<div class="paragraph">
+<p>Finally, we perform set intersection operations on the TabletServer via a special
+iterator called the Intersecting Iterator. Since documents are partitioned into many
+bins, a search of all documents must search every bin. We can use the BatchScanner
+to scan all bins in parallel. The Intersecting Iterator should be enabled on a
+BatchScanner within user query code as follows:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">Text[] terms = {new Text("the"), new Text("white"), new Text("house")};
+
+BatchScanner bscan = conn.createBatchScanner(table, auths, 20);
+
+IteratorSetting iter = new IteratorSetting(20, "ii", IntersectingIterator.class);
+IntersectingIterator.setColumnFamilies(iter, terms);
+
+bscan.addScanIterator(iter);
+bscan.setRanges(Collections.singleton(new Range()));
+
+for(Entry&lt;Key,Value&gt; entry : bscan) {
+    System.out.println(" " + entry.getKey().getColumnQualifier());
+}</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This code effectively has the BatchScanner scan all tablets of a table, looking for
+documents that match all the given terms. Because all tablets are being scanned for
+every query, each query is more expensive than other Accumulo scans, which
+typically involve a small number of TabletServers. This reduces the number of
+concurrent queries supported and is subject to what is known as the &#8216;straggler&#8217;
+problem in which every query runs as slow as the slowest server participating.</p>
+</div>
+<div class="paragraph">
+<p>Of course, fast servers will return their results to the client, which can display them
+to the user immediately while waiting for the rest of the results to arrive. If the
+results are unordered this is quite effective, as the first results to arrive are as good
+as any others to the user.</p>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_high_speed_ingest">10. High-Speed Ingest</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Accumulo is often used as part of a larger data processing and storage system. To
+maximize the performance of a parallel system involving Accumulo, the ingestion
+and query components should be designed to provide enough parallelism and
+concurrency to avoid creating bottlenecks for users and other systems writing to
+and reading from Accumulo. There are several ways to achieve high ingest
+performance.</p>
+</div>
+<div class="sect2">
+<h3 id="_pre_splitting_new_tables">10.1. Pre-Splitting New Tables</h3>
+<div class="paragraph">
+<p>New tables consist of a single tablet by default. As mutations are applied, the table
+grows and splits into multiple tablets which are balanced by the Master across
+TabletServers. This implies that the aggregate ingest rate will be limited to fewer
+servers than are available within the cluster until the table has reached the point
+where there are tablets on every TabletServer.</p>
+</div>
+<div class="paragraph">
+<p>Pre-splitting a table ensures that there are as many tablets as desired available
+before ingest begins to take advantage of all the parallelism possible with the cluster
+hardware. Tables can be split at any time by using the shell:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>user@myinstance mytable&gt; addsplits -sf /local_splitfile -t mytable</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>For the purposes of providing parallelism to ingest it is not necessary to create more
+tablets than there are physical machines within the cluster as the aggregate ingest
+rate is a function of the number of physical machines. Note that the aggregate ingest
+rate is still subject to the number of machines running ingest clients, and the
+distribution of rowIDs across the table. The aggregate ingest rate will be
+suboptimal if there are many inserts into a small number of rowIDs.</p>
+</div>
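+<div class="paragraph">
+<p>Splits can also be added programmatically through TableOperations; a sketch generating evenly spaced two-character split points:</p>
+</div>

```java
// Pre-split the table at aa, ab, ..., zz before starting ingest
SortedSet<Text> splits = new TreeSet<>();
for (char c1 = 'a'; c1 <= 'z'; c1++) {
  for (char c2 = 'a'; c2 <= 'z'; c2++) {
    splits.add(new Text("" + c1 + c2));
  }
}
conn.tableOperations().addSplits("mytable", splits);
```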
+</div>
+<div class="sect2">
+<h3 id="_multiple_ingester_clients">10.2. Multiple Ingester Clients</h3>
+<div class="paragraph">
+<p>Accumulo is capable of scaling to very high rates of ingest, which is dependent upon
+not just the number of TabletServers in operation but also the number of ingest
+clients. This is because a single client, while capable of batching mutations and
+sending them to all TabletServers, is ultimately limited by the amount of data that
+can be processed on a single machine. The aggregate ingest rate will scale linearly
+with the number of clients up to the point at which either the aggregate I/O of
+TabletServers or total network bandwidth capacity is reached.</p>
+</div>
+<div class="paragraph">
+<p>In operational settings where high rates of ingest are paramount, clusters are often
+configured to dedicate some number of machines solely to running Ingester Clients.
+The exact ratio of clients to TabletServers necessary for optimum ingestion rates
+will vary according to the distribution of resources per machine and by data type.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_bulk_ingest">10.3. Bulk Ingest</h3>
+<div class="paragraph">
+<p>Accumulo supports the ability to import files produced by an external process such
+as MapReduce into an existing table. In some cases it may be faster to load data this
+way rather than ingesting it through clients using BatchWriters. This allows a large
+number of machines to format data the way Accumulo expects. The new files can
+then simply be introduced to Accumulo via a shell command.</p>
+</div>
+<div class="paragraph">
+<p>To configure MapReduce to format data in preparation for bulk loading, the job
+should be set to use a range partitioner instead of the default hash partitioner. The
+range partitioner uses the split points of the Accumulo table that will receive the
+data. The split points can be obtained from the shell and used by the MapReduce
+RangePartitioner. Note that this is only useful if the existing table is already split
+into multiple tablets.</p>
+</div>
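+<div class="paragraph">
+<p>In the MapReduce job itself this corresponds to configuring the partitioner with a file of split points (the path and split count are illustrative):</p>
+</div>

```java
// Partition reducer output by the table's split points so each reducer
// writes roughly one tablet's worth of data
job.setPartitionerClass(RangePartitioner.class);
RangePartitioner.setSplitFile(job, "/tmp/mytable-splits.txt");
job.setNumReduceTasks(numSplitPoints + 1);  // one reducer per tablet
```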
+<div class="literalblock">
+<div class="content">
+<pre>user@myinstance mytable&gt; getsplits
+aa
+ab
+ac
+...
+zx
+zy
+zz</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Run the MapReduce job, using the AccumuloFileOutputFormat to create the files to
+be introduced to Accumulo. Once this is complete, the files can be added to
+Accumulo via the shell:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>user@myinstance mytable&gt; importdirectory /files_dir /failures</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Note that the paths referenced are directories within the same HDFS instance over
+which Accumulo is running. Accumulo places any files that failed to be added to the
+second directory specified.</p>
+</div>
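+<div class="paragraph">
+<p>The same import is available through the Java API; a sketch:</p>
+</div>

```java
// Import the generated files; files that fail are moved to the failure directory.
// The final flag controls whether Accumulo assigns the import-time timestamp.
conn.tableOperations().importDirectory("mytable", "/files_dir", "/failures", false);
```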
+<div class="paragraph">
+<p>A complete example of using Bulk Ingest can be found at
+<code>accumulo/docs/examples/README.bulkIngest</code>.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_logical_time_for_bulk_ingest">10.4. Logical Time for Bulk Ingest</h3>
+<div class="paragraph">
+<p>Logical time is important for bulk imported data, for which the client code may
+be choosing a timestamp. At bulk import time, the user can choose to enable
+logical time for the set of files being imported. When it is enabled, Accumulo
+uses a specialized system iterator to lazily set times in a bulk imported file.
+This mechanism guarantees that times set by unsynchronized multi-node
+applications (such as those running on MapReduce) will maintain some semblance
+of causal ordering. This mitigates the problem of the time being wrong on the
+system that created the file for bulk import. These times are not written into the
+file at import; instead, a single timestamp is obtained at import time, and the
+system iterator applies that timestamp whenever the file is read by scans or
+compactions.</p>
+</div>
+<div class="paragraph">
+<p>The timestamp assigned by Accumulo will be the same for every key in the file.
+This could cause problems if the file contains multiple keys that are identical
+except for the timestamp. In this case, the sort order of the keys will be
+undefined. This could occur if an insert and an update were in the same bulk
+import file.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_mapreduce_ingest">10.5. MapReduce Ingest</h3>
+<div class="paragraph">
+<p>It is possible to efficiently write many mutations to Accumulo in parallel via a
+MapReduce job. In this scenario the MapReduce is written to process data that lives
+in HDFS and write mutations to Accumulo using the AccumuloOutputFormat. See
+the MapReduce section under Analytics for details.</p>
+</div>
+<div class="paragraph">
+<p>An example of using MapReduce can be found under
+<code>accumulo/docs/examples/README.mapred</code>.</p>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_analytics">11. Analytics</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Accumulo supports more advanced data processing than simply keeping keys
+sorted and performing efficient lookups. Analytics can be developed by using
+MapReduce and Iterators in conjunction with Accumulo tables.</p>
+</div>
+<div class="sect2">
+<h3 id="_mapreduce">11.1. MapReduce</h3>
+<div class="paragraph">
+<p>Accumulo tables can be used as the source and destination of MapReduce jobs. To
+use an Accumulo table with a MapReduce job (specifically with the new Hadoop API
+as of version 0.20), configure the job parameters to use the AccumuloInputFormat
+and AccumuloOutputFormat. Accumulo specific parameters can be set via these
+two format classes to do the following:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Authenticate and provide user credentials for the input</p>
+</li>
+<li>
+<p>Restrict the scan to a range of rows</p>
+</li>
+<li>
+<p>Restrict the input to a subset of available columns</p>
+</li>
+</ul>
+</div>
+<div class="sect3">
+<h4 id="_mapper_and_reducer_classes">11.1.1. Mapper and Reducer classes</h4>
+<div class="paragraph">
+<p>To read from an Accumulo table create a Mapper with the following class
+parameterization and be sure to configure the AccumuloInputFormat.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">class MyMapper extends Mapper&lt;Key,Value,WritableComparable,Writable&gt; {
+    public void map(Key k, Value v, Context c) {
+        // transform key and value data here
+    }
+}</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To write to an Accumulo table, create a Reducer with the following class
+parameterization and be sure to configure the AccumuloOutputFormat. The key
+emitted from the Reducer identifies the table to which the mutation is sent. This
+allows a single Reducer to write to more than one table if desired. A default table
+can be configured using the AccumuloOutputFormat, in which case the output table
+name does not have to be passed to the Context object within the Reducer.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">class MyReducer extends Reducer&lt;WritableComparable, Writable, Text, Mutation&gt; {
+    public void reduce(WritableComparable key, Iterable&lt;Writable&gt; values, Context c) {
+        Mutation m;
+        // create the mutation based on input key and value
+        c.write(new Text("output-table"), m);
+    }
+}</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The Text object passed as the output should contain the name of the table to which
+this mutation should be applied. The Text can be null, in which case the mutation
+will be applied to the default table name specified in the AccumuloOutputFormat
+options.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_accumuloinputformat_options">11.1.2. AccumuloInputFormat options</h4>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">Job job = new Job(getConf());
+AccumuloInputFormat.setInputInfo(job,
+        "user",
+        "passwd".getBytes(),
+        "table",
+        new Authorizations());
+
+AccumuloInputFormat.setZooKeeperInstance(job, "myinstance",
+        "zooserver-one,zooserver-two");</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>Optional Settings:</strong></p>
+</div>
+<div class="paragraph">
+<p>To restrict Accumulo to a set of row ranges:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">ArrayList&lt;Range&gt; ranges = new ArrayList&lt;Range&gt;();
+// populate array list of row ranges ...
+AccumuloInputFormat.setRanges(job, ranges);</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To restrict Accumulo to a list of columns:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">ArrayList&lt;Pair&lt;Text,Text&gt;&gt; columns = new ArrayList&lt;Pair&lt;Text,Text&gt;&gt;();
+// populate list of columns
+AccumuloInputFormat.fetchColumns(job, columns);</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To use a regular expression to match row IDs:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">IteratorSetting is = new IteratorSetting(30, RegExFilter.class);
+RegExFilter.setRegexs(is, ".*suffix", null, null, null, true);
+AccumuloInputFormat.addIterator(job, is);</code></pre>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_accumulomultitableinputformat_options">11.1.3. AccumuloMultiTableInputFormat options</h4>
+<div class="paragraph">
+<p>The AccumuloMultiTableInputFormat allows scanning over multiple tables
+in a single MapReduce job. Separate ranges, columns, and iterators can be
+used for each table.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">InputTableConfig tableOneConfig = new InputTableConfig();
+InputTableConfig tableTwoConfig = new InputTableConfig();</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To set the configuration objects on the job:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">Map&lt;String, InputTableConfig&gt; configs = new HashMap&lt;String,InputTableConfig&gt;();
+configs.put("table1", tableOneConfig);
+configs.put("table2", tableTwoConfig);
+AccumuloMultiTableInputFormat.setInputTableConfigs(job, configs);</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>Optional settings:</strong></p>
+</div>
+<div class="paragraph">
+<p>To restrict to a set of ranges:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">ArrayList&lt;Range&gt; tableOneRanges = new ArrayList&lt;Range&gt;();
+ArrayList&lt;Range&gt; tableTwoRanges = new ArrayList&lt;Range&gt;();
+// populate array lists of row ranges for tables...
+tableOneConfig.setRanges(tableOneRanges);
+tableTwoConfig.setRanges(tableTwoRanges);</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To restrict Accumulo to a list of columns:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">ArrayList&lt;Pair&lt;Text,Text&gt;&gt; tableOneColumns = new ArrayList&lt;Pair&lt;Text,Text&gt;&gt;();
+ArrayList&lt;Pair&lt;Text,Text&gt;&gt; tableTwoColumns = new ArrayList&lt;Pair&lt;Text,Text&gt;&gt;();
+// populate lists of columns for each of the tables ...
+tableOneConfig.fetchColumns(tableOneColumns);
+tableTwoConfig.fetchColumns(tableTwoColumns);</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To set scan iterators:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">List&lt;IteratorSetting&gt; tableOneIterators = new ArrayList&lt;IteratorSetting&gt;();
+List&lt;IteratorSetting&gt; tableTwoIterators = new ArrayList&lt;IteratorSetting&gt;();
+// populate the lists of iterator settings for each of the tables ...
+tableOneConfig.setIterators(tableOneIterators);
+tableTwoConfig.setIterators(tableTwoIterators);</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The name of the table can be retrieved from the input split:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">class MyMapper extends Mapper&lt;Key,Value,WritableComparable,Writable&gt; {
+    public void map(Key k, Value v, Context c) {
+        RangeInputSplit split = (RangeInputSplit)c.getInputSplit();
+        String tableName = split.getTableName();
+        // do something with table name
+    }
+}</code></pre>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_accumulooutputformat_options">11.1.4. AccumuloOutputFormat options</h4>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">boolean createTables = true;
+String defaultTable = "mytable";
+
+AccumuloOutputFormat.setOutputInfo(job,
+        "user",
+        "passwd".getBytes(),
+        createTables,
+        defaultTable);
+
+AccumuloOutputFormat.setZooKeeperInstance(job, "myinstance",
+        "zooserver-one,zooserver-two");</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>Optional Settings:</strong></p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">AccumuloOutputFormat.setMaxLatency(job, 300000); // milliseconds
+AccumuloOutputFormat.setMaxMutationBufferSize(job, 50000000); // bytes</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>An example of using MapReduce with Accumulo can be found at
+<code>accumulo/docs/examples/README.mapred</code>.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_combiners_2">11.2. Combiners</h3>
+<div class="paragraph">
+<p>Many applications can benefit from the ability to aggregate values across common
+keys. This can be done via Combiner iterators and is similar to the Reduce step in
+MapReduce. This provides the ability to define online, incrementally updated
+analytics without the overhead or latency associated with batch-oriented
+MapReduce jobs.</p>
+</div>
+<div class="paragraph">
+<p>All that is needed to aggregate values of a table is to identify the fields over which
+values will be grouped, insert mutations with those fields as the key, and configure
+the table with a combining iterator that supports the summarizing operation
+desired.</p>
+</div>
+<div class="paragraph">
+<p>The only restriction on a combining iterator is that the combiner developer
+should not assume that all values for a given key have been seen, since new
+mutations can be inserted at any time. This precludes aggregations that depend on
+the total number of values seen, such as calculating an average.</p>
+</div>
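+<div class="paragraph">
+<p>For example, a SummingCombiner can be configured to sum the values stored under a
+common key. The following sketch assumes an existing <code>connector</code> and a table
+named <code>mytable</code> whose counts are stored under the <code>counts</code> column family:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">// sum values in the "counts" column family at scan and compaction time
+IteratorSetting is = new IteratorSetting(10, "sum", SummingCombiner.class);
+LongCombiner.setEncodingType(is, LongCombiner.Type.STRING);
+SummingCombiner.setColumns(is,
+        Collections.singletonList(new IteratorSetting.Column("counts")));
+connector.tableOperations().attachIterator("mytable", is);</code></pre>
+</div>
+</div>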
+<div class="sect3">
+<h4 id="_feature_vectors">11.2.1. Feature Vectors</h4>
+<div class="paragraph">
+<p>An interesting use of combining iterators within an Accumulo table is to store
+feature vectors for use in machine learning algorithms. For example, many
+algorithms such as k-means clustering, support vector machines, anomaly detection,
+etc. use the concept of a feature vector and the calculation of distance metrics to
+learn a particular model. The columns in an Accumulo table can be used to efficiently
+store sparse features and their weights, incrementally updated via a
+combining iterator.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_statistical_modeling">11.3. Statistical Modeling</h3>
+<div class="paragraph">
+<p>Statistical models that need to be updated by many machines in parallel could be
+similarly stored within an Accumulo table. For example, a MapReduce job that is
+iteratively updating a global statistical model could have each map or reduce worker
+reference the parts of the model to be read and updated through an embedded
+Accumulo client.</p>
+</div>
+<div class="paragraph">
+<p>Using Accumulo this way enables efficient and fast lookups and updates of small
+pieces of information in a random access pattern, which is complementary to
+MapReduce&#8217;s sequential access model.</p>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_security">12. Security</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Accumulo extends the BigTable data model to implement a security mechanism
+known as cell-level security. Every key-value pair has its own security label, stored
+under the column visibility element of the key, which is used to determine whether
+a given user meets the security requirements to read the value. This enables data of
+various security levels to be stored within the same row, and users of varying
+degrees of access to query the same table, while preserving data confidentiality.</p>
+</div>
+<div class="sect2">
+<h3 id="_security_label_expressions">12.1. Security Label Expressions</h3>
+<div class="paragraph">
+<p>When mutations are applied, users can specify a security label for each value. This is
+done as the Mutation is created by passing a ColumnVisibility object to the put()
+method:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">Text rowID = new Text("row1");
+Text colFam = new Text("myColFam");
+Text colQual = new Text("myColQual");
+ColumnVisibility colVis = new ColumnVisibility("public");
+long timestamp = System.currentTimeMillis();
+
+Value value = new Value("myValue".getBytes());
+
+Mutation mutation = new Mutation(rowID);
+mutation.put(colFam, colQual, colVis, timestamp, value);</code></pre>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_security_label_expression_syntax">12.2. Security Label Expression Syntax</h3>
+<div class="paragraph">
+<p>Security labels consist of a set of user-defined tokens that are required to read the
+value the label is associated with. The set of tokens required can be specified using
+syntax that supports logical AND <code>&amp;</code> and OR <code>|</code> combinations of terms, as
+well as nesting groups <code>()</code> of terms together.</p>
+</div>
+<div class="paragraph">
+<p>Each term is composed of one or more alphanumeric characters, hyphens, underscores, or
+periods. Optionally, each term may be wrapped in quotation marks,
+which removes the restriction on valid characters. In quoted terms, quotation marks
+and backslash characters can be used as characters in the term by escaping them
+with a backslash.</p>
+</div>
+<div class="paragraph">
+<p>For example, suppose within our organization we want to label our data values with
+security labels defined in terms of user roles. We might have tokens such as:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>admin
+audit
+system</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>These can be specified alone or combined using logical operators:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>// Users must have admin privileges
+admin
+
+// Users must have admin and audit privileges
+admin&amp;audit
+
+// Users with either admin or audit privileges
+admin|audit
+
+// Users must have audit and one or both of admin or system
+(admin|system)&amp;audit</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>When both <code>|</code> and <code>&amp;</code> operators are used, parentheses must be used to specify
+precedence of the operators.</p>
+</div>
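+<div class="paragraph">
+<p>These expressions are parsed by the ColumnVisibility class. As a minimal sketch:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">// parentheses make operator precedence explicit when mixing &amp; and |
+ColumnVisibility colVis = new ColumnVisibility("(admin|system)&amp;audit");
+
+// quote() wraps and escapes a term containing otherwise-invalid characters
+String quoted = ColumnVisibility.quote("on-call ops");</code></pre>
+</div>
+</div>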
+</div>
+<div class="sect2">
+<h3 id="_authorization">12.3. Authorization</h3>
+<div class="paragraph">
+<p>When clients attempt to read data from Accumulo, any security labels present are
+examined against the set of authorizations passed by the client code when the
+Scanner or BatchScanner are created. If the authorizations are determined to be
+insufficient to satisfy the security label, the value is suppressed from the set of
+results sent back to the client.</p>
+</div>
+<div class="paragraph">
+<p>Authorizations are specified as a comma-separated list of tokens the user possesses:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">// user possesses both admin and system level access
+Authorizations auths = new Authorizations("admin","system");
+
+Scanner s = connector.createScanner("table", auths);</code></pre>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_user_authorizations">12.4. User Authorizations</h3>
+<div class="paragraph">
+<p>Each Accumulo user has a set of associated security labels. To manipulate
+these in the shell while using the default authorizor, use the setauths and getauths commands.
+These may also be modified for the default authorizor using the Java SecurityOperations API.</p>
+</div>
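+<div class="paragraph">
+<p>For example, assuming a user named <code>user1</code>:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@myinstance&gt; setauths -u user1 -s admin,audit
+root@myinstance&gt; getauths -u user1</pre>
+</div>
+</div>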
+<div class="paragraph">
+<p>When a user creates a scanner, a set of Authorizations is passed. If the
+authorizations passed to the scanner are not a subset of the user&#8217;s
+authorizations, then an exception will be thrown.</p>
+</div>
+<div class="paragraph">
+<p>To prevent users from writing data they cannot read, add the visibility
+constraint to a table. Use the -evc option in the createtable shell command to
+enable this constraint. For existing tables use the following shell command to
+enable the visibility constraint. Ensure the constraint number does not
+conflict with any existing constraints.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>config -t table -s table.constraint.1=org.apache.accumulo.core.security.VisibilityConstraint</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Any user with the alter table permission can add or remove this constraint.
+This constraint is not applied to bulk imported data; if this is a concern, then
+disable the bulk import permission.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_pluggable_security">12.5. Pluggable Security</h3>
+<div class="paragraph">
+<p>Accumulo 1.5 introduced a pluggable security mechanism. It can be broken into three actions&#8201;&#8212;&#8201;authentication, authorization, and permission handling. By default all of these are handled in
+ZooKeeper, which is how things were handled in Accumulo 1.4 and before. It is worth noting that,
+as a new feature in 1.5, it may be adjusted in future releases without the standard
+deprecation cycle.</p>
+</div>
+<div class="paragraph">
+<p>Authentication handles verifying a user&#8217;s identity. A combination of
+principal and authentication token is used to verify a user is who they say they are. An
+authentication token can be constructed directly through its constructor, but it is
+advised to use the <code>init(Property)</code> method to populate it. It is expected that a
+user knows the appropriate token to use for their system. The default token is
+<code>PasswordToken</code>.</p>
+</div>
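+<div class="paragraph">
+<p>For example, connecting with the default <code>PasswordToken</code> (instance name,
+ZooKeeper hosts, and credentials below are illustrative):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">Instance instance = new ZooKeeperInstance("myinstance", "zooserver-one,zooserver-two");
+AuthenticationToken token = new PasswordToken("secret");
+Connector connector = instance.getConnector("user", token);</code></pre>
+</div>
+</div>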
+<div class="paragraph">
+<p>Once a user is authenticated by the Authenticator, the user has access to the other actions within
+Accumulo. All actions in Accumulo are ACLed, and this ACL check is handled by the Permission
+Handler. This is what manages all of the permissions, which are divided in system and per table
+level. From there, if a user is doing an action which requires authorizations, the Authorizor is
+queried to determine what authorizations the user has.</p>
+</div>
+<div class="paragraph">
+<p>This setup allows a variety of different mechanisms to be used for handling different aspects of
+Accumulo&#8217;s security. A system like Kerberos can be used for authentication, then a system like LDAP
+could be used to determine if a user has a specific permission, and then it may default back to the
+default ZookeeperAuthorizor to determine what Authorizations a user is ultimately allowed to use.
+This is a pluggable system so custom components can be created depending on your needs.</p>
+</div>
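+<div class="paragraph">
+<p>Custom implementations are configured in <code>accumulo-site.xml</code>; the class
+names below are hypothetical placeholders:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>&lt;property&gt;
+    &lt;name&gt;instance.security.authenticator&lt;/name&gt;
+    &lt;value&gt;com.example.MyAuthenticator&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+    &lt;name&gt;instance.security.authorizor&lt;/name&gt;
+    &lt;value&gt;com.example.MyAuthorizor&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+    &lt;name&gt;instance.security.permissionHandler&lt;/name&gt;
+    &lt;value&gt;com.example.MyPermissionHandler&lt;/value&gt;
+&lt;/property&gt;</pre>
+</div>
+</div>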
+</div>
+<div class="sect2">
+<h3 id="_secure_authorizations_handling">12.6. Secure Authorizations Handling</h3>
+<div class="paragraph">
+<p>For applications serving many users, it is not expected that an Accumulo user
+will be created for each application user. In this case an Accumulo user with
+all authorizations needed by any of the applications users must be created. To
+service queries, the application should create a scanner with the application
+user&#8217;s authorizations. These authorizations could be obtained from a trusted 3rd
+party.</p>
+</div>
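+<div class="paragraph">
+<p>A sketch of this pattern, assuming an existing <code>connector</code> whose Accumulo
+user holds a superset of all application users&#8217; authorizations:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">// authorizations for this application user, obtained from a trusted
+// third party (the lookup mechanism is application-specific)
+Authorizations userAuths = new Authorizations("admin", "audit");
+
+// the scanner is created with the application user's authorizations,
+// which must be a subset of the Accumulo user's authorizations
+Scanner scanner = connector.createScanner("table", userAuths);</code></pre>
+</div>
+</div>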
+<div class="paragraph">
+<p>Often production systems will integrate with Public-Key Infrastructure (PKI) and
+designate client code within the query layer to negotiate with PKI servers in order
+to authenticate users and retrieve their authorization tokens (credentials). This
+requires users to specify only the information necessary to authenticate themselves
+to the system. Once user identity is established, their credentials can be accessed by
+the client code and passed to Accumulo outside of the reach of the user.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_query_services_layer">12.7. Query Services Layer</h3>
+<div class="paragraph">
+<p>Since the primary method of interaction with Accumulo is through the Java API,
+production environments often call for the implementation of a Query layer. This
+can be done using web services in containers such as Apache Tomcat, but is not a
+requirement. The Query Services Layer provides a platform on which user-facing
+applications can be built. This allows the application
+designers to isolate potentially complex query logic, and enables a convenient point
+at which to perform essential security functions.</p>
+</div>
+<div class="paragraph">
+<p>Several production environments choose to implement authentication at this layer,
+where user identifiers are used to retrieve their access credentials which are then
+cached within the query layer and presented to Accumulo through the
+Authorizations mechanism.</p>
+</div>
+<div class="paragraph">
+<p>Typically, the query services layer sits between Accumulo and user workstations.</p>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_replication">13. Replication</h2>
+<div class="sectionbody">
+<div class="sect2">
+<h3 id="_overview">13.1. Overview</h3>
+<div class="paragraph">
+<p>Replication is a feature of Accumulo which provides a mechanism to automatically
+copy data to other systems, typically for the purpose of disaster recovery,
+high availability, or geographic locality. It is best to consider this feature
+as a framework for automatic replication rather than simply the ability to copy
+data to another Accumulo instance, as copying to another Accumulo cluster is
+only one implementation detail. The local Accumulo cluster is hereafter referred
+to as the <code>primary</code> while systems being replicated to are known as
+<code>peers</code>.</p>
+</div>
+<div class="paragraph">
+<p>This replication framework makes two Accumulo instances, where one instance
+replicates to the other, eventually consistent with one another, as opposed
+to the strong consistency that each single Accumulo instance still holds. That
+is to say, attempts to read data from a table on a peer which has pending replication
+from the primary will not wait for that data to be replicated before running the scan.
+This is desirable for a number of reasons, the most important is that the replication
+framework is not limited by network outages or offline peers, but only by the HDFS
+space available on the primary system.</p>
+</div>
+<div class="paragraph">
+<p>Replication configurations can be considered as a directed graph which allows cycles.
+The set of systems from which data was replicated is maintained in each Mutation, which
+allows each system to determine whether a peer already has the data
+it wants to send.</p>
+</div>
+<div class="paragraph">
+<p>Data is replicated by using the write-ahead logs (WALs) that each TabletServer is
+already maintaining. TabletServers record, in the <code>accumulo.metadata</code> table,
+which WALs have data that needs to be replicated. The Master uses these records,
+combined with the local Accumulo table that the WAL was used with, to create records
+in the <code>replication</code> table which track which peers the given WAL should be
+replicated to. The Master later uses these work entries to assign the actual
+replication task to a local TabletServer using ZooKeeper. A TabletServer will get
+a lock in ZooKeeper for the replication of this file to a peer, and proceed to
+replicate to the peer, recording progress in the <code>replication</code> table as
+data is successfully replicated on the peer. Later, the Master and Garbage Collector
+will remove records from the <code>accumulo.metadata</code> and <code>replication</code> tables
+and files from HDFS, respectively, after replication to all peers is complete.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_configuration_2">13.2. Configuration</h3>
+<div class="paragraph">
+<p>Configuration of Accumulo to replicate data to another system can be categorized
+into the following sections.</p>
+</div>
+<div class="sect3">
+<h4 id="_site_configuration">13.2.1. Site Configuration</h4>
+<div class="paragraph">
+<p>Each system involved in replication (even the primary) needs a name that uniquely
+identifies it across all peers in the replication graph. This should be considered
+fixed for an instance, and set in <code>accumulo-site.xml</code>.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>&lt;property&gt;
+    &lt;name&gt;replication.name&lt;/name&gt;
+    &lt;value&gt;primary&lt;/value&gt;
+    &lt;description&gt;Unique name for this system used by replication&lt;/description&gt;
+&lt;/property&gt;</pre>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_instance_configuration">13.2.2. Instance Configuration</h4>
+<div class="paragraph">
+<p>For each peer of this system, Accumulo needs to know the name of that peer,
+the class used to replicate data to that system and some configuration information
+to connect to this remote peer. In the case of Accumulo, this additional data
+is the Accumulo instance name and ZooKeeper quorum; however, this varies with the
+replication implementation for the peer.</p>
+</div>
+<div class="paragraph">
+<p>These can be set in the site configuration to ease deployments; however, as they may
+change, it can be useful to set this information using the Accumulo shell.</p>
+</div>
+<div class="paragraph">
+<p>To configure a peer with the name <code>peer1</code> which is an Accumulo system with an instance name of <code>accumulo_peer</code>
+and a ZooKeeper quorum of <code>10.0.0.1,10.0.2.1,10.0.3.1</code>, invoke the following
+command in the shell.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@accumulo_primary&gt; config -s
+replication.peer.peer1=org.apache.accumulo.tserver.replication.AccumuloReplicaSystem,accumulo_peer,10.0.0.1,10.0.2.1,10.0.3.1</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Since this is an Accumulo system, we also want to set a username and password
+to use when authenticating with this peer. On our peer, we make a special user
+which has permission to write to the tables we want to replicate data into, "replication"
+with a password of "password". We then need to record this in the primary&#8217;s configuration.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@accumulo_primary&gt; config -s replication.peer.user.peer1=replication
+root@accumulo_primary&gt; config -s replication.peer.password.peer1=password</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Alternatively, when configuring replication on Accumulo running Kerberos, a keytab
+file per peer can be configured instead of a password. The provided keytabs must be readable
+by the unix user running Accumulo. The keytab for a peer can be distinct from the
+keytab used by Accumulo or any keytabs for other peers.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>accumulo@EXAMPLE.COM@accumulo_primary&gt; config -s replication.peer.user.peer1=replication@EXAMPLE.COM
+accumulo@EXAMPLE.COM@accumulo_primary&gt; config -s replication.peer.keytab.peer1=/path/to/replication.keytab</pre>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_table_configuration_2">13.2.3. Table Configuration</h4>
+<div class="paragraph">
+<p>Now that we have a peer defined, we need to configure which tables will
+replicate to that peer. We also need to configure an identifier to determine where
+this data will be replicated on the peer. Since we&#8217;re replicating to another Accumulo
+cluster, this is a table ID. In this example, we want to enable replication on
+<code>my_table</code> and configure our peer <code>accumulo_peer</code> as a target, sending
+the data to the table with an ID of <code>2</code> in <code>accumulo_peer</code>.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@accumulo_primary&gt; config -t my_table -s table.replication=true
+root@accumulo_primary&gt; config -t my_table -s table.replication.target.accumulo_peer=2</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To replicate a single table on the primary to multiple peers, the second command
+in the above shell snippet can be issued for each peer and remote identifier pair.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_monitoring">13.3. Monitoring</h3>
+<div class="paragraph">
+<p>Basic information about replication status from a primary can be found on the Accumulo
+Monitor server, using the <code>Replication</code> link in the sidebar.</p>
+</div>
+<div class="paragraph">
+<p>On this page, information is broken down into the following sections:</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>Files pending replication by peer and target</p>
+</li>
+<li>
+<p>Files queued for replication, with progress made</p>
+</li>
+</ol>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_work_assignment">13.4. Work Assignment</h3>
+<div class="paragraph">
+<p>Depending on the schema of a table, different implementations of the WorkAssigner
+could be configured. The implementation is controlled via the property <code>replication.work.assigner</code>,
+set to the full class name of the implementation. This can be configured via the shell or
+<code>accumulo-site.xml</code>.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>&lt;property&gt;
+    &lt;name&gt;replication.work.assigner&lt;/name&gt;
+    &lt;value&gt;org.apache.accumulo.master.replication.SequentialWorkAssigner&lt;/value&gt;
+    &lt;description&gt;Implementation used to assign work for replication&lt;/description&gt;
+&lt;/property&gt;</pre>
+</div>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@accumulo_primary&gt; config -t my_table -s replication.work.assigner=org.apache.accumulo.master.replication.SequentialWorkAssigner</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Two implementations are provided. By default, the <code>SequentialWorkAssigner</code> is configured for an
+instance. The SequentialWorkAssigner ensures that, for each peer and remote identifier, WALs are
+replicated in the order in which they were created. This is sufficient to ensure that updates to a table
+will be replayed in the correct order on the peer. This implementation has the downside of only replicating
+a single WAL at a time.</p>
+</div>
+<div class="paragraph">
+<p>The second implementation, the <code>UnorderedWorkAssigner</code>, can be used to overcome the limitation
+of only a single WAL being replicated to a target and peer at any time. Depending on the table schema,
+it&#8217;s possible that multiple versions of the same Key with different values are infrequent or nonexistent.
+In this case, parallel replication to a peer and target is possible without any downsides. If
+this implementation is used where column updates are frequent, it is possible that there will be
+an inconsistency between the primary and the peer.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_replicasystems">13.5. ReplicaSystems</h3>
+<div class="paragraph">
+<p><code>ReplicaSystem</code> is the interface which allows abstraction of replication of data
+to peers of various types. Presently, only an <code>AccumuloReplicaSystem</code> is provided
+which will replicate data to another Accumulo instance. A <code>ReplicaSystem</code> implementation
+is run inside of the TabletServer process, and can be configured as mentioned in the
+<code>Instance Configuration</code> section of this document. Theoretically, an implementation
+of this interface could send data to other filesystems, databases, etc.</p>
+</div>
+<div class="sect3">
+<h4 id="_accumuloreplicasystem">13.5.1. AccumuloReplicaSystem</h4>
+<div class="paragraph">
+<p>The <code>AccumuloReplicaSystem</code> uses Thrift to communicate with a peer Accumulo instance
+and replicate the necessary data. The TabletServer running on the primary will communicate
+with the Master on the peer to request the address of a TabletServer on the peer which
+this TabletServer will use to replicate the data.</p>
+</div>
+<div class="paragraph">
+<p>The TabletServer on the primary will then replicate data in batches of a configurable
+size (<code>replication.max.unit.size</code>). The TabletServer on the peer will report how many
+records were applied back to the primary, which will be used to record how many records
+were successfully replicated. The TabletServer on the primary will continue to replicate
+data in these batches until no more data can be read from the file.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_other_configuration">13.6. Other Configuration</h3>
+<div class="paragraph">
+<p>There are a number of configuration values that can be used to control how
+the implementation of various components operate.</p>
+</div>
+<table class="tableblock frame-all grid-all" style="width: 75%;">
+<colgroup>
+<col style="width: 20%;">
+<col style="width: 40%;">
+<col style="width: 40%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-right valign-top">Property</th>
+<th class="tableblock halign-center valign-top">Description</th>
+<th class="tableblock halign-center valign-top">Default</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">replication.max.work.queue</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Maximum number of files queued for replication at one time</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">1000</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">replication.work.assignment.sleep</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Time between invocations of the WorkAssigner</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">30s</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">replication.worker.threads</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Size of threadpool used to replicate data to peers</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">4</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">replication.receipt.service.port</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Thrift service port to listen for replication requests, can use <em>0</em> for a random port</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">10002</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">replication.work.attempts</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Number of attempts to replicate to a peer before aborting the attempt</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">10</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">replication.receiver.min.threads</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Minimum number of idle threads for handling incoming replication</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">1</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">replication.receiver.threadcheck.time</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Time between attempting adjustments of thread pool for incoming replications</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">30s</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">replication.max.unit.size</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Maximum amount of data to be replicated in one RPC</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">64M</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">replication.work.assigner</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Work Assigner implementation</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">org.apache.accumulo.master.replication.SequentialWorkAssigner</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">tserver.replication.batchwriter.replayer.memory</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Size of BatchWriter cache to use in applying replication requests</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">50M</p></td>
+</tr>
+</tbody>
+</table>
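Any of these properties can be set on the fly from the Accumulo shell. For example, to swap the work assigner implementation for the `UnorderedWorkAssigner` that ships with Accumulo:

```
root@primary> config -s replication.work.assigner=org.apache.accumulo.master.replication.UnorderedWorkAssigner
```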
+</div>
+<div class="sect2">
+<h3 id="_example_practical_configuration">13.7. Example Practical Configuration</h3>
+<div class="paragraph">
+<p>A real-life example is now provided to give a concrete application of replication configuration. This
+example uses two Accumulo instances, one primary system and one peer system, called
+primary and peer, respectively. Each system also has a table of the same name, "my_table". The instance
+name for each is also the same (primary and peer), and both have ZooKeeper hosts on a node with that
+hostname as well (primary:2181 and peer:2181).</p>
+</div>
+<div class="paragraph">
+<p>We want to configure these systems so that "my_table" on "primary" replicates to "my_table" on "peer".</p>
+</div>
+<div class="sect3">
+<h4 id="_conf_accumulo_site_xml">13.7.1. conf/accumulo-site.xml</h4>
+<div class="paragraph">
+<p>We can assign the "unique" name that identifies this Accumulo instance among all others that might participate
+in replication together. In this example, we will use the names provided in the description.</p>
+</div>
+<div class="sect4">
+<h5 id="_primary">Primary</h5>
+<div class="listingblock">
+<div class="content">
+<pre>&lt;property&gt;
+  &lt;name&gt;replication.name&lt;/name&gt;
+  &lt;value&gt;primary&lt;/value&gt;
+  &lt;description&gt;Defines the unique name&lt;/description&gt;
+&lt;/property&gt;</pre>
+</div>
+</div>
+</div>
+<div class="sect4">
+<h5 id="_peer">Peer</h5>
+<div class="listingblock">
+<div class="content">
+<pre>&lt;property&gt;
+  &lt;name&gt;replication.name&lt;/name&gt;
+  &lt;value&gt;peer&lt;/value&gt;
+&lt;/property&gt;</pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_conf_masters_and_conf_slaves">13.7.2. conf/masters and conf/slaves</h4>
+<div class="paragraph">
+<p>Be <strong>sure</strong> to use non-local IP addresses. Other nodes need to connect to it and using localhost will likely result in
+a local node talking to another local node.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_start_both_instances">13.7.3. Start both instances</h4>
+<div class="paragraph">
+<p>The rest of the configuration is dynamic and is best configured on the fly (in ZooKeeper) rather than in accumulo-site.xml.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_peer_2">13.7.4. Peer</h4>
+<div class="paragraph">
+<p>The next series of commands is to be run on the peer system. Create a user account for the primary instance called
+"peer". The password for this account will need to be saved in the configuration on the primary.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@peer&gt; createtable my_table
+root@peer&gt; createuser peer
+root@peer&gt; grant -t my_table -u peer Table.WRITE
+root@peer&gt; grant -t my_table -u peer Table.READ
+root@peer&gt; tables -l</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Remember what the table ID for <em>my_table</em> is. You&#8217;ll need that to configure the primary instance.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_primary_2">13.7.5. Primary</h4>
+<div class="paragraph">
+<p>Next, configure the primary instance.</p>
+</div>
+<div class="sect4">
+<h5 id="_set_up_the_table">Set up the table</h5>
+<div class="listingblock">
+<div class="content">
+<pre>root@primary&gt; createtable my_table</pre>
+</div>
+</div>
+</div>
+<div class="sect4">
+<h5 id="_define_the_peer_as_a_replication_peer_to_the_primary">Define the Peer as a replication peer to the Primary</h5>
+<div class="paragraph">
+<p>We&#8217;re defining the instance with replication.name of <em>peer</em> as a peer. We provide the implementation of ReplicaSystem
+that we want to use, and the configuration for the AccumuloReplicaSystem. In this case, the configuration is the Accumulo
+Instance name for <em>peer</em> and the ZooKeeper quorum string. The configuration key is of the form
+"replication.peer.$peer_name".</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@primary&gt; config -s replication.peer.peer=org.apache.accumulo.tserver.replication.AccumuloReplicaSystem,peer,$peer_zk_quorum</pre>
+</div>
+</div>
+</div>
+<div class="sect4">
+<h5 id="_set_the_authentication_credentials">Set the authentication credentials</h5>
+<div class="paragraph">
+<p>We want to use that special username and password that we created on the peer, so we have a means to write data to
+the table that we want to replicate to. The configuration key is of the form "replication.peer.user.$peer_name".</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@primary&gt; config -s replication.peer.user.peer=peer
+root@primary&gt; config -s replication.peer.password.peer=peer</pre>
+</div>
+</div>
+</div>
+<div class="sect4">
+<h5 id="_enable_replication_on_the_table">Enable replication on the table</h5>
+<div class="paragraph">
+<p>Now that we have defined the peer on the primary and provided the authentication credentials, we need to configure
+our table with the implementation of ReplicaSystem we want to use to replicate to the peer. In this case, our peer
+is an Accumulo instance, so we want to use the AccumuloReplicaSystem.</p>
+</div>
+<div class="paragraph">
+<p>The configuration for the AccumuloReplicaSystem is the table ID for the table on the peer instance that we
+want to replicate into. Be sure to use the correct value for $peer_table_id. The configuration key is of
+the form "table.replication.target.$peer_name".</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@primary&gt; config -t my_table -s table.replication.target.peer=$peer_table_id</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Finally, we can enable replication on this table.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@primary&gt; config -t my_table -s table.replication=true</pre>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_extra_considerations_for_use">13.8. Extra considerations for use</h3>
+<div class="paragraph">
+<p>While this feature is intended for general-purpose use, its implementation does carry some baggage. Like any software,
+replication is a feature that operates well within some set of use cases but is not meant to support all use cases.
+For the benefit of the users, we can enumerate these cases.</p>
+</div>
+<div class="sect3">
+<h4 id="_latency">13.8.1. Latency</h4>
+<div class="paragraph">
+<p>As previously mentioned, the replication feature uses the Write-Ahead Log files for a number of reasons, one of which
+is to prevent the need for data to be written to RFiles before it is available to be replicated. While this can help
+reduce the latency for a batch of Mutations that have been written to Accumulo, the replication latency is still at least
+seconds to tens of seconds once ingest is active. For a table on which replication has just been enabled,
+it will likely take a few minutes before replication begins.</p>
+</div>
+<div class="paragraph">
+<p>Once ingest is active and flowing into the system at a regular rate, replication should occur at a similar rate,
+given sufficient computing resources. Replication attempts to copy data at a rate that can be considered low latency,
+but it is not a replacement for custom indexing code which can ensure near real-time referential integrity on secondary indexes.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_table_configured_iterators">13.8.2. Table-Configured Iterators</h4>
+<div class="paragraph">
+<p>Accumulo Iterators tend to be a heavy hammer which can be used to solve a variety of problems. In general, it is highly
+recommended that Iterators which are applied at major compaction time are both idempotent and associative due to the
+non-determinism in which some set of files for a Tablet might be compacted. In practice, this translates to common patterns,
+such as aggregation, which are implemented in a manner resilient to duplication (such as using a Set instead of a List).</p>
+</div>
+<div class="paragraph">
+<p>Due to the asynchronous nature of replication and the expectation that hardware failures and network partitions will exist,
+it is generally not recommended to configure replication on a table which has Iterators set which are not idempotent.
+While the replication implementation can make some simple assertions to try to avoid re-replication of data, it is not
+presently guaranteed that all data will only be sent to a peer once. Data will be replicated at least once. Typically,
+this is not a problem, as the VersioningIterator will automatically deduplicate this over-replication because the duplicates
+have the same timestamp; however, certain Combiners may result in inaccurate aggregations.</p>
+</div>
+<div class="paragraph">
+<p>As a concrete example, consider a table which has the SummingCombiner configured to sum all values for
+multiple versions of the same Key. For some key, consider a set of numeric values that are written to a table on the
+primary: [1, 2, 3]. On the primary, all of these are successfully written and thus the current value for the given key
+would be 6, (1 + 2 + 3). Consider, however, that each of these updates to the peer were done independently (because
+other data was also included in the write-ahead log that needed to be replicated). The update with a value of 1 was
+successfully replicated, and then we attempted to replicate the update with a value of 2 but the remote server never
+responded. The primary does not know whether the update with a value of 2 was actually applied or not, so the
+only recourse is to re-send the update. After we receive confirmation that the update with a value of 2 was replicated,
+we will then replicate the update with 3. If the peer never applied the first attempt to send the update of <em>2</em>, the summation is accurate.
+If the update was applied but the acknowledgement was lost for some reason (system failure, network partition), the
+update will be resent to the peer. Because addition is non-idempotent, we have created an inconsistency between the
+primary and peer. As such, the SummingCombiner wouldn&#8217;t be recommended on a table being replicated.</p>
+</div>
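The scenario above can be sketched in a few lines. This is an illustrative sketch, not Accumulo code: a lost acknowledgement causes the primary to re-send an update, and a summing combiner double-counts it.

```python
# A SummingCombiner-style aggregation is non-idempotent: applying the same
# value twice changes the result.
def summing_combiner(values):
    return sum(values)

primary_updates = [1, 2, 3]       # each update applied exactly once on the primary
peer_updates = [1, 2, 2, 3]       # the '2' was re-sent after a lost acknowledgement

primary_total = summing_combiner(primary_updates)
peer_total = summing_combiner(peer_updates)

assert primary_total == 6
assert peer_total == 8            # the peer now disagrees with the primary
```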
+<div class="paragraph">
+<p>While there are changes that could be made to the replication implementation which could attempt to mitigate this risk,
+it is presently not recommended to configure Iterators or Combiners which are not idempotent on a replicated table
+when inaccurate aggregations are not acceptable.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_duplicate_keys">13.8.3. Duplicate Keys</h4>
+<div class="paragraph">
+<p>In Accumulo, when more than one key exists that are exactly the same, keys that are equal down to the timestamp,
+the retained value is non-deterministic. Replication introduces another level of non-determinism in this case.
+For a table that is being replicated and has multiple equal keys with different values inserted into it, the final
+value in that table on the primary instance is not guaranteed to be the final value on all replicas.</p>
+</div>
+<div class="paragraph">
+<p>For example, say the values that were inserted on the primary instance were <code>value1</code> and <code>value2</code> and the final
+value was <code>value1</code>, it is not guaranteed that all replicas will have <code>value1</code> like the primary. The final value is
+non-deterministic for each instance.</p>
+</div>
+<div class="paragraph">
+<p>As is the recommendation without replication enabled, if multiple values for the same key (sans timestamp) are written to
+Accumulo, it is strongly recommended that the value in the timestamp properly reflects the intended version by
+the client. That is to say, newer values inserted into the table should have larger timestamps. If the time between
+writing updates to the same key is significant (order minutes), this concern can likely be ignored.</p>
+</div>
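A small sketch (illustrative, not Accumulo code) of why distinct client-assigned timestamps remove the ambiguity:

```python
# The VersioningIterator retains the cell with the highest timestamp. With
# equal timestamps, the retained value depends on arrival order, which can
# differ between the primary and its peers; distinct, increasing timestamps
# make the outcome deterministic on every replica.
def retained(cells):
    # cells: list of (timestamp, value) pairs; highest timestamp wins
    return max(cells, key=lambda c: c[0])[1]

equal_ts = [(100, "value1"), (100, "value2")]     # ambiguous: tie broken arbitrarily
distinct_ts = [(100, "value1"), (200, "value2")]  # unambiguous everywhere

assert retained(distinct_ts) == "value2"
```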
+</div>
+<div class="sect3">
+<h4 id="_bulk_imports">13.8.4. Bulk Imports</h4>
+<div class="paragraph">
+<p>Currently, files that are bulk imported into a table configured for replication are not replicated. There is no
+technical reason it was not implemented; it was simply omitted from the initial implementation. This is considered a
+fair limitation because bulk importing generated files into multiple locations is much simpler than bifurcating "live" ingest
+data into two instances. Given some existing bulk import process which creates files and then imports them into an
+Accumulo instance, it is trivial to copy those files to a new HDFS instance and import them into another Accumulo
+instance using the same process. Hadoop&#8217;s <code>distcp</code> command provides an easy way to copy large amounts of data to another
+HDFS instance, which makes the problem of duplicating bulk imports very easy to solve.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_table_schema">13.9. Table Schema</h3>
+<div class="paragraph">
+<p>The following describes the kinds of keys, their format, and their general function, to help individuals
+understand what the replication table describes. Because the replication table is essentially a state machine,
+this data is often the source of truth for why Accumulo is doing what it is doing with respect to replication. There are
+three "sections" in this table: "repl", "work", and "order".</p>
+</div>
+<div class="sect3">
+<h4 id="_repl_section">13.9.1. Repl section</h4>
+<div class="paragraph">
+<p>This section is for the tracking of a WAL file that needs to be replicated to one or more Accumulo remote tables.
+This entry is tracking that replication needs to happen on the given WAL file, but also that the local Accumulo table,
+as specified by the column qualifier "local table ID", has information in this WAL file.</p>
+</div>
+<div class="paragraph">
+<p>The structure of the key-value is as follows:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>&lt;HDFS_uri_to_WAL&gt; repl:&lt;local_table_id&gt; [] -&gt; &lt;protobuf&gt;</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This entry is created based on a replication entry from the Accumulo metadata table, and is deleted from the replication table
+when the WAL has been fully replicated to all remote Accumulo tables.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_work_section">13.9.2. Work section</h4>
+<div class="paragraph">
+<p>This section is for the tracking of a WAL file that needs to be replicated to a single Accumulo table in a remote
+Accumulo cluster. If a WAL must be replicated to multiple tables, there will be multiple entries. The Value for this
+Key is a serialized ProtocolBuffer message which encapsulates the portion of the WAL which was already sent for
+this file. The "replication target" is the unique location of where the file needs to be replicated: the identifier
+for the remote Accumulo cluster and the table ID in that remote Accumulo cluster. The protocol buffer in the value
+tracks the progress of replication to the remote cluster.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>&lt;HDFS_uri_to_WAL&gt; work:&lt;replication_target&gt; [] -&gt; &lt;protobuf&gt;</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The "work" entry is created when a WAL has an "order" entry, and deleted after the WAL is replicated to all
+necessary remote clusters.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_order_section">13.9.3. Order section</h4>
+<div class="paragraph">
+<p>This section is used to order and schedule (create) replication work. In some cases, data with the same timestamp
+may be provided multiple times. In this case, it is important that WALs are replicated in the same order they were
+created/used. In this case (and in cases where this is not important), the order entry ensures that oldest WALs
+are processed most quickly and pushed through the replication framework.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>&lt;time_of_WAL_closing&gt;\x00&lt;HDFS_uri_to_WAL&gt; order:&lt;local_table_id&gt; [] -&gt; &lt;protobuf&gt;</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The "order" entry is created when the WAL is closed (no longer being written to) and is removed when
+the WAL is fully replicated to all remote locations.</p>
+</div>
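The effect of the time prefix on ordering can be sketched as follows; the zero-padded decimal encoding here is an assumption for illustration (Accumulo's actual encoding of the close time may differ).

```python
# Rows sort lexicographically, so putting the (fixed-width) WAL close time
# first makes a scan of the "order" section visit the oldest WALs first.
def order_row(close_time_ms, wal_uri):
    # 13 decimal digits comfortably covers millisecond epoch timestamps
    return "%013d\x00%s" % (close_time_ms, wal_uri)

rows = sorted([
    order_row(1500000000000, "hdfs://namenode/accumulo/wal/newer"),
    order_row(1400000000000, "hdfs://namenode/accumulo/wal/older"),
])
assert rows[0].endswith("older")  # the oldest WAL is scheduled first
```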
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_implementation_details">14. Implementation Details</h2>
+<div class="sectionbody">
+<div class="sect2">
+<h3 id="_fault_tolerant_executor_fate">14.1. Fault-Tolerant Executor (FATE)</h3>
+<div class="paragraph">
+<p>Accumulo must implement a number of distributed, multi-step operations to support
+the client API. Creating a new table is a simple example of an atomic client call
+which requires multiple steps in the implementation: get a unique table ID, configure
+default table permissions, populate information in ZooKeeper to record the table&#8217;s
+existence, create directories in HDFS for the table&#8217;s data, etc. Implementing these
+steps in a way that is tolerant to node failure and other concurrent operations is
+very difficult to achieve. Accumulo includes a Fault-Tolerant Executor (FATE) which
+is widely used server-side to implement the client API safely and correctly.</p>
+</div>
+<div class="paragraph">
+<p>FATE is the implementation detail which ensures that tables in creation when the
+Master dies will be successfully created when another Master process is started.
+This alleviates the need for any external tools to correct some bad state&#8201;&#8212;&#8201;Accumulo can
+undo the failure and self-heal without any external intervention.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_overview_2">14.2. Overview</h3>
+<div class="paragraph">
+<p>FATE consists of three primary components: a repeatable, persisted operation (REPO), a storage
+layer for REPOs, and an execution system to run REPOs. Accumulo uses ZooKeeper as the storage
+layer for FATE, and the Accumulo Master acts as the execution system to run REPOs.</p>
+</div>
+<div class="paragraph">
+<p>The important characteristic of REPOs is that they are implemented in a way that is idempotent:
+every operation must be able to undo or replay a partial execution of itself. Requiring the
+implementation of the operation to support this functionality greatly simplifies the execution
+of these operations. This property is also what guarantees safety in light of failure conditions.</p>
+</div>
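The execution model can be sketched as follows. This is an illustrative Python sketch, not Accumulo's actual (Java) FATE interface; the function shapes and the example steps are assumptions for illustration.

```python
# Sketch of the REPO stack: the top of the stack is the next step to run; a
# successful REPO may return a follow-on REPO, which is pushed onto the stack.
# Because each step is idempotent, replaying after a crash is safe.
def run_fate(stack, state):
    while True:
        next_repo = stack[-1](state)   # execute the REPO on top of the stack
        if next_repo is None:
            return state               # no follow-on step: transaction complete
        stack.append(next_repo)

# Two idempotent steps from a hypothetical "create table"-like operation.
def create_dirs(state):
    state["dirs"] = True               # safe to repeat
    return None

def assign_table_id(state):
    state.setdefault("table_id", 42)   # safe to repeat: keeps an existing ID
    return create_dirs

assert run_fate([assign_table_id], {}) == {"table_id": 42, "dirs": True}
```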
+</div>
+<div class="sect2">
+<h3 id="_administration">14.3. Administration</h3>
+<div class="paragraph">
+<p>Sometimes, it is useful to inspect the current FATE operations, both pending and executing.
+For example, a command that is not completing could be blocked on the execution of another
+operation. Accumulo provides a shell command to interact with FATE.</p>
+</div>
+<div class="paragraph">
+<p>The <code>fate</code> shell command accepts a number of arguments for different functionality:
+<code>list</code>/<code>print</code>, <code>fail</code>, <code>delete</code>, <code>dump</code>.</p>
+</div>
+<div class="sect3">
+<h4 id="_list_print">14.3.1. List/Print</h4>
+<div class="paragraph">
+<p>Without any additional arguments, this command will print all operations that still exist in
+the FATE store (ZooKeeper). This will include active, pending, and completed operations (completed
+operations are lazily removed from the store). Each operation includes a unique "transaction ID", the
+state of the operation (e.g. <code>NEW</code>, <code>IN_PROGRESS</code>, <code>FAILED</code>), any locks the
+transaction actively holds and any locks it is waiting to acquire.</p>
+</div>
+<div class="paragraph">
+<p>This option can also accept transaction IDs which will restrict the list of transactions shown.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_fail">14.3.2. Fail</h4>
+<div class="paragraph">
+<p>This command can be used to manually fail a FATE transaction and requires a transaction ID
+as an argument. Failing an operation is not a normal procedure and should only be performed
+by an administrator who understands the implications of why they are failing the operation.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_delete">14.3.3. Delete</h4>
+<div class="paragraph">
+<p>This command requires a transaction ID and will delete any locks that the transaction
+holds. Like the fail command, this command should only be used in extreme circumstances
+by an administrator that understands the implications of the command they are about to
+invoke. It is not normal to invoke this command.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_dump">14.3.4. Dump</h4>
+<div class="paragraph">
+<p>This command accepts zero or more transaction IDs.  If given no transaction IDs,
+it will dump all active transactions.  A FATE operation is composed of a
+sequence of REPOs.  In order to start a FATE transaction, a REPO is pushed onto
+a per-transaction REPO stack.  The top of the stack always contains the next
+REPO the FATE transaction should execute.  When a REPO is successful, it may
+return another REPO, which is pushed onto the stack.  The <code>dump</code> command will
+print all of the REPOs on each transaction&#8217;s stack.  The REPOs are serialized to
+JSON in order to make them human readable.</p>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_ssl">15. SSL</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Accumulo, through Thrift&#8217;s TSSLTransport, provides the ability to encrypt
+wire communication between Accumulo servers and clients using secure
+sockets layer (SSL). SSL certificates signed by the same certificate authority
+control the "circle of trust" in which a secure connection can be established.
+Typically, each host running Accumulo processes would be given a certificate
+which identifies itself.</p>
+</div>
+<div class="paragraph">
+<p>Clients can optionally also be given a certificate, when client-auth is enabled,
+which prevents unwanted clients from accessing the system. The SSL integration
+presently provides no authentication support within Accumulo (an Accumulo username
+and password are still required) and is only used to establish a means for
+secure communication.</p>
+</div>
+<div class="sect2">
+<h3 id="_server_configuration">15.1. Server configuration</h3>
+<div class="paragraph">
+<p>As previously mentioned, the circle of trust is established by the certificate
+authority which created the certificates in use. Because of the tight coupling
+of certificate generation with an organization&#8217;s policies, Accumulo does not
+provide a method in which to automatically create the necessary SSL components.</p>
+</div>
+<div class="paragraph">
+<p>Administrators without existing infrastructure built on SSL are encouraged to
+use OpenSSL and the <code>keytool</code> command. An example of these commands is
+included in a section below. Accumulo servers require a certificate and keystore,
+in the form of Java KeyStores, to enable SSL. The following configuration assumes
+these files already exist.</p>
+</div>
+<div class="paragraph">
+<p>In <code>$ACCUMULO_CONF_DIR/accumulo-site.xml</code>, the following properties are required:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>rpc.javax.net.ssl.keyStore</strong>=<em>The path on the local filesystem to the keystore containing the server&#8217;s certificate</em></p>
+</li>
+<li>
+<p><strong>rpc.javax.net.ssl.keyStorePassword</strong>=<em>The password for the keystore containing the server&#8217;s certificate</em></p>
+</li>
+<li>
+<p><strong>rpc.javax.net.ssl.trustStore</strong>=<em>The path on the local filesystem to the keystore containing the certificate authority&#8217;s public key</em></p>
+</li>
+<li>
+<p><strong>rpc.javax.net.ssl.trustStorePassword</strong>=<em>The password for the keystore containing the certificate authority&#8217;s public key</em></p>
+</li>
+<li>
+<p><strong>instance.rpc.ssl.enabled</strong>=<em>true</em></p>
+</li>
+</ul>
+</div>
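Assuming keystores generated as in the OpenSSL section below and placed under `/etc/accumulo/ssl/` (the paths and passwords here are placeholders), the corresponding `accumulo-site.xml` entries might look like:

```xml
<property>
  <name>rpc.javax.net.ssl.keyStore</name>
  <value>/etc/accumulo/ssl/server.jks</value>
</property>
<property>
  <name>rpc.javax.net.ssl.keyStorePassword</name>
  <value>keystore-password</value>
</property>
<property>
  <name>rpc.javax.net.ssl.trustStore</name>
  <value>/etc/accumulo/ssl/truststore.jks</value>
</property>
<property>
  <name>rpc.javax.net.ssl.trustStorePassword</name>
  <value>truststore-password</value>
</property>
<property>
  <name>instance.rpc.ssl.enabled</name>
  <value>true</value>
</property>
```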
+<div class="paragraph">
+<p>Optionally, SSL client-authentication (two-way SSL) can also be enabled by setting
+<code>instance.rpc.ssl.clientAuth=true</code> in <code>$ACCUMULO_CONF_DIR/accumulo-site.xml</code>.
+This requires that each client has access to a valid certificate to set up a secure connection
+to the servers. By default, Accumulo uses one-way SSL which does not require clients to have
+their own certificate.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_client_configuration">15.2. Client configuration</h3>
+<div class="paragraph">
+<p>To establish a connection to Accumulo servers, each client must also have
+special configuration. This is typically accomplished through the use of
+the client configuration file whose default location is <code>~/.accumulo/config</code>.</p>
+</div>
+<div class="paragraph">
+<p>The following properties must be set to connect to an Accumulo instance using SSL:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>rpc.javax.net.ssl.trustStore</strong>=<em>The path on the local filesystem to the keystore containing the certificate authority&#8217;s public key</em></p>
+</li>
+<li>
+<p><strong>rpc.javax.net.ssl.trustStorePassword</strong>=<em>The password for the keystore containing the certificate authority&#8217;s public key</em></p>
+</li>
+<li>
+<p><strong>instance.rpc.ssl.enabled</strong>=<em>true</em></p>
+</li>
+</ul>
+</div>
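The client configuration file uses simple `key=value` lines. Assuming the truststore location and password from the server example (both are placeholders), `~/.accumulo/config` could contain:

```
instance.rpc.ssl.enabled=true
rpc.javax.net.ssl.trustStore=/etc/accumulo/ssl/truststore.jks
rpc.javax.net.ssl.trustStorePassword=truststore-password
```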
+<div class="paragraph">
+<p>If two-way SSL is enabled (<code>instance.rpc.ssl.clientAuth=true</code>) for the instance, the client must also define
+its own certificate and enable client authentication as well.</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>rpc.javax.net.ssl.keyStore</strong>=<em>The path on the local filesystem to the keystore containing the server&#8217;s certificate</em></p>
+</li>
+<li>
+<p><strong>rpc.javax.net.ssl.keyStorePassword</strong>=<em>The password for the keystore containing the server&#8217;s certificate</em></p>
+</li>
+<li>
+<p><strong>instance.rpc.ssl.clientAuth</strong>=<em>true</em></p>
+</li>
+</ul>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_generating_ssl_material_using_openssl">15.3. Generating SSL material using OpenSSL</h3>
+<div class="paragraph">
+<p>The following is included as an example for generating your own SSL material (certificate authority and server/client
+certificates) using OpenSSL and Java&#8217;s KeyTool command.</p>
+</div>
+<div class="sect3">
+<h4 id="_generate_a_certificate_authority">15.3.1. Generate a certificate authority</h4>
+<div class="listingblock">
+<div class="content">
+<pre># Create a private key
+openssl genrsa -des3 -out root.key 4096
+
+# Create a certificate request using the private key
+openssl req -x509 -new -key root.key -days 365 -out root.pem
+
+# Generate a Base64-encoded version of the PEM just created
+openssl x509 -outform der -in root.pem -out root.der
+
+# Import the key into a Java KeyStore
+keytool -import -alias root-key -keystore truststore.jks -file root.der
+
+# Remove the DER formatted key file (as we don't need it anymore)
+rm root.der</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The <code>truststore.jks</code> file is the Java keystore which contains the certificate authority&#8217;s public key.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_generate_a_certificate_keystore_per_host">15.3.2. Generate a certificate/keystore per host</h4>
+<div class="paragraph">
+<p>It&#8217;s common that each host in the instance is issued its own certificate (notably to ensure that revocation procedures
+can be easily followed). The following steps can be taken for each host.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre># Create the private key for our server
+openssl genrsa -out server.key 4096
+
+# Generate a certificate signing request (CSR) with our private key
+openssl req -new -key server.key -out server.csr
+
+# Use the CSR and the CA to create a certificate for the server (a reply to the CSR)
+openssl x509 -req -in server.csr -CA root.pem -CAkey root.key -CAcreateserial \
+    -out server.crt -days 365
+
+# Use the certificate and the private key for our server to create PKCS12 file
+openssl pkcs12 -export -in server.crt -inkey server.key -certfile server.crt \
+    -name 'server-key' -out server.p12
+
+# Create a Java KeyStore for the server using the PKCS12 file (private key)
+keytool -importkeystore -srckeystore server.p12 -srcstoretype pkcs12 -destkeystore \
+    server.jks -deststoretype JKS
+
+# Remove the PKCS12 file as we don't need it
+rm server.p12
+
+# Import the CA-signed certificate to the keystore
+keytool -import -trustcacerts -alias server-crt -file server.crt -keystore server.jks</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The <code>server.jks</code> file is the Java keystore containing the certificate for a given host. The above
+steps are the same whether the certificate is generated for an Accumulo server or a client.</p>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_kerberos">16. Kerberos</h2>
+<div class="sectionbody">
+<div class="sect2">
+<h3 id="_overview_3">16.1. Overview</h3>
+<div class="paragraph">
+<p>Kerberos is a network authentication protocol that provides a secure way for
+peers to prove their identity over an insecure network in a client-server model.
+A centralized key-distribution center (KDC) is the service that coordinates
+authentication between a client and a server. Clients and servers use "tickets",
+obtained from the KDC via a password or a special file called a "keytab", to
+communicate with the KDC and prove their identity. A KDC administrator must
+create the principal (name for the client/server identity) and the password
+or keytab, securely passing the necessary information to the actual user/service.
+Properly securing the KDC and generated ticket material is central to the security
+model and is mentioned only as a warning to administrators running their own KDC.</p>
+</div>
+<div class="paragraph">
+<p>To interact with Kerberos programmatically, GSSAPI and SASL are two standards
+which allow cross-language integration with Kerberos for authentication. GSSAPI,
+the generic security service application program interface, is a standard which
+Kerberos implements. Java itself also implements GSSAPI, and that implementation
+is leveraged by other applications, like Apache Hadoop and Apache Thrift.
+SASL, the simple authentication and security layer, is a framework for authentication
+and security over the network. SASL provides a number of mechanisms for authentication,
+one of which is GSSAPI. Thus, SASL provides the transport which authenticates
+using GSSAPI, which Kerberos implements.</p>
+</div>
+<div class="paragraph">
+<p>Kerberos is a very complicated software application and is deserving of much
+more description than can be provided here. An <a href="http://www.roguelynn.com/words/explain-like-im-5-kerberos/">explain like
+I&#8217;m 5</a> blog post is very good at distilling the basics, while <a href="http://web.mit.edu/kerberos/">MIT Kerberos&#8217;s project page</a>
+contains lots of documentation for users or administrators. Various Hadoop "vendors"
+also provide free documentation that includes step-by-step instructions for
+configuring Hadoop and ZooKeeper (which will be henceforth considered as prerequisites).</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_within_hadoop">16.2. Within Hadoop</h3>
+<div class="paragraph">
+<p>Out of the box, HDFS and YARN have no ability to enforce that a user is who
+they claim they are. Thus, any basic Hadoop installation should be treated as
+insecure: any user with access to the cluster has the ability to access any data.
+Using Kerberos to provide authentication, users can be strongly identified, delegating
+to Kerberos to determine who a user is and enforce that a user is who they claim to be.
+As such, Kerberos is widely used across the entire Hadoop ecosystem for strong
+authentication. Since server processes accessing HDFS or YARN are required
+to use Kerberos to authenticate with HDFS, it makes sense that they also require
+Kerberos authentication from their clients, in addition to other features provided
+by SASL.</p>
+</div>
+<div class="paragraph">
+<p>A typical deployment involves the creation of Kerberos principals for all server
+processes (Hadoop datanodes and namenode(s), ZooKeepers), the creation of a keytab
+file for each principal and then proper configuration for the Hadoop site xml files.
+Users also need Kerberos principals created for them; however, a user typically
+uses a password to identify themselves instead of a keytab. Users can obtain a
+ticket granting ticket (TGT) from the KDC using their password which allows them
+to authenticate for the lifetime of the TGT (typically one day by default) and alleviates
+the need for further password authentication.</p>
+</div>
+<div class="paragraph">
+<p>For client-server applications, like web servers, a keytab can be created which
+allows for fully-automated Kerberos identification, removing the need to enter any
+password, at the cost of needing to protect the keytab file. These principals
+will apply directly to authentication for clients accessing Accumulo and the
+Accumulo processes accessing HDFS.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_delegation_tokens">16.3. Delegation Tokens</h3>
+<div class="paragraph">
+<p>MapReduce, a common way that clients interact with Accumulo, does not map well to the
+client-server model that Kerberos was originally designed to support. Specifically, the parallelization
+of tasks across many nodes introduces the problem of securely sharing the user credentials across
+these tasks in as safe a manner as possible. To address this problem, Hadoop introduced the notion
+of a delegation token to be used in distributed execution settings.</p>
+</div>
+<div class="paragraph">
+<p>A delegation token is nothing more than a short-term, on-the-fly password generated after authenticating with the user&#8217;s
+credentials. In Hadoop itself, the NameNode and ResourceManager, for HDFS and YARN respectively, act as the gateways for
+delegation token requests. For example, before a YARN job is submitted, the implementation will request delegation
+tokens from the NameNode and ResourceManager so the YARN tasks can communicate with HDFS and YARN. In the same manner,
+support has been added in the Accumulo Master to generate delegation tokens to enable interaction with Accumulo via
+MapReduce when Kerberos authentication is enabled in a manner similar to HDFS and YARN.</p>
+</div>
+<div class="paragraph">
+<p>Generating an expiring password is, arguably, more secure than distributing the user&#8217;s
+credentials across the cluster: if the token is compromised, only access to HDFS, YARN and
+Accumulo is exposed, as opposed to the entire
+Kerberos credential. Additional details for clients and servers will be covered
+in subsequent sections.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_configuring_accumulo">16.4. Configuring Accumulo</h3>
+<div class="paragraph">
+<p>To configure Accumulo for use with Kerberos, both client-facing and server-facing
+changes must be made for a functional system on secured Hadoop. As previously mentioned,
+numerous guidelines already exist on the subject of configuring Hadoop and ZooKeeper for
+use with Kerberos and won&#8217;t be covered here. It is assumed that you have functional
+Hadoop and ZooKeeper already installed.</p>
+</div>
+<div class="paragraph">
+<p>Note that on an existing cluster the server-side changes will require a full cluster shutdown and restart. You should
+wait to restart the TraceServers until after you&#8217;ve completed the rest of the cluster setup and provisioned
+a trace user with appropriate permissions.</p>
+</div>
+<div class="sect3">
+<h4 id="_servers">16.4.1. Servers</h4>
+<div class="paragraph">
+<p>The first step is to obtain a Kerberos identity for the Accumulo server processes.
+When running Accumulo with Kerberos enabled, a valid Kerberos identity will be required
+to initiate any RPC between Accumulo processes (e.g. Master and TabletServer) in addition
+to any HDFS action (e.g. client to HDFS or TabletServer to HDFS).</p>
+</div>
+<div class="sect4">
+<h5 id="_generate_principal_and_keytab">Generate Principal and Keytab</h5>
+<div class="paragraph">
+<p>In the <code>kadmin.local</code> shell or using the <code>-q</code> option on <code>kadmin.local</code>, create a
+principal for Accumulo for all hosts that are running Accumulo processes. A Kerberos
+principal is of the form "primary/instance@REALM". "accumulo" is commonly the "primary"
+(although not required) and the "instance" is the fully-qualified domain name for
+the host that will be running the Accumulo process&#8201;&#8212;&#8201;this is required.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>kadmin.local -q "addprinc -randkey accumulo/host.domain.com"</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Perform the above for each node running Accumulo processes in the instance, modifying
+"host.domain.com" for your network. The <code>-randkey</code> option generates a random password
+because we will use a keytab for authentication, not a password, since the Accumulo
+server processes don&#8217;t have an interactive console to enter a password into.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>kadmin.local -q "xst -k accumulo.hostname.keytab accumulo/host.domain.com"</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To simplify deployments, at the cost of security, all Accumulo principals could
+be globbed into a single keytab:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>kadmin.local -q "xst -k accumulo.service.keytab -glob accumulo*"</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To ensure that the SASL handshake can occur from clients to servers and servers to servers,
+all Accumulo servers must share the same instance and realm principal components as the
+"client" needs to know these to set up the connection with the "server".</p>
+</div>
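+<div class="paragraph">
+<p>To make the principal structure concrete, the following Java sketch splits a principal into its "primary/instance@REALM" components and performs the kind of <code>_HOST</code> substitution that <code>general.kerberos.principal</code> supports. The helper names here are hypothetical, not part of Accumulo&#8217;s API.</p>
+</div>

```java
public class PrincipalParts {
    // Hypothetical helper mirroring the documented _HOST expansion behavior.
    static String expandHost(String principal, String fqdn) {
        return principal.replace("_HOST", fqdn);
    }

    public static void main(String[] args) {
        String principal = expandHost("accumulo/_HOST@EXAMPLE.COM", "host.domain.com");
        // primary/instance@REALM
        String primary = principal.substring(0, principal.indexOf('/'));
        String instance = principal.substring(principal.indexOf('/') + 1, principal.indexOf('@'));
        String realm = principal.substring(principal.indexOf('@') + 1);
        System.out.println(primary + " / " + instance + " @ " + realm);
    }
}
```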
+</div>
+<div class="sect4">
+<h5 id="_server_configuration_2">Server Configuration</h5>
+<div class="paragraph">
+<p>A number of properties need to be changed in <code>accumulo-site.xml</code> to properly configure
+the servers.</p>
+</div>
+<table class="tableblock frame-all grid-all spread">
+<colgroup>
+<col style="width: 33.3333%;">
+<col style="width: 33.3333%;">
+<col style="width: 33.3334%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top">Key</th>
+<th class="tableblock halign-left valign-top">Default Value</th>
+<th class="tableblock halign-left valign-top">Description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">general.kerberos.keytab</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">/etc/security/keytabs/accumulo.service.keytab</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The path to the keytab for Accumulo on local filesystem. Change the value to the actual path on your system.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">general.kerberos.principal</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">accumulo/_HOST@REALM</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The Kerberos principal for Accumulo, needs to match the keytab. "_HOST" can be used instead of the actual hostname in the principal and will be automatically expanded to the current FQDN which reduces the configuration file burden.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">instance.rpc.sasl.enabled</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">true</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Enables SASL for the Thrift Servers (supports GSSAPI)</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">rpc.sasl.qop</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">auth</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">One of "auth", "auth-int", or "auth-conf". These map to the SASL defined properties for
+quality of protection. "auth" is authentication only. "auth-int" is authentication and data
+integrity. "auth-conf" is authentication, data integrity and confidentiality.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">instance.security.authenticator</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">org.apache.accumulo.server.security.
+handler.KerberosAuthenticator</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Configures Accumulo to use the Kerberos principal as the Accumulo username/principal</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">instance.security.authorizor</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">org.apache.accumulo.server.security.
+handler.KerberosAuthorizor</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Configures Accumulo to use the Kerberos principal for authorization purposes</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">instance.security.permissionHandler</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">org.apache.accumulo.server.security.
+handler.KerberosPermissionHandler</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Configures Accumulo to use the Kerberos principal for permission purposes</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">trace.token.type</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">org.apache.accumulo.core.client.
+security.tokens.KerberosToken</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Configures the Accumulo Tracer to use the KerberosToken for authentication when serializing traces to the trace table.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">trace.user</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">accumulo/_HOST@REALM</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The tracer process needs valid credentials to serialize traces to Accumulo. While the other server processes are
+creating a SystemToken from the provided keytab and principal, we can still use a normal KerberosToken and the same
+keytab/principal to serialize traces. Like non-Kerberized instances, the table must be created and permissions granted
+to the trace.user. The same <code>_HOST</code> replacement is performed on this value, substituting the FQDN for <code>_HOST</code>.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">trace.token.property.keytab</p></td>
+<td class="tableblock halign-left valign-top"></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">You can optionally specify the path to a keytab file for the principal given in the <code>trace.user</code> property. If you don&#8217;t
+set this path, it will default to the value given in <code>general.kerberos.keytab</code>.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">general.delegation.token.lifetime</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">7d</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The length of time that the server-side secret used to create delegation tokens is valid. After a server-side secret
+expires, a delegation token created with that secret is no longer valid.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">general.delegation.token.update.interval</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1d</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The frequency in which new server-side secrets should be generated to create delegation tokens for clients. Generating
+new secrets reduces the likelihood of cryptographic attacks.</p></td>
+</tr>
+</tbody>
+</table>
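+<div class="paragraph">
+<p>For reference, the first few entries in the table translate into <code>accumulo-site.xml</code> as follows; the keytab path and realm are placeholders that must match your environment.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>&lt;property&gt;
+  &lt;name&gt;general.kerberos.keytab&lt;/name&gt;
+  &lt;value&gt;/etc/security/keytabs/accumulo.service.keytab&lt;/value&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+  &lt;name&gt;general.kerberos.principal&lt;/name&gt;
+  &lt;value&gt;accumulo/_HOST@EXAMPLE.COM&lt;/value&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+  &lt;name&gt;instance.rpc.sasl.enabled&lt;/name&gt;
+  &lt;value&gt;true&lt;/value&gt;
+&lt;/property&gt;</pre>
+</div>
+</div>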
+<div class="paragraph">
+<p>Although it should be a prerequisite, it is especially important that you have DNS properly
+configured for your nodes and that Accumulo is configured to use the FQDN. It
+is extremely important to use the FQDN in each of the "hosts" files for each
+Accumulo process: <code>masters</code>, <code>monitors</code>, <code>slaves</code>, <code>tracers</code>, and <code>gc</code>.</p>
+</div>
+<div class="paragraph">
+<p>Normally, no changes are needed in <code>accumulo-env.sh</code> to enable Kerberos. Typically, the <code>krb5.conf</code>
+is installed on the local machine in <code>/etc/</code>, and the Java library implementations will look
+here to find the necessary configuration to communicate with the KDC. Some installations
+may require a different <code>krb5.conf</code> to be used for Accumulo: <code>ACCUMULO_KRB5_CONF</code> enables this.</p>
+</div>
+<div class="paragraph">
+<p><code>ACCUMULO_KRB5_CONF</code> can be configured to a directory containing a file named <code>krb5.conf</code> or
+the path to the file itself. This will be provided to all Accumulo server and client processes
+via the JVM system property <code>java.security.krb5.conf</code>. If the environment variable is not set,
+<code>java.security.krb5.conf</code> will not be set either.</p>
+</div>
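+<div class="paragraph">
+<p>For example, an installation needing a non-standard Kerberos configuration might add the following to <code>accumulo-env.sh</code> (the path shown is hypothetical):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>export ACCUMULO_KRB5_CONF=/etc/accumulo/krb5.conf</pre>
+</div>
+</div>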
+</div>
+<div class="sect4">
+<h5 id="_kerberosauthenticator">KerberosAuthenticator</h5>
+<div class="paragraph">
+<p>The <code>KerberosAuthenticator</code> is an implementation of the pluggable security interfaces
+that Accumulo provides. It builds on top of the default ZooKeeper-based implementation,
+but removes the need to create user accounts with passwords in Accumulo for clients. As
+long as a client has a valid Kerberos identity, they can connect to and interact with
+Accumulo, but without any permissions (e.g. cannot create tables or write data). Leveraging
+ZooKeeper removes the need to change the permission handler and authorizor, so other Accumulo
+functions regarding permissions and cell-level authorizations do not change.</p>
+</div>
+<div class="paragraph">
+<p>It is extremely important to note that, while user operations like <code>SecurityOperations.listLocalUsers()</code>,
+<code>SecurityOperations.dropLocalUser()</code>, and <code>SecurityOperations.createLocalUser()</code> will not return
+errors, these methods are not equivalent to normal installations, as they will only operate on
+users which have, at one point in time, authenticated with Accumulo using their Kerberos identity.
+The KDC is still the authoritative entity for user management. The previously mentioned methods
+are provided as they simplify management of users within Accumulo, especially with respect
+to granting Authorizations and Permissions to new users.</p>
+</div>
+</div>
+<div class="sect4">
+<h5 id="_administrative_user">Administrative User</h5>
+<div class="paragraph">
+<p>Out of the box (without Kerberos enabled), Accumulo has a single user with administrative permissions: "root".
+This user is used to "bootstrap" other users, creating less-privileged users for applications using
+the system. In Kerberos, to authenticate with the system, it&#8217;s required that the client presents Kerberos
+credentials for the principal (user) the client is trying to authenticate as.</p>
+</div>
+<div class="paragraph">
+<p>Because of this, an administrative user named "root" would be useless in an instance using Kerberos,
+because it is very unlikely to have Kerberos credentials for a principal named <code>root</code>. When Kerberos is
+enabled, Accumulo will prompt for the name of a user to grant the same permissions as what the <code>root</code>
+user would normally have. The name of the Accumulo user to grant administrative permissions to can
+also be given by the <code>-u</code> or <code>--user</code> options.</p>
+</div>
+<div class="paragraph">
+<p>If you are enabling Kerberos on an existing cluster, you will need to reinitialize the security system in
+order to replace the existing "root" user with one that can be used with Kerberos. These steps should be
+completed after you have done the previously described configuration changes and will require access to
+a complete <code>accumulo-site.xml</code>, including the instance secret. Note that this process will delete all
+existing users in the system; you will need to reassign user permissions based on Kerberos principals.</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>Ensure Accumulo is not running.</p>
+</li>
+<li>
+<p>Given the path to a <code>accumulo-site.xml</code> with the instance secret, run the security reset tool. If you are
+prompted for a password you can just hit return, since it won&#8217;t be used.</p>
+</li>
+<li>
+<p>Start the Accumulo cluster</p>
+</li>
+</ol>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ ${ACCUMULO_HOME}/bin/stop-all.sh
+...
+$ ACCUMULO_CONF_DIR=/path/to/server/conf/ accumulo init --reset-security
+Running against secured HDFS
+Principal (user) to grant administrative privileges to : accumulo_admin@EXAMPLE.COM
+Enter initial password for accumulo_admin@EXAMPLE.COM (this may not be applicable for your security setup):
+Confirm initial password for accumulo_admin@EXAMPLE.COM:
+$ ${ACCUMULO_HOME}/bin/start-all.sh
+...
+$</pre>
+</div>
+</div>
+</div>
+<div class="sect4">
+<h5 id="_verifying_secure_access">Verifying secure access</h5>
+<div class="paragraph">
+<p>To verify that servers have correctly started with Kerberos enabled, ensure that the processes
+are actually running (they should exit immediately if login fails) and verify that you see
+something similar to the following in the application log.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>2015-01-07 11:57:56,826 [security.SecurityUtil] INFO : Attempting to login with keytab as accumulo/hostname@EXAMPLE.COM
+2015-01-07 11:57:56,830 [security.UserGroupInformation] INFO : Login successful for user accumulo/hostname@EXAMPLE.COM using keytab file /etc/security/keytabs/accumulo.service.keytab</pre>
+</div>
+</div>
+</div>
+<div class="sect4">
+<h5 id="_impersonation">Impersonation</h5>
+<div class="paragraph">
+<p>Impersonation is functionality which allows a certain user to act as another. One direct application
+of this concept within Accumulo is the Thrift proxy. The Thrift proxy is configured to accept
+user requests and pass them onto Accumulo, enabling client access to Accumulo via any thrift-compatible
+language. When the proxy is running with SASL transports, this enforces that clients present a valid
+Kerberos identity to make a connection. In this situation, the Thrift proxy server does not have
+access to the secret key material needed to make a secure connection to Accumulo as the client;
+it can only connect to Accumulo as itself. Impersonation, in this context, refers to the ability
+of the proxy to authenticate to Accumulo as itself, but act on behalf of an Accumulo user.</p>
+</div>
+<div class="paragraph">
+<p>Accumulo supports basic impersonation of end-users by a third party via static rules in Accumulo&#8217;s
+site configuration file. The two properties are semicolon-separated lists which are aligned
+by index: the first element in the user impersonation property value matches the first element
+in the host impersonation property value, and so on.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>&lt;property&gt;
+  &lt;name&gt;instance.rpc.sasl.allowed.user.impersonation&lt;/name&gt;
+  &lt;value&gt;$PROXY_USER:*&lt;/value&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+  &lt;name&gt;instance.rpc.sasl.allowed.host.impersonation&lt;/name&gt;
+  &lt;value&gt;*&lt;/value&gt;
+&lt;/property&gt;</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Here, <code>$PROXY_USER</code> can impersonate any user from any host.</p>
+</div>
+<div class="paragraph">
+<p>The following is an example of specifying a subset of users <code>$PROXY_USER</code> can impersonate and also
+limiting the hosts from which <code>$PROXY_USER</code> can initiate requests.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>&lt;property&gt;
+  &lt;name&gt;instance.rpc.sasl.allowed.user.impersonation&lt;/name&gt;
+  &lt;value&gt;$PROXY_USER:user1,user2;$PROXY_USER2:user2,user4&lt;/value&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+  &lt;name&gt;instance.rpc.sasl.allowed.host.impersonation&lt;/name&gt;
+  &lt;value&gt;host1.domain.com,host2.domain.com;*&lt;/value&gt;
+&lt;/property&gt;</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Here, <code>$PROXY_USER</code> can impersonate user1 and user2 only from host1.domain.com or host2.domain.com.
+<code>$PROXY_USER2</code> can impersonate user2 and user4 from any host.</p>
+</div>
+<div class="paragraph">
+<p>In these examples, the value <code>$PROXY_USER</code> is the Kerberos principal of the server which is acting on behalf of a user.
+Impersonation is enforced by the Kerberos principal and the host from which the RPC originated (from the perspective
+of the Accumulo TabletServers/Masters). An asterisk (*) can be used to specify all users or all hosts (depending on the context).</p>
+</div>
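+<div class="paragraph">
+<p>The index alignment of the two properties can be sketched in a few lines of Java. This mirrors the documented format only; it is not Accumulo&#8217;s internal parser.</p>
+</div>

```java
public class ImpersonationRules {
    public static void main(String[] args) {
        String users = "$PROXY_USER:user1,user2;$PROXY_USER2:user2,user4";
        String hosts = "host1.domain.com,host2.domain.com;*";
        // Rules are aligned by index: the i-th user rule pairs with the i-th host rule.
        String[] userRules = users.split(";");
        String[] hostRules = hosts.split(";");
        for (int i = 0; i < userRules.length; i++) {
            String[] parts = userRules[i].split(":", 2);
            System.out.println(parts[0] + " -> users[" + parts[1]
                + "] hosts[" + hostRules[i] + "]");
        }
    }
}
```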
+</div>
+<div class="sect4">
+<h5 id="_delegation_tokens_2">Delegation Tokens</h5>
+<div class="paragraph">
+<p>Within Accumulo services, the primary task in implementing delegation tokens is the generation and distribution
+of a shared secret among all Accumulo tabletservers and the master. The secret key allows for generation
+of delegation tokens for users and verification of delegation tokens presented by clients. If a server
+process is unaware of the secret key used to create a delegation token, the client cannot be authenticated.
+As ZooKeeper distribution is an asynchronous operation (typically on the order of seconds), the
+value for <code>general.delegation.token.update.interval</code> should be on the order of hours to days to reduce the
+likelihood of servers rejecting valid clients because the server did not yet see a new secret key.</p>
+</div>
+<div class="paragraph">
+<p>Supporting authentication with both Kerberos credentials and delegation tokens, the SASL thrift
+server accepts connections with either the <code>GSSAPI</code> or the <code>DIGEST-MD5</code> mechanism set. The <code>DIGEST-MD5</code> mechanism
+enables authentication as a normal username and password exchange, which <code>DelegationToken</code>s leverage.</p>
+</div>
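+<div class="paragraph">
+<p>To illustrate the mechanism selection, the following standalone Java sketch creates a SASL client for the <code>DIGEST-MD5</code> mechanism using the JDK&#8217;s built-in SASL support. <code>GSSAPI</code> is omitted because creating it requires an active Kerberos login context; the protocol and server names here are placeholders.</p>
+</div>

```java
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;

public class SaslMechanismDemo {
    public static void main(String[] args) throws Exception {
        // Create a SASL client for DIGEST-MD5 (the mechanism DelegationTokens use).
        SaslClient client = Sasl.createSaslClient(
                new String[] {"DIGEST-MD5"},
                null,                 // no authorization id
                "accumulo",           // placeholder protocol name
                "host.domain.com",    // placeholder server name
                null,                 // default properties
                callbacks -> { /* credentials would be supplied here */ });
        System.out.println(client.getMechanismName());
    }
}
```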
+<div class="paragraph">
+<p>Since delegation tokens are a weaker form of authentication than Kerberos credentials, user access
+to obtain delegation tokens from Accumulo is protected with the <code>DELEGATION_TOKEN</code> system permission. Only
+users with the system permission are allowed to obtain delegation tokens. It is also recommended
+to configure confidentiality with SASL, using the <code>rpc.sasl.qop=auth-conf</code> configuration property, to
+ensure that prying eyes cannot view the <code>DelegationToken</code> as it passes over the network.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre># Check a user's permissions
+admin@REALM@accumulo&gt; userpermissions -u user@REALM
+
+# Grant the DELEGATION_TOKEN system permission to a user
+admin@REALM@accumulo&gt; grant System.DELEGATION_TOKEN -s -u user@REALM</pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_clients">16.4.2. Clients</h4>
+<div class="sect4">
+<h5 id="_create_client_principal">Create client principal</h5>
+<div class="paragraph">
+<p>Like the Accumulo servers, clients must also have a Kerberos principal created for them. The
+primary difference from a server principal is that principals for users are created
+with a password and are not qualified to a specific instance (host).</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>kadmin.local -q "addprinc $user"</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The above will prompt for a password for that user which will be used to identify that $user.
+The user can verify that they can authenticate with the KDC using the command <code>kinit $user</code>.
+Upon entering the correct password, a local credentials cache will be made which can be used
+to authenticate with Accumulo, access HDFS, etc.</p>
+</div>
+<div class="paragraph">
+<p>The user can verify the state of their local credentials cache by using the command <code>klist</code>.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ klist
+Ticket cache: FILE:/tmp/krb5cc_123
+Default principal: user@EXAMPLE.COM
+
+Valid starting       Expires              Service principal
+01/07/2015 11:56:35  01/08/2015 11:56:35  krbtgt/EXAMPLE.COM@EXAMPLE.COM
+	renew until 01/14/2015 11:56:35</pre>
+</div>
+</div>
+</div>
+<div class="sect4">
+<h5 id="_configuration_3">Configuration</h5>
+<div class="paragraph">
+<p>The second thing clients need to do is to set up their client configuration file. By
+default, this file is stored in <code>~/.accumulo/config</code>, <code>$ACCUMULO_CONF_DIR/client.conf</code> or
+<code>$ACCUMULO_HOME/conf/client.conf</code>. Accumulo utilities also allow you to provide your own
+copy of this file in any location using the <code>--config-file</code> command line option.</p>
+</div>
+<div class="paragraph">
+<p>Three items need to be set to enable access to Accumulo:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><code>instance.rpc.sasl.enabled</code>=<em>true</em></p>
+</li>
+<li>
+<p><code>rpc.sasl.qop</code>=<em>auth</em></p>
+</li>
+<li>
+<p><code>kerberos.server.primary</code>=<em>accumulo</em></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Each of these properties <strong>must</strong> match the configuration of the Accumulo servers; this is
+required to set up the SASL transport.</p>
+</div>
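+<div class="paragraph">
+<p>Putting the three properties together, a minimal <code>client.conf</code> for a Kerberos-enabled instance might look like the following, assuming the server defaults shown above:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>instance.rpc.sasl.enabled=true
+rpc.sasl.qop=auth
+kerberos.server.primary=accumulo</pre>
+</div>
+</div>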
+</div>
+<div class="sect4">
+<h5 id="_verifying_administrative_access">Verifying Administrative Access</h5>
+<div class="paragraph">
+<p>At this point you should have enough configured on the server and client side to interact with
+the system. You should verify that the administrative user you chose earlier can successfully
+interact with the system.</p>
+</div>
+<div class="paragraph">
+<p>While this example logs in via <code>kinit</code> with a password, any login method that caches Kerberos tickets
+should work.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ kinit accumulo_admin@EXAMPLE.COM
+Password for accumulo_admin@EXAMPLE.COM: ******************************
+$ accumulo shell
+
+Shell - Apache Accumulo Interactive Shell
+-
+- version: 1.7.2
+- instance name: MYACCUMULO
+- instance id: 483b9038-889f-4b2d-b72b-dfa2bb5dbd07
+-
+- type 'help' for a list of available commands
+-
+accumulo_admin@EXAMPLE.COM@MYACCUMULO&gt; userpermissions
+System permissions: System.GRANT, System.CREATE_TABLE, System.DROP_TABLE, System.ALTER_TABLE, System.CREATE_USER, System.DROP_USER, System.ALTER_USER, System.SYSTEM, System.CREATE_NAMESPACE, System.DROP_NAMESPACE, System.ALTER_NAMESPACE, System.OBTAIN_DELEGATION_TOKEN
+
+Namespace permissions (accumulo): Namespace.READ, Namespace.ALTER_TABLE
+
+Table permissions (accumulo.metadata): Table.READ, Table.ALTER_TABLE
+Table permissions (accumulo.replication): Table.READ
+Table permissions (accumulo.root): Table.READ, Table.ALTER_TABLE
+
+accumulo_admin@EXAMPLE.COM@MYACCUMULO&gt; quit
+$ kdestroy
+$</pre>
+</div>
+</div>
+</div>
+<div class="sect4">
+<h5 id="_delegationtokens_with_mapreduce">DelegationTokens with MapReduce</h5>
+<div class="paragraph">
+<p>To use DelegationTokens in a custom MapReduce job, the call to the <code>setConnectorInfo()</code> method
+on <code>AccumuloInputFormat</code> or <code>AccumuloOutputFormat</code> should be the only necessary change. Instead
+of providing an instance of a <code>KerberosToken</code>, the user must call <code>SecurityOperations.getDelegationToken</code>
+using a <code>Connector</code> obtained with that <code>KerberosToken</code>, and pass the <code>DelegationToken</code> to
+<code>setConnectorInfo</code> instead of the <code>KerberosToken</code>. It is expected that the user launching
+the MapReduce job is already logged in via Kerberos via a keytab or via a locally-cached
+Kerberos ticket-granting-ticket (TGT).</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">Instance instance = getInstance();
+KerberosToken kt = new KerberosToken();
+Connector conn = instance.getConnector(principal, kt);
+DelegationToken dt = conn.securityOperations().getDelegationToken();
+
+// Reading from Accumulo
+AccumuloInputFormat.setConnectorInfo(job, principal, dt);
+
+// Writing to Accumulo
+AccumuloOutputFormat.setConnectorInfo(job, principal, dt);</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>If the user passes a <code>KerberosToken</code> to the <code>setConnectorInfo</code> method, the implementation will
+attempt to obtain a <code>DelegationToken</code> automatically, but this does have limitations
+based on the other MapReduce configuration methods already called and permissions granted
+to the calling user. It is best for the user to acquire the DelegationToken on their own
+and provide it directly to <code>setConnectorInfo</code>.</p>
+</div>
+<div class="paragraph">
+<p>Users must have the <code>OBTAIN_DELEGATION_TOKEN</code> system permission to call the <code>getDelegationToken</code>
+method. The obtained delegation token is only valid for the requesting user for a period
+of time dependent on Accumulo&#8217;s configuration (<code>general.delegation.token.lifetime</code>).</p>
+</div>
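+<div class="paragraph">
+<p>As a sketch, an administrator could grant this permission from the Accumulo shell as follows
+(the user principal here is a hypothetical example):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>accumulo_admin@EXAMPLE.COM@MYACCUMULO&gt; grant System.OBTAIN_DELEGATION_TOKEN -s -u user@EXAMPLE.COM</pre>
+</div>
+</div>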
+<div class="paragraph">
+<p>It is also possible to obtain and use <code>DelegationToken</code>s outside of the context
+of MapReduce.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">String principal = "user@REALM";
+Instance instance = getInstance();
+Connector connector = instance.getConnector(principal, new KerberosToken());
+DelegationToken delegationToken = connector.securityOperations().getDelegationToken();
+
+Connector dtConnector = instance.getConnector(principal, delegationToken);</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Operations performed with the <code>dtConnector</code> will run as the original user, but without
+requiring their Kerberos credentials.</p>
+</div>
+<div class="paragraph">
+<p>For the duration of validity of the <code>DelegationToken</code>, the user <strong>must</strong> take the necessary precautions
+to protect the <code>DelegationToken</code> from prying eyes as it can be used by any user on any host to impersonate
+the user who requested the <code>DelegationToken</code>. YARN ensures that passing the delegation token from the client
+JVM to each YARN task is secure, even in multi-tenant instances.</p>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_debugging">16.4.3. Debugging</h4>
+<div class="paragraph">
+<p><strong>Q</strong>: I have valid Kerberos credentials and a correct client configuration file but
+I still get errors like:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: When you have a valid client configuration and Kerberos TGT, it is possible that the search
+path for your local credentials cache is incorrect. Check the value of the <code>KRB5CCNAME</code> environment
+variable, and ensure it matches the value reported by <code>klist</code>.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ echo $KRB5CCNAME
+
+$ klist
+Ticket cache: FILE:/tmp/krb5cc_123
+Default principal: user@EXAMPLE.COM
+
+Valid starting       Expires              Service principal
+01/07/2015 11:56:35  01/08/2015 11:56:35  krbtgt/EXAMPLE.COM@EXAMPLE.COM
+	renew until 01/14/2015 11:56:35
+$ export KRB5CCNAME=/tmp/krb5cc_123
+$ echo $KRB5CCNAME
+/tmp/krb5cc_123</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: I thought I had everything configured correctly, but my client/server still fails to log in.
+I don&#8217;t know what is actually failing.</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Add the following system property to the JVM invocation:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>-Dsun.security.krb5.debug=true</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This will enable verbose Kerberos debugging at the JVM level, which is often sufficient to
+diagnose high-level configuration problems. Client applications can add this system property
+directly to their command line, while Accumulo server processes (or applications started using the
+<code>accumulo</code> script) can add it to <code>ACCUMULO_GENERAL_OPTS</code> in <code>$ACCUMULO_CONF_DIR/accumulo-env.sh</code>.</p>
+</div>
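+<div class="paragraph">
+<p>For example, a sketch of the change to <code>accumulo-env.sh</code>, preserving any options already set:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>export ACCUMULO_GENERAL_OPTS="${ACCUMULO_GENERAL_OPTS} -Dsun.security.krb5.debug=true"</pre>
+</div>
+</div>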
+<div class="paragraph">
+<p>Additionally, you can increase the log4j level on <code>org.apache.hadoop.security</code> (which includes the
+Hadoop <code>UserGroupInformation</code> class) to surface additional debug statements. This
+can be controlled in your client application, or via <code>$ACCUMULO_CONF_DIR/generic_logger.xml</code>.</p>
+</div>
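+<div class="paragraph">
+<p>As an illustration, assuming the stock log4j XML layout, a logger entry in <code>generic_logger.xml</code>
+raising this package to DEBUG might look like:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>&lt;logger name="org.apache.hadoop.security"&gt;
+  &lt;level value="DEBUG"/&gt;
+&lt;/logger&gt;</pre>
+</div>
+</div>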
+<div class="paragraph">
+<p><strong>Q</strong>: All of my Accumulo processes successfully start and log in with their
+keytab, but they are unable to communicate with each other, showing the
+following errors:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>2015-01-12 14:47:27,055 [transport.TSaslTransport] ERROR: SASL negotiation failure
+javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]
+        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
+        at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
+        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
+        at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
+        at org.apache.accumulo.core.rpc.UGIAssumingTransport$1.run(UGIAssumingTransport.java:53)
+        at org.apache.accumulo.core.rpc.UGIAssumingTransport$1.run(UGIAssumingTransport.java:49)
+        at java.security.AccessController.doPrivileged(Native Method)
+        at javax.security.auth.Subject.doAs(Subject.java:415)
+        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
+        at org.apache.accumulo.core.rpc.UGIAssumingTransport.open(UGIAssumingTransport.java:49)
+        at org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:357)
+        at org.apache.accumulo.core.rpc.ThriftUtil.createTransport(ThriftUtil.java:255)
+        at org.apache.accumulo.server.master.LiveTServerSet$TServerConnection.getTableMap(LiveTServerSet.java:106)
+        at org.apache.accumulo.master.Master.gatherTableInformation(Master.java:996)
+        at org.apache.accumulo.master.Master.access$600(Master.java:160)
+        at org.apache.accumulo.master.Master$StatusThread.updateStatus(Master.java:911)
+        at org.apache.accumulo.master.Master$StatusThread.run(Master.java:901)
+Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
+        at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:710)
+        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
+        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
+        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:193)
+        ... 16 more
+Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER
+        at sun.security.krb5.KrbTgsRep.&lt;init&gt;(KrbTgsRep.java:73)
+        at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:192)
+        at sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:203)
+        at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:309)
+        at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:115)
+        at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:454)
+        at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:641)
+        ... 19 more
+Caused by: KrbException: Identifier doesn't match expected value (906)
+        at sun.security.krb5.internal.KDCRep.init(KDCRep.java:143)
+        at sun.security.krb5.internal.TGSRep.init(TGSRep.java:66)
+        at sun.security.krb5.internal.TGSRep.&lt;init&gt;(TGSRep.java:61)
+        at sun.security.krb5.KrbTgsRep.&lt;init&gt;(KrbTgsRep.java:55)
+        ... 25 more</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>or</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>2015-01-12 14:47:29,440 [server.TThreadPoolServer] ERROR: Error occurred during processing of message.
+java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed
+        at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
+        at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:51)
+        at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:48)
+        at java.security.AccessController.doPrivileged(Native Method)
+        at javax.security.auth.Subject.doAs(Subject.java:356)
+        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
+        at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory.getTransport(UGIAssumingTransportFactory.java:48)
+        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:208)
+        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
+        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
+        at java.lang.Thread.run(Thread.java:745)
+Caused by: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed
+        at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:190)
+        at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
+        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
+        at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
+        at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
+        ... 10 more</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: As previously mentioned, the hostname (and, consequently, the address each Accumulo process binds
+to and listens on) is extremely important when negotiating a SASL connection. This problem commonly arises when the Accumulo
+servers are not configured to listen on the address denoted by their FQDN.</p>
+</div>
+<div class="paragraph">
+<p>The values in the Accumulo "hosts" files (in <code>$ACCUMULO_CONF_DIR</code>: <code>masters</code>, <code>monitors</code>, <code>slaves</code>, <code>tracers</code>,
+and <code>gc</code>) should match the instance component of the Kerberos server principal (e.g. <code>host</code> in <code>accumulo/host@EXAMPLE.COM</code>).</p>
+</div>
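+<div class="paragraph">
+<p>A quick sanity check is to compare each host&#8217;s FQDN against the hosts files and the keytab.
+The hostnames and keytab path below are hypothetical examples:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ hostname -f
+tserver1.example.com
+$ grep tserver1 $ACCUMULO_CONF_DIR/slaves
+tserver1.example.com
+$ klist -kt /path/to/accumulo.keytab | grep tserver1
+   4 01/07/2015 11:56:35 accumulo/tserver1.example.com@EXAMPLE.COM</pre>
+</div>
+</div>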
+<div class="paragraph">
+<p><strong>Q</strong>: After configuring my system for Kerberos, server processes come up normally and I can interact with the system. However,
+when I attempt to use the "Recent Traces" page on the Monitor UI I get a stacktrace similar to:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>java.lang.AssertionError: AuthenticationToken should not be null
+        at org.apache.accumulo.monitor.servlets.trace.Basic.getScanner(Basic.java:139)
+        at org.apache.accumulo.monitor.servlets.trace.Summary.pageBody(Summary.java:164)
+        at org.apache.accumulo.monitor.servlets.BasicServlet.doGet(BasicServlet.java:63)
+        at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
+        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
+        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:738)
+        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:551)
+        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
+        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:568)
+        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
+        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1111)
+        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:478)
+        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:183)
+        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1045)
+        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
+        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
+        at org.eclipse.jetty.server.Server.handle(Server.java:462)
+        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:279)
+        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:232)
+        at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:534)
+        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:607)
+        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:536)
+        at java.lang.Thread.run(Thread.java:745)</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: This indicates that the Monitor has not been able to successfully log in a client-side user to read from the <code>trace</code> table. Accumulo allows the TraceServer to rely on the property <code>general.kerberos.keytab</code> as a fallback when logging in the trace user if the <code>trace.token.property.keytab</code> property isn&#8217;t defined. Some earlier versions of Accumulo did not do this same fallback for the Monitor&#8217;s use of the trace user. The end [...]
+</div>
+<div class="paragraph">
+<p>Ensure you have set <code>trace.token.property.keytab</code> to point to a keytab for the principal defined in <code>trace.user</code> in the <code>accumulo-site.xml</code> file for the Monitor, since that should work in all versions of Accumulo.</p>
+</div>
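+<div class="paragraph">
+<p>A sketch of the relevant <code>accumulo-site.xml</code> entries; the principal and keytab path are examples:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>&lt;property&gt;
+  &lt;name&gt;trace.user&lt;/name&gt;
+  &lt;value&gt;trace/_HOST@EXAMPLE.COM&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;trace.token.property.keytab&lt;/name&gt;
+  &lt;value&gt;/path/to/trace.keytab&lt;/value&gt;
+&lt;/property&gt;</pre>
+</div>
+</div>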
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_sampling">17. Sampling</h2>
+<div class="sectionbody">
+<div class="sect2">
+<h3 id="_overview_4">17.1. Overview</h3>
+<div class="paragraph">
+<p>Accumulo has the ability to generate and scan a per table set of sample data.
+This sample data is kept up to date as a table is mutated.  What key values are
+placed in the sample data is configurable per table.</p>
+</div>
+<div class="paragraph">
+<p>This feature can be used for query estimation and optimization.  For an example
+of estimation, assume an Accumulo table is configured to generate a sample
+containing one millionth of the table&#8217;s data.   If a query executed against the
+sample returns one thousand results, then the same query against all the
+data would probably return a billion results.  A nice property of having
+Accumulo generate the sample is that it&#8217;s always up to date, so estimations
+will be accurate even when querying the most recently written data.</p>
+</div>
+<div class="paragraph">
+<p>An example of a query optimization is an iterator using sample data to get an
+estimate, and then making decisions based on the estimate.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_configuring">17.2. Configuring</h3>
+<div class="paragraph">
+<p>In order to use sampling, an Accumulo table must be configured with a class that
+implements <code>org.apache.accumulo.core.sample.Sampler</code> along with options for
+that class.  For guidance on implementing a Sampler see that interface&#8217;s
+javadoc.  Accumulo provides a few implementations out of the box.   For
+information on how to use the samplers that ship with Accumulo look in the
+package <code>org.apache.accumulo.core.sample</code> and consult the javadoc of the
+classes there.  See <code>README.sample</code> and <code>SampleExample.java</code> for examples of
+how to configure a Sampler on a table.</p>
+</div>
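+<div class="paragraph">
+<p>As a minimal sketch, assuming an existing <code>Connector</code> named <code>conn</code>, a table can be configured
+to sample roughly one row in 1009 using the <code>RowSampler</code> that ships with Accumulo (in the
+<code>org.apache.accumulo.core.client.sample</code> package); the table name and modulus are examples:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">SamplerConfiguration samplerConfig = new SamplerConfiguration(RowSampler.class.getName());
+samplerConfig.addOption("hasher", "murmur3_32");
+samplerConfig.addOption("modulus", "1009");
+
+// create a new table with sampling enabled from the start
+conn.tableOperations().create("sampleTest",
+    new NewTableConfiguration().enableSampling(samplerConfig));
+
+// or enable sampling on an existing table
+conn.tableOperations().setSamplerConfiguration("sampleTest", samplerConfig);</code></pre>
+</div>
+</div>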
+<div class="paragraph">
+<p>Once a table is configured with a sampler all writes after that point will
+generate sample data.  For data written before sampling was configured sample
+data will not be present.  A compaction can be initiated that only compacts the
+files in the table that do not have sample data.   The example readme shows how
+to do this.</p>
+</div>
+<div class="paragraph">
+<p>If the sampling configuration of a table is changed, then Accumulo will start
+generating new sample data with the new configuration.   However old data will
+still have sample data generated with the previous configuration.  A selective
+compaction can also be issued in this case to regenerate the sample data.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_scanning_sample_data">17.3. Scanning sample data</h3>
+<div class="paragraph">
+<p>In order to scan sample data, use the <code>setSamplerConfiguration(&#8230;&#8203;)</code> method on
+<code>Scanner</code> or <code>BatchScanner</code>.  Please consult this method&#8217;s javadoc for more
+information.</p>
+</div>
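+<div class="paragraph">
+<p>For example, assuming an existing <code>Connector</code> named <code>conn</code> and a table already configured with
+a sampler (the table name here is an example), the scan can be restricted to the sample data:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">// fetch the table's current sampler configuration; the configuration passed to the
+// scanner must match the one the sample data was generated with
+SamplerConfiguration sc = conn.tableOperations().getSamplerConfiguration("sampleTest");
+
+Scanner scanner = conn.createScanner("sampleTest", Authorizations.EMPTY);
+scanner.setSamplerConfiguration(sc);
+for (Map.Entry&lt;Key,Value&gt; entry : scanner) {
+  System.out.println(entry.getKey() + " " + entry.getValue());
+}</code></pre>
+</div>
+</div>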
+<div class="paragraph">
+<p>Sample data can also be scanned from within an Accumulo
+<code>SortedKeyValueIterator</code>.  To see how to do this, look at the example iterator
+referenced in <code>README.sample</code>.  Also, consult the javadoc on
+<code>org.apache.accumulo.core.iterators.IteratorEnvironment.cloneWithSamplingEnabled()</code>.</p>
+</div>
+<div class="paragraph">
+<p>MapReduce jobs using the <code>AccumuloInputFormat</code> can also read sample data.  See
+the javadoc for the <code>setSamplerConfiguration()</code> method on
+<code>AccumuloInputFormat</code>.</p>
+</div>
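+<div class="paragraph">
+<p>A sketch of enabling this on a job, assuming an existing <code>Connector</code> named <code>conn</code> and a table
+already configured with a sampler (the table name is an example):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">Job job = Job.getInstance();
+AccumuloInputFormat.setInputTableName(job, "sampleTest");
+// map tasks will see only the table's sample data
+AccumuloInputFormat.setSamplerConfiguration(job,
+    conn.tableOperations().getSamplerConfiguration("sampleTest"));</code></pre>
+</div>
+</div>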
+<div class="paragraph">
+<p>Scans over sample data will throw a <code>SampleNotPresentException</code> in the following cases:</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>sample data is not present</p>
+</li>
+<li>
+<p>sample data is present but was generated with multiple configurations</p>
+</li>
+<li>
+<p>sample data is partially present</p>
+</li>
+</ol>
+</div>
+<div class="paragraph">
+<p>So a scan over sample data can only succeed if all data written has sample data
+generated with the same configuration.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_bulk_import">17.4. Bulk import</h3>
+<div class="paragraph">
+<p>When generating rfiles to bulk import into Accumulo, those rfiles can contain
+sample data.  To use this feature, look at the javadoc on the
+<code>AccumuloFileOutputFormat.setSampler(&#8230;&#8203;)</code> method.</p>
+</div>
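+<div class="paragraph">
+<p>A sketch for a job producing rfiles, assuming an existing <code>Connector</code> named <code>conn</code>; the table
+name and output path are examples:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">Job job = Job.getInstance();
+AccumuloFileOutputFormat.setOutputPath(job, new Path("/tmp/bulk-output"));
+// rfiles written by this job will also contain sample data, so the sample stays
+// complete after the bulk import; the configuration should match the table's
+AccumuloFileOutputFormat.setSampler(job,
+    conn.tableOperations().getSamplerConfiguration("sampleTest"));</code></pre>
+</div>
+</div>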
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_administration_2">18. Administration</h2>
+<div class="sectionbody">
+<div class="sect2">
+<h3 id="_hardware">18.1. Hardware</h3>
+<div class="paragraph">
+<p>Because we are running essentially two or three systems simultaneously layered
+across the cluster: HDFS, Accumulo and MapReduce, it is typical for hardware to
+consist of 4 to 8 cores, and 8 to 32 GB RAM. This is so each running process can have
+at least one core and 2 - 4 GB each.</p>
+</div>
+<div class="paragraph">
+<p>One core running HDFS can typically keep 2 to 4 disks busy, so each machine may
+typically have as little as 2 x 300GB disks and as much as 4 x 1TB or 2TB disks.</p>
+</div>
+<div class="paragraph">
+<p>It is possible to get by with less than this, such as with 1u servers with 2 cores and 4GB
+each, but in this case it is recommended to only run up to two processes per
+machine&#8201;&#8212;&#8201;i.e. DataNode and TabletServer, or DataNode and MapReduce worker, but
+not all three. The constraint here is having enough available heap space for all the
+processes on a machine.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_network">18.2. Network</h3>
+<div class="paragraph">
+<p>Accumulo communicates via remote procedure calls over TCP/IP for both passing
+data and control messages. In addition, Accumulo uses HDFS clients to
+communicate with HDFS. To achieve good ingest and query performance, sufficient
+network bandwidth must be available between any two machines.</p>
+</div>
+<div class="paragraph">
+<p>In addition to needing access to ports associated with HDFS and ZooKeeper, Accumulo will
+use the following default ports. Please make sure that they are open, or change
+their value in conf/accumulo-site.xml.</p>
+</div>
+<table class="tableblock frame-all grid-all" style="width: 75%;">
+<caption class="title">Table 1. Accumulo default ports</caption>
+<colgroup>
+<col style="width: 20%;">
+<col style="width: 40%;">
+<col style="width: 40%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-right valign-top">Port</th>
+<th class="tableblock halign-center valign-top">Description</th>
+<th class="tableblock halign-center valign-top">Property Name</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">4445</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Shutdown Port (Accumulo MiniCluster)</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">n/a</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">4560</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Accumulo monitor (for centralized log display)</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">monitor.port.log4j</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">9995</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Accumulo HTTP monitor</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">monitor.port.client</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">9997</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Tablet Server</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">tserver.port.client</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">9998</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Accumulo GC</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">gc.port.client</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">9999</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Master Server</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">master.port.client</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">12234</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Accumulo Tracer</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">trace.port.client</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">42424</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Accumulo Proxy Server</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">n/a</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">10001</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Master Replication service</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">master.replication.coordinator.port</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-right valign-top"><p class="tableblock">10002</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">TabletServer Replication service</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">replication.receipt.service.port</p></td>
+</tr>
+</tbody>
+</table>
+<div class="paragraph">
+<p>In addition, the user can provide <code>0</code> and an ephemeral port will be chosen instead. This
+ephemeral port is likely to be unique and not already bound. Thus, configuring ports to
+use <code>0</code> instead of an explicit value, should, in most cases, work around any issues of
+running multiple distinct Accumulo instances (or any other process which tries to use the
+same default ports) on the same hardware. Finally, the <code>*.port.client</code> properties will work
+with the port range syntax (M-N), allowing the user to specify a range of ports for the
+service to attempt to bind. The ports in the range will be tried one at a time, from the
+low end of the range up to, and including, the high end.</p>
+</div>
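+<div class="paragraph">
+<p>For example, to let the tablet server try a range of ports (the range shown is arbitrary),
+one could add the following to <code>accumulo-site.xml</code>:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>&lt;property&gt;
+  &lt;name&gt;tserver.port.client&lt;/name&gt;
+  &lt;value&gt;9997-10010&lt;/value&gt;
+&lt;/property&gt;</pre>
+</div>
+</div>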
+</div>
+<div class="sect2">
+<h3 id="_installation">18.3. Installation</h3>
+<div class="paragraph">
+<p>Download a binary distribution of Accumulo and install it to a directory on a disk with
+sufficient space:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>cd &lt;install directory&gt;
+tar xzf accumulo-X.Y.Z-bin.tar.gz   # Replace 'X.Y.Z' with your Accumulo version
+cd accumulo-X.Y.Z</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Repeat this step on each machine in your cluster. Typically, the same <code>&lt;install directory&gt;</code>
+is chosen for all machines in the cluster. When you configure Accumulo, the <code>$ACCUMULO_HOME</code>
+environment variable should be set to <code>/path/to/&lt;install directory&gt;/accumulo-X.Y.Z</code>.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_dependencies">18.4. Dependencies</h3>
+<div class="paragraph">
+<p>Accumulo requires HDFS and ZooKeeper to be configured and running
+before starting. Password-less SSH should be configured between at least the
+Accumulo master and TabletServer machines. It is also a good idea to run Network
+Time Protocol (NTP) within the cluster to ensure nodes' clocks don&#8217;t get too out of
+sync, which can cause problems with automatically timestamped data.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_configuration_4">18.5. Configuration</h3>
+<div class="paragraph">
+<p>Accumulo is configured by editing several Shell and XML files found in
+<code>$ACCUMULO_HOME/conf</code>. The structure closely resembles Hadoop&#8217;s configuration
+files.</p>
+</div>
+<div class="paragraph">
+<p>Logging is primarily controlled using the log4j configuration files,
+<code>generic_logger.xml</code> and <code>monitor_logger.xml</code> (or their corresponding
+<code>.properties</code> version if the <code>.xml</code> version is missing). The generic logger is
+used for most server types, and is typically configured to send logs to the
+monitor, as well as log files. The monitor logger is used by the monitor, and
+is typically configured to log only errors the monitor itself generates,
+rather than all the logs that it receives from other server types.</p>
+</div>
+<div class="sect3">
+<h4 id="_edit_conf_accumulo_env_sh">18.5.1. Edit conf/accumulo-env.sh</h4>
+<div class="paragraph">
+<p>Accumulo needs to know where to find the software it depends on. Edit accumulo-env.sh
+and specify the following:</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>Enter the location of the installation directory of Accumulo for <code>$ACCUMULO_HOME</code></p>
+</li>
+<li>
+<p>Enter your system&#8217;s Java home for <code>$JAVA_HOME</code></p>
+</li>
+<li>
+<p>Enter the location of Hadoop for <code>$HADOOP_PREFIX</code></p>
+</li>
+<li>
+<p>Choose a location for Accumulo logs and enter it for <code>$ACCUMULO_LOG_DIR</code></p>
+</li>
+<li>
+<p>Enter the location of ZooKeeper for <code>$ZOOKEEPER_HOME</code></p>
+</li>
+</ol>
+</div>
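+<div class="paragraph">
+<p>A sketch of these settings in <code>accumulo-env.sh</code>; all paths are examples for a typical layout:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>export ACCUMULO_HOME=/opt/accumulo-1.9.0
+export JAVA_HOME=/usr/lib/jvm/java
+export HADOOP_PREFIX=/opt/hadoop
+export ACCUMULO_LOG_DIR=$ACCUMULO_HOME/logs
+export ZOOKEEPER_HOME=/opt/zookeeper</pre>
+</div>
+</div>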
+<div class="paragraph">
+<p>By default Accumulo TabletServers are set to use 1GB of memory. You may change
+this by altering the value of <code>$ACCUMULO_TSERVER_OPTS</code>. Note the syntax is that of
+the Java JVM command line options. This value should be less than the physical
+memory of the machines running TabletServers.</p>
+</div>
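+<div class="paragraph">
+<p>For example, to give tablet servers a 2GB heap (a sketch; adjust the sizes to your hardware
+and keep any other options your distribution sets in this variable):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>export ACCUMULO_TSERVER_OPTS="-Xmx2g -Xms2g"</pre>
+</div>
+</div>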
+<div class="paragraph">
+<p>There are similar options for the master&#8217;s memory usage and the garbage collector
+process. Reduce these if they exceed the physical RAM of your hardware and
+increase them, within the bounds of the physical RAM, if a process fails because of
+insufficient memory.</p>
+</div>
+<div class="paragraph">
+<p>Note that you will be specifying the Java heap space in accumulo-env.sh. You should
+make sure that the total heap space used for the Accumulo tserver and the Hadoop
+DataNode and TaskTracker is less than the available memory on each slave node in
+the cluster. On large clusters, it is recommended that the Accumulo master, Hadoop
+NameNode, secondary NameNode, and Hadoop JobTracker all be run on separate
+machines to allow them to use more heap space. If you are running these on the
+same machine on a small cluster, likewise make sure their heap space settings fit
+within the available memory.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_native_map">18.5.2. Native Map</h4>
+<div class="paragraph">
+<p>The tablet server uses a data structure called a MemTable to store sorted key/value
+pairs in memory when they are first received from the client. When a minor compaction
+occurs, this data structure is written to HDFS. The MemTable will default to using
+memory in the JVM but a JNI version, called the native map, can be used to significantly
+speed up performance by utilizing the memory space of the native operating system. The
+native map also sidesteps the performance penalties of JVM garbage collection: because its
+data lives outside the Java heap, the JVM pauses to collect garbage far less frequently.</p>
+</div>
+<div class="sect4">
+<h5 id="_building">Building</h5>
+<div class="paragraph">
+<p>32-bit and 64-bit Linux and Mac OS X versions of the native map can be built
+from the Accumulo bin package by executing
+<code>$ACCUMULO_HOME/bin/build_native_library.sh</code>. If your system&#8217;s
+default compiler options are insufficient, you can add additional compiler
+options to the command line, such as options for the architecture. These will be
+passed to the Makefile in the environment variable <code>USERFLAGS</code>.</p>
+</div>
+<div class="paragraph">
+<p>Examples:</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p><code>$ACCUMULO_HOME/bin/build_native_library.sh</code></p>
+</li>
+<li>
+<p><code>$ACCUMULO_HOME/bin/build_native_library.sh -m32</code></p>
+</li>
+</ol>
+</div>
+<div class="paragraph">
+<p>After building the native map from the source, you will find the artifact in
+<code>$ACCUMULO_HOME/lib/native</code>. Upon starting up, the tablet server will look
+in this directory for the map library. If the file is renamed or moved from its
+target directory, the tablet server may not be able to find it. The system can
+also locate the native maps shared library by setting <code>LD_LIBRARY_PATH</code>
+(or <code>DYLD_LIBRARY_PATH</code> on Mac OS X) in <code>$ACCUMULO_HOME/conf/accumulo-env.sh</code>.</p>
+</div>
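+<div class="paragraph">
+<p>For instance, if the shared library were installed in a non-default location (the path
+below is hypothetical), accumulo-env.sh could point the loader at it with:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>export LD_LIBRARY_PATH=/opt/accumulo-native:$LD_LIBRARY_PATH</pre>
+</div>
+</div>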
+</div>
+<div class="sect4">
+<h5 id="_native_maps_configuration">Native Maps Configuration</h5>
+<div class="paragraph">
+<p>As mentioned, Accumulo will use the native libraries if they are found in the expected
+location and <code>tserver.memory.maps.native.enabled</code> is set to <code>true</code> (which is the default).
+Using the native maps over JVM Maps nets a noticeable improvement in ingest rates; however,
+certain configuration variables are important to modify when increasing the size of the
+native map.</p>
+</div>
+<div class="paragraph">
+<p>To adjust the size of the native map, increase the value of <code>tserver.memory.maps.max</code>.
+By default, the maximum size of the native map is 1GB. When increasing this value, it is
+also important to adjust the values of <code>table.compaction.minor.logs.threshold</code> and
+<code>tserver.walog.max.size</code>. <code>table.compaction.minor.logs.threshold</code> is the maximum
+number of write-ahead log files that a tablet can reference before they will be automatically
+minor compacted. <code>tserver.walog.max.size</code> is the maximum size of a write-ahead log.</p>
+</div>
+<div class="paragraph">
+<p>The maximum size of the native maps for a server should be at most the product
+of the write-ahead log maximum size and the minor compaction threshold for log files:</p>
+</div>
+<div class="paragraph">
+<p><code>$table.compaction.minor.logs.threshold * $tserver.walog.max.size &gt;= $tserver.memory.maps.max</code></p>
+</div>
+<div class="paragraph">
+<p>This formula ensures that minor compactions won&#8217;t be automatically triggered before the native
+maps can be completely saturated.</p>
+</div>
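+<div class="paragraph">
+<p>As an illustrative sketch (the values are examples, not recommendations), a combination
+of these settings in accumulo-site.xml that satisfies the formula might be:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-xml hljs" data-lang="xml">&lt;property&gt;
+  &lt;name&gt;tserver.memory.maps.max&lt;/name&gt;
+  &lt;value&gt;2G&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;tserver.walog.max.size&lt;/name&gt;
+  &lt;value&gt;1G&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;table.compaction.minor.logs.threshold&lt;/name&gt;
+  &lt;value&gt;3&lt;/value&gt;
+&lt;/property&gt;</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Here 3 &#215; 1G = 3G &#8805; 2G, so the formula holds.</p>
+</div>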
+<div class="paragraph">
+<p>Subsequently, when increasing the size of the write-ahead logs, it can also be important
+to increase the HDFS block size that Accumulo uses when creating the files for the write-ahead log.
+This is controlled via <code>tserver.wal.blocksize</code>. A basic recommendation is that when
+<code>tserver.walog.max.size</code> is larger than 2GB in size, set <code>tserver.wal.blocksize</code> to 2GB.
+Increasing the block size to a value larger than 2GB can result in decreased write
+performance to the write-ahead log file which will slow ingest.</p>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_cluster_specification">18.5.3. Cluster Specification</h4>
+<div class="paragraph">
+<p>On the machine that will serve as the Accumulo master:</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>Write the IP address or domain name of the Accumulo Master to the <code>$ACCUMULO_HOME/conf/masters</code> file.</p>
+</li>
+<li>
+<p>Write the IP addresses or domain name of the machines that will be TabletServers in <code>$ACCUMULO_HOME/conf/slaves</code>, one per line.</p>
+</li>
+</ol>
+</div>
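+<div class="paragraph">
+<p>For example (the hostnames here are hypothetical), the two files might contain:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ cat $ACCUMULO_HOME/conf/masters
+master.example.com
+$ cat $ACCUMULO_HOME/conf/slaves
+tserver1.example.com
+tserver2.example.com
+tserver3.example.com</pre>
+</div>
+</div>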
+<div class="paragraph">
+<p>Note that if using domain names rather than IP addresses, DNS must be configured
+properly for all machines participating in the cluster. DNS can be a confusing source
+of errors.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_accumulo_settings">18.5.4. Accumulo Settings</h4>
+<div class="paragraph">
+<p>Specify appropriate values for the following settings in
+<code>$ACCUMULO_HOME/conf/accumulo-site.xml</code> :</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-xml hljs" data-lang="xml">&lt;property&gt;
+    &lt;name&gt;instance.zookeeper.host&lt;/name&gt;
+    &lt;value&gt;zooserver-one:2181,zooserver-two:2181&lt;/value&gt;
+    &lt;description&gt;list of zookeeper servers&lt;/description&gt;
+&lt;/property&gt;</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This enables Accumulo to find ZooKeeper. Accumulo uses ZooKeeper to coordinate
+settings between processes and to help finalize TabletServer failure.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-xml hljs" data-lang="xml">&lt;property&gt;
+    &lt;name&gt;instance.secret&lt;/name&gt;
+    &lt;value&gt;DEFAULT&lt;/value&gt;
+&lt;/property&gt;</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The instance needs a secret to enable secure communication between servers. Configure your
+secret and make sure that the <code>accumulo-site.xml</code> file is not readable to other users.
+For alternatives to storing the <code>instance.secret</code> in plaintext, please read the
+<code>Sensitive Configuration Values</code> section.</p>
+</div>
+<div class="paragraph">
+<p>Some settings can be modified via the Accumulo shell and take effect immediately, but
+some settings require a process restart to take effect. See the configuration documentation
+(available in the docs directory of the tarball and in <a href="#configuration">Configuration Management</a>) for details.</p>
+</div>
+<div class="paragraph">
+<p>One aspect of Accumulo&#8217;s configuration that differs from the rest of the Hadoop
+ecosystem is that the server-process classpath is determined in part by multiple values. A
+bootstrap classpath is based solely on the <code>accumulo-start.jar</code>, Log4j and <code>$ACCUMULO_CONF_DIR</code>.</p>
+</div>
+<div class="paragraph">
+<p>A second classloader is used to dynamically load all of the resources specified by <code>general.classpaths</code>
+in <code>$ACCUMULO_CONF_DIR/accumulo-site.xml</code>. This value is a comma-separated list of regular-expression
+paths which are all loaded into a secondary classloader. This includes Hadoop, Accumulo and ZooKeeper
+jars necessary to run Accumulo. When this value is not defined, a default value is used which attempts
+to load Hadoop from multiple potential locations depending on how Hadoop was installed. It is strongly
+recommended that <code>general.classpaths</code> is defined and limited to only the necessary jars to prevent
+extra jars from being unintentionally loaded into Accumulo processes.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_hostnames_in_configuration_files">18.5.5. Hostnames in configuration files</h4>
+<div class="paragraph">
+<p>Accumulo has a number of configuration files which can contain references to other hosts in your
+network. All of the "host" configuration files for Accumulo (<code>gc</code>, <code>masters</code>, <code>slaves</code>, <code>monitor</code>,
+<code>tracers</code>) as well as <code>instance.volumes</code> in accumulo-site.xml must contain some host reference.</p>
+</div>
+<div class="paragraph">
+<p>While IP address, short hostnames, or fully qualified domain names (FQDN) are all technically valid, it
+is good practice to always use FQDNs for both Accumulo and other processes in your Hadoop cluster.
+Failing to consistently use FQDNs can have unexpected consequences in how Accumulo uses the FileSystem.</p>
+</div>
+<div class="paragraph">
+<p>A common way to observe this problem is via applications that use Bulk Ingest. The Accumulo
+Master coordinates moving the input files for Bulk Ingest into an Accumulo-managed directory. However,
+Accumulo cannot safely move files across different Hadoop FileSystems, and it cannot reliably
+determine that two FileSystems specified with different names are in fact the same one.
+For example, while <code>127.0.0.1:8020</code> might be a valid identifier for an HDFS instance,
+Accumulo identifies <code>localhost:8020</code> as a different HDFS instance than <code>127.0.0.1:8020</code>.</p>
+</div>
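+<div class="paragraph">
+<p>For example, a consistent FQDN-based value for <code>instance.volumes</code> (the namenode
+hostname below is hypothetical) would be:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-xml hljs" data-lang="xml">&lt;property&gt;
+  &lt;name&gt;instance.volumes&lt;/name&gt;
+  &lt;value&gt;hdfs://namenode.example.com:8020/accumulo&lt;/value&gt;
+&lt;/property&gt;</code></pre>
+</div>
+</div>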
+</div>
+<div class="sect3">
+<h4 id="_deploy_configuration">18.5.6. Deploy Configuration</h4>
+<div class="paragraph">
+<p>Copy the masters, slaves, accumulo-env.sh, and if necessary, accumulo-site.xml
+from the <code>$ACCUMULO_HOME/conf/</code> directory on the master to all the machines
+specified in the slaves file.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_sensitive_configuration_values">18.5.7. Sensitive Configuration Values</h4>
+<div class="paragraph">
+<p>Accumulo has a number of properties that can be specified via the accumulo-site.xml
+file which are sensitive in nature; <code>instance.secret</code> and <code>trace.token.property.password</code>
+are two common examples. If either of these properties is compromised, data could be
+leaked to users who should not have access to it.</p>
+</div>
+<div class="paragraph">
+<p>In Hadoop-2.6.0, a new CredentialProvider class was introduced which serves as a common
+implementation to abstract away the storage and retrieval of passwords from plaintext
+storage in configuration files. Any Property marked with the <code>Sensitive</code> annotation
+is a candidate for use with these CredentialProviders. For versions of Hadoop that lack
+these classes, the feature is simply unavailable.</p>
+</div>
+<div class="paragraph">
+<p>A comma-separated list of CredentialProviders can be configured using the Accumulo property
+<code>general.security.credential.provider.paths</code>. Each configured URL will be consulted
+when the Configuration object for accumulo-site.xml is accessed.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_using_a_javakeystorecredentialprovider_for_storage">18.5.8. Using a JavaKeyStoreCredentialProvider for storage</h4>
+<div class="paragraph">
+<p>One of the implementations provided in Hadoop-2.6.0 is a Java KeyStore CredentialProvider.
+Each entry in the KeyStore is the Accumulo Property key name. For example, to store the
+<code>instance.secret</code>, the following command can be used:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>hadoop credential create instance.secret --provider jceks://file/etc/accumulo/conf/accumulo.jceks</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The command will then prompt you to enter the secret to use and create a keystore in:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>/etc/accumulo/conf/accumulo.jceks</pre>
+</div>
+</div>
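+<div class="paragraph">
+<p>You can confirm that the entry was stored by listing the provider&#8217;s contents:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>hadoop credential list --provider jceks://file/etc/accumulo/conf/accumulo.jceks</pre>
+</div>
+</div>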
+<div class="paragraph">
+<p>Then, accumulo-site.xml must be configured to use this KeyStore as a CredentialProvider:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-xml hljs" data-lang="xml">&lt;property&gt;
+    &lt;name&gt;general.security.credential.provider.paths&lt;/name&gt;
+    &lt;value&gt;jceks://file/etc/accumulo/conf/accumulo.jceks&lt;/value&gt;
+&lt;/property&gt;</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This configuration will then transparently extract the <code>instance.secret</code> from
+the configured KeyStore, avoiding human-readable storage of the sensitive
+property.</p>
+</div>
+<div class="paragraph">
+<p>A KeyStore can also be stored in HDFS, which will make the KeyStore readily available to
+all Accumulo servers. If the local filesystem is used, be aware that each Accumulo server
+will expect the KeyStore in the same location.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="ClientConfiguration">18.5.9. Client Configuration</h4>
+<div class="paragraph">
+<p>In version 1.6.0, Accumulo included a new type of configuration file known as a client
+configuration file. One problem with the traditional "site.xml" file that is prevalent
+through Hadoop is that it is a single file used by both clients and servers. This makes
+it very difficult to protect secrets that are only meant for the server processes while
+allowing the clients to connect to the servers.</p>
+</div>
+<div class="paragraph">
+<p>The client configuration file is a subset of the information stored in accumulo-site.xml
+meant only for consumption by clients of Accumulo. By default, Accumulo checks a number
+of locations for a client configuration file:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><code>${ACCUMULO_CONF_DIR}/client.conf</code></p>
+</li>
+<li>
+<p><code>/etc/accumulo/client.conf</code></p>
+</li>
+<li>
+<p><code>/etc/accumulo/conf/client.conf</code></p>
+</li>
+<li>
+<p><code>~/.accumulo/config</code></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>These files are <a href="https://en.wikipedia.org/wiki/.properties">Java Properties files</a>. They
+can currently contain information about ZooKeeper servers, RPC properties (such as SSL or SASL
+connectors), and distributed tracing properties. Valid properties are defined by the
+<a href="https://github.com/apache/accumulo/blob/f1d0ec93d9f13ff84844b5ac81e4a7b383ced467/core/src/main/java/org/apache/accumulo/core/client/ClientConfiguration.java#L54">ClientProperty</a>
+enum contained in the client API.</p>
+</div>
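+<div class="paragraph">
+<p>A minimal client configuration file might look like the following (the instance name and
+ZooKeeper hosts are hypothetical):</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>instance.name=myinstance
+instance.zookeeper.host=zooserver-one:2181,zooserver-two:2181</pre>
+</div>
+</div>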
+</div>
+<div class="sect3">
+<h4 id="_custom_table_tags">18.5.10. Custom Table Tags</h4>
+<div class="paragraph">
+<p>Accumulo has the ability for users to add custom tags to tables.  This allows
+applications to set application-level metadata about a table.  These tags can be
+anything from a table description, administrator notes, date created, etc.
+This is done by naming and setting a property with a prefix <code>table.custom.*</code>.</p>
+</div>
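+<div class="paragraph">
+<p>For example, a custom tag could be set from the Accumulo shell (the table name and
+value here are hypothetical):</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>root@myinstance&gt; config -t mytable -s table.custom.owner=dataops</pre>
+</div>
+</div>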
+<div class="paragraph">
+<p>Currently, table properties are stored in ZooKeeper. This means that custom properties
+should be limited to tens of properties per table at most, with no property exceeding 1MB in
+size. ZooKeeper&#8217;s performance can be very sensitive to an excessive number of nodes and
+the sizes of those nodes. Applications
+which leverage the use of custom properties should take these warnings into
+consideration. There is no enforcement of these warnings via the API.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_configuring_the_classloader">18.5.11. Configuring the ClassLoader</h4>
+<div class="paragraph">
+<p>Accumulo loads classes from the locations specified in the <code>general.classpaths</code> property. Additionally, Accumulo will load classes
+from the locations specified in the <code>general.dynamic.classpaths</code> property and will monitor and reload them if they change. The reloading
+feature is useful during the development and testing of iterators as new or modified iterator classes can be deployed to Accumulo without
+having to restart the database.</p>
+</div>
+<div class="paragraph">
+<p>Accumulo also has an alternate configuration for the classloader which will allow it to load classes from remote locations. This mechanism
+uses Apache Commons VFS which enables locations such as http and hdfs to be used. This alternate configuration also uses the
+<code>general.classpaths</code> property in the same manner described above. It differs in that you need to configure the
+<code>general.vfs.classpaths</code> property instead of the <code>general.dynamic.classpaths</code> property. As in the default configuration, this alternate
+configuration will also monitor the vfs locations for changes and reload if necessary.</p>
+</div>
+<div class="sect4">
+<h5 id="_classloader_contexts">ClassLoader Contexts</h5>
+<div class="paragraph">
+<p>With the addition of the VFS based classloader, we introduced the notion of classloader contexts. A context is identified
+by a name and references a set of locations from which to load classes; it can be specified in the accumulo-site.xml file or added
+using the <code>config</code> command in the shell. Below is an example specifying the app1 context in the accumulo-site.xml file:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-xml hljs" data-lang="xml">&lt;property&gt;
+  &lt;name&gt;general.vfs.context.classpath.app1&lt;/name&gt;
+  &lt;value&gt;hdfs://localhost:8020/applicationA/classpath/.*.jar,file:///opt/applicationA/lib/.*.jar&lt;/value&gt;
+  &lt;description&gt;Application A classpath, loads jars from HDFS and local file system&lt;/description&gt;
+&lt;/property&gt;</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The default behavior follows the Java ClassLoader contract in that classes, if they exist, are loaded from the parent classloader first.
+You can override this behavior by delegating to the parent classloader after looking in this classloader first. An example of this
+configuration is:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-xml hljs" data-lang="xml">&lt;property&gt;
+  &lt;name&gt;general.vfs.context.classpath.app1.delegation=post&lt;/name&gt;
+  &lt;value&gt;hdfs://localhost:8020/applicationA/classpath/.*.jar,file:///opt/applicationA/lib/.*.jar&lt;/value&gt;
+  &lt;description&gt;Application A classpath, loads jars from HDFS and local file system&lt;/description&gt;
+&lt;/property&gt;</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To use contexts in your application you can set the <code>table.classpath.context</code> on your tables or use the <code>setClassLoaderContext()</code> method on Scanner
+and BatchScanner passing in the name of the context, app1 in the example above. Setting the property on the table allows your minc, majc, and scan
+iterators to load classes from the locations defined by the context. Passing the context name to the scanners allows you to override the table setting
+to load only scan time iterators from a different location.</p>
+</div>
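+<div class="paragraph">
+<p>For example, with the app1 context defined above, the table property can be set from the
+shell (the table name here is hypothetical):</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>root@myinstance&gt; config -t mytable -s table.classpath.context=app1</pre>
+</div>
+</div>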
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_initialization">18.6. Initialization</h3>
+<div class="paragraph">
+<p>Accumulo must be initialized to create the structures it uses internally to locate
+data across the cluster. HDFS is required to be configured and running before
+Accumulo can be initialized.</p>
+</div>
+<div class="paragraph">
+<p>Once HDFS is started, initialization can be performed by executing
+<code>$ACCUMULO_HOME/bin/accumulo init</code> . This script will prompt for a name
+for this instance of Accumulo. The instance name is used to identify a set of tables
+and instance-specific settings. The script will then write some information into
+HDFS so Accumulo can start properly.</p>
+</div>
+<div class="paragraph">
+<p>The initialization script will prompt you to set a root password. Once Accumulo is
+initialized it can be started.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_running">18.7. Running</h3>
+<div class="sect3">
+<h4 id="_starting_accumulo">18.7.1. Starting Accumulo</h4>
+<div class="paragraph">
+<p>Make sure Hadoop is configured on all of the machines in the cluster, including
+access to a shared HDFS instance. Make sure HDFS is running, and that ZooKeeper is
+configured and running on at least one machine in the cluster.
+Then start Accumulo using the <code>bin/start-all.sh</code> script.</p>
+</div>
+<div class="paragraph">
+<p>To verify that Accumulo is running, check the Status page as described in
+<a href="#monitoring">Monitoring</a>. In addition, the Shell can provide some information about the status of
+tables via reading the metadata tables.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_stopping_accumulo">18.7.2. Stopping Accumulo</h4>
+<div class="paragraph">
+<p>To shutdown cleanly, run <code>bin/stop-all.sh</code> and the master will orchestrate the
+shutdown of all the tablet servers. Shutdown waits for all minor compactions to finish, so it may
+take some time for particular configurations.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_adding_a_node">18.7.3. Adding a Node</h4>
+<div class="paragraph">
+<p>Update your <code>$ACCUMULO_HOME/conf/slaves</code> (or <code>$ACCUMULO_CONF_DIR/slaves</code>) file to account for the addition.</p>
+</div>
+<div class="paragraph">
+<p>Next, ssh to each of the hosts you want to add and run:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ACCUMULO_HOME/bin/start-here.sh</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Make sure the host in question has the new configuration, or else the tablet
+server won&#8217;t start; at a minimum this needs to be on the host(s) being added,
+but in practice it&#8217;s good to ensure consistent configuration across all nodes.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_decomissioning_a_node">18.7.4. Decommissioning a Node</h4>
+<div class="paragraph">
+<p>If you need to take a node out of operation, you can trigger a graceful shutdown of a tablet
+server. Accumulo will automatically rebalance the tablets across the available tablet servers.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ACCUMULO_HOME/bin/accumulo admin stop &lt;host(s)&gt; {&lt;host&gt; ...}</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Alternatively, you can ssh to each of the hosts you want to remove and run:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ACCUMULO_HOME/bin/stop-here.sh</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Be sure to update your <code>$ACCUMULO_HOME/conf/slaves</code> (or <code>$ACCUMULO_CONF_DIR/slaves</code>) file to
+account for the removal of these hosts. Bear in mind that the monitor will not re-read the
+slaves file automatically, so it will report the decommissioned servers as down; it&#8217;s
+recommended that you restart the monitor so that the node list is up to date.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_restarting_process_on_a_node">18.7.5. Restarting process on a node</h4>
+<div class="paragraph">
+<p>Occasionally, it might be necessary to restart the processes on a specific node. In addition
+to the <code>start-all.sh</code> and <code>stop-all.sh</code> scripts, Accumulo contains scripts to start/stop all processes
+on a node and start/stop a given process on a node.</p>
+</div>
+<div class="paragraph">
+<p><code>start-here.sh</code> and <code>stop-here.sh</code> will start/stop all Accumulo processes on the current node. The
+necessary processes to start/stop are determined via the "hosts" files (e.g. slaves, masters, etc).
+These scripts expect no arguments.</p>
+</div>
+<div class="paragraph">
+<p><code>start-server.sh</code> can also be useful for starting a given process on a host.
+The first argument to the script is the hostname of the machine. Use the same host that
+you specified in the hosts file (if you specified an FQDN in the masters file, use the FQDN, not
+the shortname). The second argument is the name of the process to start (e.g. master, tserver).</p>
+</div>
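+<div class="paragraph">
+<p>For example, to start a TabletServer process on a (hypothetical) host:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ACCUMULO_HOME/bin/start-server.sh tserver1.example.com tserver</pre>
+</div>
+</div>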
+<div class="paragraph">
+<p>The steps described to decommission a node can also be used (without removal of the host
+from the <code>$ACCUMULO_HOME/conf/slaves</code> file) to gracefully stop a node. This will
+ensure that the tabletserver is cleanly stopped and recovery will not need to be performed
+when the tablets are re-hosted.</p>
+</div>
+<div class="sect4">
+<h5 id="_a_note_on_rolling_restarts">A note on rolling restarts</h5>
+<div class="paragraph">
+<p>For sufficiently large Accumulo clusters, restarting multiple TabletServers within a short window can place significant
+load on the Master server.  If slightly lower availability is acceptable, this load can be reduced by globally setting
+<code>table.suspend.duration</code> to a positive value.</p>
+</div>
+<div class="paragraph">
+<p>With <code>table.suspend.duration</code> set to, say, <code>5m</code>, Accumulo will wait
+for 5 minutes for any dead TabletServer to return before reassigning that TabletServer&#8217;s responsibilities to other TabletServers.
+If the TabletServer returns to the cluster before the specified timeout has elapsed, Accumulo will assign the TabletServer
+its original responsibilities.</p>
+</div>
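+<div class="paragraph">
+<p>This setting can be applied system-wide from the shell, for example:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>root@myinstance&gt; config -s table.suspend.duration=5m</pre>
+</div>
+</div>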
+<div class="paragraph">
+<p>It is important not to choose too large a value for <code>table.suspend.duration</code>, as during this time, all scans against the
+data that TabletServer had hosted will block (or time out).</p>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_running_multiple_tabletservers_on_a_single_node">18.7.6. Running multiple TabletServers on a single node</h4>
+<div class="paragraph">
+<p>With very powerful nodes, it may be beneficial to run more than one TabletServer on a given
+node. This decision should be made carefully and with much deliberation as Accumulo is designed
+to be able to scale to tens of GB of RAM and tens of CPU cores.</p>
+</div>
+<div class="paragraph">
+<p>To run multiple TabletServers on a single host you will need to change the <code>NUM_TSERVERS</code> property
+in the accumulo-env.sh file from 1 to the number of TabletServers that you want to run. On NUMA
+hardware, with numactl installed, the TabletServer will interleave its memory allocations across
+the NUMA nodes and the processes will be scheduled on all the NUMA cores without restriction. To
+change this behavior you can uncomment the <code>TSERVER_NUMA_OPTIONS</code> example in accumulo-env.sh and
+set the numactl options for each TabletServer.</p>
+</div>
+<div class="paragraph">
+<p>Accumulo TabletServers bind certain ports on the host to accommodate remote procedure calls to/from
+other nodes. Running more than one TabletServer on a host requires that you set the following
+properties in <code>accumulo-site.xml</code>:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-xml hljs" data-lang="xml">&lt;property&gt;
+  &lt;name&gt;tserver.port.client&lt;/name&gt;
+  &lt;value&gt;0&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;replication.receipt.service.port&lt;/name&gt;
+  &lt;value&gt;0&lt;/value&gt;
+&lt;/property&gt;</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Accumulo&#8217;s provided scripts for starting and stopping the cluster should work normally with multiple
+TabletServers on a host. Sanity checks are provided in the scripts and will output an error when there
+is a configuration mismatch.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="monitoring">18.8. Monitoring</h3>
+<div class="sect3">
+<h4 id="_accumulo_monitor">18.8.1. Accumulo Monitor</h4>
+<div class="paragraph">
+<p>The Accumulo Monitor provides an interface for monitoring the status and health of
+Accumulo components. The Accumulo Monitor provides a web UI for accessing this information at
+<code>http://<em>monitorhost</em>:9995/</code>.</p>
+</div>
+<div class="paragraph">
+<p>Things highlighted in yellow may be in need of attention.
+If anything is highlighted in red on the monitor page, it is something that definitely needs attention.</p>
+</div>
+<div class="paragraph">
+<p>The Overview page contains some summary information about the Accumulo instance, including the version, instance name, and instance ID.
+There is a table labeled Accumulo Master with current status, a table listing the active ZooKeeper servers, and graphs displaying various metrics over time.
+These include ingest and scan performance and other useful measurements.</p>
+</div>
+<div class="paragraph">
+<p>The Master Server, Tablet Servers, and Tables pages display metrics grouped in different ways (e.g. by tablet server or by table).
+Metrics typically include number of entries (key/value pairs), ingest and query rates.
+The number of running scans, major and minor compactions are in the form <em>number_running</em> (<em>number_queued</em>).
+Another important metric is hold time, which is the amount of time a tablet has been waiting but unable to flush its memory in a minor compaction.</p>
+</div>
+<div class="paragraph">
+<p>The Server Activity page graphically displays tablet server status, with each server represented as a circle or square.
+Different metrics may be assigned to the nodes' color and speed of oscillation.
+The Overall Avg metric is only used on the Server Activity page, and represents the average of all the other metrics (after normalization).
+Similarly, the Overall Max metric picks the metric with the maximum normalized value.</p>
+</div>
+<div class="paragraph">
+<p>The Garbage Collector page displays a list of garbage collection cycles, the number of files found of each type (including deletion candidates in use and files actually deleted), and the length of the deletion cycle.
+The Traces page displays data for recent traces performed (see the following section for information on <a href="#tracing">Tracing</a>).
+The Recent Logs page displays warning and error logs forwarded to the monitor from all Accumulo processes.
+Also, the XML and JSON links provide metrics in XML and JSON formats, respectively.</p>
+</div>
+<div class="paragraph">
+<p>The Accumulo monitor makes a best effort not to display any sensitive information to users; however,
+the monitor is intended to be a tool used with care. It is not a production-grade webservice. It is
+a good idea to restrict access to the monitor via an authentication proxy or firewall, and it
+is strongly recommended that the Monitor not be exposed to any publicly-accessible networks.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_ssl_2">18.8.2. SSL</h4>
+<div class="paragraph">
+<p>SSL may be enabled for the monitor page by setting the following properties in the <code>accumulo-site.xml</code> file:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>monitor.ssl.keyStore
+monitor.ssl.keyStorePassword
+monitor.ssl.trustStore
+monitor.ssl.trustStorePassword</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>If the Accumulo conf directory has been configured (in particular the <code>accumulo-env.sh</code> file must be set up), the <code>generate_monitor_certificate.sh</code> script in the Accumulo <code>bin</code> directory can be used to create the keystore and truststore files with random passwords.
+The script will print out the properties that need to be added to the <code>accumulo-site.xml</code> file.
+The stores can also be generated manually with the Java <code>keytool</code> command, whose usage can be seen in the <code>generate_monitor_certificate.sh</code> script.</p>
+</div>
+<div class="paragraph">
+<p>If desired, the SSL ciphers allowed for connections can be controlled via the following properties in <code>accumulo-site.xml</code>:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>monitor.ssl.include.ciphers
+monitor.ssl.exclude.ciphers</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>If SSL is enabled, the monitor URL can only be accessed via https.
+This also allows you to access the Accumulo shell through the monitor page.
+The left navigation bar will have a new link to Shell.
+An Accumulo user name and password must be entered for access to the shell.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_metrics">18.9. Metrics</h3>
+<div class="paragraph">
+<p>Accumulo is capable of using the Hadoop Metrics2 library and is configured by default to use it. Metrics2 is a library
+which allows for routing of metrics generated by registered MetricsSources to configured MetricsSinks. Examples of sinks
+that are implemented by Hadoop include file-based logging, Graphite and Ganglia. All metric sources are exposed via JMX
+when using Metrics2.</p>
+</div>
+<div class="paragraph">
+<p>Prior to Accumulo 1.7.0, JMX endpoints could be exposed in addition to file-based logging of those metrics configured via
+the <code>accumulo-metrics.xml</code> file. This mechanism can still be used by setting <code>general.legacy.metrics</code> to <code>true</code> in <code>accumulo-site.xml</code>.</p>
+</div>
+<div class="sect3">
+<h4 id="_metrics2_configuration">18.9.1. Metrics2 Configuration</h4>
+<div class="paragraph">
+<p>Metrics2 is configured by searching the classpath for a file matching <code>hadoop-metrics2*.properties</code>. Accumulo provides
+<code>hadoop-metrics2-accumulo.properties</code> as a template that can be used to enable file, Graphite, or Ganglia sinks
+(Graphite and Ganglia require some minimal additional configuration). Because the Hadoop configuration is
+also on the Accumulo classpath, be sure that you do not have multiple Metrics2 configuration files. To remove ambiguity, it is recommended to consolidate
+metrics configuration in a single properties file in a central location; the contents of <code>hadoop-metrics2-accumulo.properties</code>
+can be added to a central <code>hadoop-metrics2.properties</code> in <code>$HADOOP_CONF_DIR</code>.</p>
+</div>
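+<div class="paragraph">
+<p>As a rough illustration, a file sink can be enabled with a properties fragment like the following. This is a minimal sketch: the <code>accumulo</code> sink prefix, polling period, and file path are illustrative assumptions, not values taken verbatim from the shipped template.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre># poll metrics sources every 10 seconds
+*.period=10
+# route Accumulo metrics to Hadoop's file sink
+accumulo.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
+accumulo.sink.file.filename=/var/log/accumulo/metrics.out</pre>
+</div>
+</div>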
+<div class="paragraph">
+<p>When configuring the file sink, the provided path should be absolute. A relative path or bare file name will be created relative
+to the directory in which the Accumulo process was started. External tools, such as logrotate, can be used to prevent these files
+from growing without bound.</p>
+</div>
+<div class="paragraph">
+<p>Each server process should have log messages from the Metrics2 library about the sinks that were created. Be sure to check
+the Accumulo processes&#8217; log files when debugging missing metrics output.</p>
+</div>
+<div class="paragraph">
+<p>For additional information on configuring Metrics2, visit the
+<a href="https://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html">Javadoc page for Metrics2</a>.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="tracing">18.10. Tracing</h3>
+<div class="paragraph">
+<p>It can be difficult to determine why some operations are taking longer
+than expected. For example, you may be looking up items with very low
+latency, but sometimes the lookups take much longer. Determining the
+cause of the delay is difficult because the system is distributed, and
+the typical lookup is fast.</p>
+</div>
+<div class="paragraph">
+<p>Accumulo has been instrumented to record the time that various
+operations take when tracing is turned on. Once tracing is enabled, it
+follows all the requests made on behalf of the user throughout
+the distributed infrastructure of Accumulo, and across all threads of
+execution.</p>
+</div>
+<div class="paragraph">
+<p>These time spans will be inserted into the <code>trace</code> table in
+Accumulo. You can browse recent traces from the Accumulo monitor
+page. You can also read the <code>trace</code> table directly like any
+other table.</p>
+</div>
+<div class="paragraph">
+<p>The design of Accumulo&#8217;s distributed tracing follows that of
+<a href="http://research.google.com/pubs/pub36356.html">Google&#8217;s Dapper</a>.</p>
+</div>
+<div class="sect3">
+<h4 id="_tracers">18.10.1. Tracers</h4>
+<div class="paragraph">
+<p>To collect traces, Accumulo needs at least one server listed in
+<code>$ACCUMULO_HOME/conf/tracers</code>. The server collects traces
+from clients and writes them to the <code>trace</code> table. The Accumulo
+user that the tracer connects as can be configured with
+the following properties
+(see the <a href="#configuration">Configuration</a> section for setting Accumulo server properties):</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>trace.user
+trace.token.property.password</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Other tracer configuration properties include:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>trace.port.client - port tracer listens on
+trace.table - table tracer writes to
+trace.zookeeper.path - zookeeper path where tracers register</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The zookeeper path is set to <code>/tracers</code> by default. If
+multiple Accumulo instances are sharing the same ZooKeeper
+quorum, take care to configure each instance with a unique value for
+this property.</p>
+</div>
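+<div class="paragraph">
+<p>For example, two instances sharing a quorum might use distinct paths such as the following (the path suffixes are illustrative):</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre># in the first instance's accumulo-site.xml
+trace.zookeeper.path=/tracers-one
+# in the second instance's accumulo-site.xml
+trace.zookeeper.path=/tracers-two</pre>
+</div>
+</div>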
+</div>
+<div class="sect3">
+<h4 id="_configuring_tracing">18.10.2. Configuring Tracing</h4>
+<div class="paragraph">
+<p>Traces are collected via SpanReceivers. The default SpanReceiver
+configured is <code>org.apache.accumulo.tracer.ZooTraceClient</code>, which
+sends spans to an Accumulo Tracer process, as discussed in the
+previous section. This default can be changed to a different span
+receiver, or additional span receivers can be added in a
+comma-separated list, by modifying the property</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>trace.span.receivers</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Individual span receivers may require their own configuration
+parameters, which are grouped under the <code>trace.span.receiver.*</code>
+prefix. ZooTraceClient uses the following properties. The first
+three properties are populated from other Accumulo properties,
+while the remaining ones should be prefixed with
+<code>trace.span.receiver.</code> when set in the Accumulo configuration.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>tracer.zookeeper.host - populated from instance.zookeepers
+tracer.zookeeper.timeout - populated from instance.zookeeper.timeout
+tracer.zookeeper.path - populated from trace.zookeeper.path
+tracer.send.timer.millis - timer for flushing send queue (in ms, default 1000)
+tracer.queue.size - max queue size (default 5000)
+tracer.span.min.ms - minimum span length to store (in ms, default 1)</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Note that to configure an Accumulo client for tracing, including
+the Accumulo shell, the client configuration must be given the same
+<code>trace.span.receivers</code>, <code>trace.span.receiver.*</code>, and <code>trace.zookeeper.path</code>
+properties as the servers have.</p>
+</div>
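+<div class="paragraph">
+<p>For example, a client configuration file might contain entries like the following sketch (the values shown are illustrative defaults, not required settings):</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>trace.span.receivers=org.apache.accumulo.tracer.ZooTraceClient
+trace.zookeeper.path=/tracers
+trace.span.receiver.tracer.send.timer.millis=1000</pre>
+</div>
+</div>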
+<div class="paragraph">
+<p>Hadoop can also be configured to send traces to Accumulo, as of
+Hadoop 2.6.0, by setting properties in Hadoop&#8217;s <code>core-site.xml</code>
+file. Instead of using the <code>trace.span.receiver.*</code> prefix, Hadoop
+uses <code>hadoop.htrace.*</code>. The Hadoop configuration does not have
+access to Accumulo&#8217;s properties, so the
+<code>hadoop.htrace.tracer.zookeeper.host</code> property must be specified.
+The zookeeper timeout defaults to 30000 (30 seconds), and the
+zookeeper path defaults to <code>/tracers</code>. An example of configuring
+Hadoop to send traces to ZooTraceClient is</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>&lt;property&gt;
+  &lt;name&gt;hadoop.htrace.spanreceiver.classes&lt;/name&gt;
+  &lt;value&gt;org.apache.accumulo.tracer.ZooTraceClient&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;hadoop.htrace.tracer.zookeeper.host&lt;/name&gt;
+  &lt;value&gt;zookeeperHost:2181&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;hadoop.htrace.tracer.zookeeper.path&lt;/name&gt;
+  &lt;value&gt;/tracers&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;hadoop.htrace.tracer.span.min.ms&lt;/name&gt;
+  &lt;value&gt;1&lt;/value&gt;
+&lt;/property&gt;</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The accumulo-core, accumulo-tracer, accumulo-fate and libthrift
+jars must also be placed on Hadoop&#8217;s classpath.</p>
+</div>
+<div class="sect4">
+<h5 id="_adding_additional_spanreceivers">Adding additional SpanReceivers</h5>
+<div class="paragraph">
+<p><a href="https://github.com/openzipkin/zipkin">Zipkin</a>,
+popularized by Twitter, has a SpanReceiver supported by HTrace
+that users looking for a more graphical trace display may opt to use.
+The following steps configure Accumulo to use <code>org.apache.htrace.impl.ZipkinSpanReceiver</code>
+in addition to Accumulo&#8217;s default ZooTraceClient, and they serve as a template
+for adding any SpanReceiver to Accumulo:</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>Add the Jar containing the ZipkinSpanReceiver class file to
+<code>$ACCUMULO_HOME/lib/</code>. It is critical that the Jar is placed in
+<code>lib/</code> and NOT in <code>lib/ext/</code> so that the new SpanReceiver class
+is visible to the same class loader as htrace-core.</p>
+</li>
+<li>
+<p>Add the following to <code>$ACCUMULO_HOME/conf/accumulo-site.xml</code>:</p>
+<div class="literalblock">
+<div class="content">
+<pre>&lt;property&gt;
+  &lt;name&gt;trace.span.receivers&lt;/name&gt;
+  &lt;value&gt;org.apache.accumulo.tracer.ZooTraceClient,org.apache.htrace.impl.ZipkinSpanReceiver&lt;/value&gt;
+&lt;/property&gt;</pre>
+</div>
+</div>
+</li>
+<li>
+<p>Restart your Accumulo tablet servers.</p>
+</li>
+</ol>
+</div>
+<div class="paragraph">
+<p>To use ZipkinSpanReceiver from a client as well as the Accumulo server:</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>Ensure your client can see the ZipkinSpanReceiver class at runtime. For Maven projects,
+this is easily done by adding the following to your client&#8217;s <code>pom.xml</code> (taking care to specify a suitable version):</p>
+<div class="literalblock">
+<div class="content">
+<pre>&lt;dependency&gt;
+  &lt;groupId&gt;org.apache.htrace&lt;/groupId&gt;
+  &lt;artifactId&gt;htrace-zipkin&lt;/artifactId&gt;
+  &lt;version&gt;3.1.0-incubating&lt;/version&gt;
+  &lt;scope&gt;runtime&lt;/scope&gt;
+&lt;/dependency&gt;</pre>
+</div>
+</div>
+</li>
+<li>
+<p>Add the following to your ClientConfiguration
+(see the <a href="#ClientConfiguration">Client Configuration</a> section)</p>
+<div class="literalblock">
+<div class="content">
+<pre>trace.span.receivers=org.apache.accumulo.tracer.ZooTraceClient,org.apache.htrace.impl.ZipkinSpanReceiver</pre>
+</div>
+</div>
+</li>
+<li>
+<p>Instrument your client as in the next section.</p>
+</li>
+</ol>
+</div>
+<div class="paragraph">
+<p>Your SpanReceiver may require additional properties, and if so these should likewise
+be placed in the ClientConfiguration (if applicable) and Accumulo&#8217;s <code>accumulo-site.xml</code>.
+Two such properties for ZipkinSpanReceiver, listed with their default values, are</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>&lt;property&gt;
+  &lt;name&gt;trace.span.receiver.zipkin.collector-hostname&lt;/name&gt;
+  &lt;value&gt;localhost&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;trace.span.receiver.zipkin.collector-port&lt;/name&gt;
+  &lt;value&gt;9410&lt;/value&gt;
+&lt;/property&gt;</pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_instrumenting_a_client">18.10.3. Instrumenting a Client</h4>
+<div class="paragraph">
+<p>Tracing can be used to measure a client operation, such as a scan, as
+the operation traverses the distributed system. To enable tracing for
+your application, call:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">import org.apache.accumulo.core.trace.DistributedTrace;
+...
+DistributedTrace.enable(hostname, "myApplication");
+// do some tracing
+...
+DistributedTrace.disable();</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Once tracing has been enabled, a client can wrap an operation in a trace.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">import java.util.Map.Entry;
+
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Value;
+import org.apache.htrace.Sampler;
+import org.apache.htrace.Trace;
+import org.apache.htrace.TraceScope;
+...
+TraceScope scope = Trace.startSpan("Client Scan", Sampler.ALWAYS);
+BatchScanner scanner = conn.createBatchScanner(...);
+// Configure your scanner
+for (Entry&lt;Key,Value&gt; entry : scanner) {
+  // process each entry
+}
+scope.close();</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The user can create additional Spans within a Trace.</p>
+</div>
+<div class="paragraph">
+<p>The sampler (such as <code>Sampler.ALWAYS</code>) for the trace should only be specified with a top-level span,
+and subsequent spans will be collected depending on whether that first span was sampled.
+Don&#8217;t forget to specify a Sampler at the top-level span
+because the default Sampler only samples when part of a pre-existing trace,
+which will never occur in a client that never specifies a Sampler.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">TraceScope scope = Trace.startSpan("Client Update", Sampler.ALWAYS);
+...
+TraceScope readScope = Trace.startSpan("Read");
+...
+readScope.close();
+...
+TraceScope writeScope = Trace.startSpan("Write");
+...
+writeScope.close();
+scope.close();</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Like Dapper, Accumulo tracing supports user-defined annotations to associate additional data with a Trace.
+When using a sampler other than <code>Sampler.ALWAYS</code>, check whether the code is currently tracing before adding an annotation.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">...
+int numberOfEntriesRead = 0;
+TraceScope readScope = Trace.startSpan("Read");
+// Do the read, update the counter
+...
+if (Trace.isTracing())
+  readScope.getSpan().addKVAnnotation("Number of Entries Read".getBytes(StandardCharsets.UTF_8),
+      String.valueOf(numberOfEntriesRead).getBytes(StandardCharsets.UTF_8));</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>It is also possible to add timeline annotations to your spans.
+This associates a string with a given timestamp between the start and stop times for a span.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">...
+writeScope.getSpan().addTimelineAnnotation("Initiating Flush");</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Some client operations may have a high volume within your
+application, so you may wish to sample only a percentage of
+operations for tracing. As seen below, a CountSampler can be used to
+enable tracing for 1-in-1000 operations:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">import org.apache.htrace.impl.CountSampler;
+...
+Sampler sampler = new CountSampler(HTraceConfiguration.fromMap(
+    Collections.singletonMap(CountSampler.SAMPLER_FREQUENCY_CONF_KEY, "1000")));
+...
+TraceScope readScope = Trace.startSpan("Read", sampler);
+...
+readScope.close();</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Remember to close all spans and disable tracing when finished.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">DistributedTrace.disable();</code></pre>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_viewing_collected_traces">18.10.4. Viewing Collected Traces</h4>
+<div class="paragraph">
+<p>To view collected traces, use the "Recent Traces" link on the Monitor
+UI. You can also programmatically access and print traces using the
+<code>TraceDump</code> class.</p>
+</div>
+<div class="sect4">
+<h5 id="_trace_table_format">Trace Table Format</h5>
+<div class="paragraph">
+<p>This section is for developers looking to use data recorded in the trace table
+directly, above and beyond the default services of the Accumulo monitor.
+Please note the trace table format and its supporting classes
+are not in the public API and may be subject to change in future versions.</p>
+</div>
+<div class="paragraph">
+<p>Each span received by a tracer&#8217;s ZooTraceClient is recorded in the trace table
+in the form of three entries: span entries, index entries, and start time entries.
+Span and start time entries record full span information,
+whereas index entries provide indexing into span information
+useful for quickly finding spans by type or start time.</p>
+</div>
+<div class="paragraph">
+<p>Each entry is illustrated by a description and sample of data.
+In the description, a token in quotes is a String literal,
+whereas other tokens are span variables.
+Parentheses group parts together, to distinguish colon characters inside the
+column family or qualifier from the colon that separates column family and qualifier.
+We use the format <code>row columnFamily:columnQualifier columnVisibility    value</code>
+(omitting timestamp which records the time an entry is written to the trace table).</p>
+</div>
+<div class="paragraph">
+<p>Span entries take the following form:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>traceId        "span":(parentSpanId:spanId)            []    spanBinaryEncoding
+63b318de80de96d1 span:4b8f66077df89de1:3778c6739afe4e1 []    %18;%09;...</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The parentSpanId is "" for the root span of a trace.
+The spanBinaryEncoding is a compact Apache Thrift encoding of the original Span object.
+This allows clients (and the Accumulo monitor) to recover all the details of the original Span
+at a later time, by scanning the trace table and decoding the value of span entries
+via <code>TraceFormatter.getRemoteSpan(entry)</code>.</p>
+</div>
+<div class="paragraph">
+<p>The trace table has a formatter class by default (<code>org.apache.accumulo.tracer.TraceFormatter</code>)
+that changes how span entries appear in the Accumulo shell.
+Normal scans of the trace table do not use this formatter representation;
+it exists only to make span entries easier to view inside the Accumulo shell.</p>
+</div>
+<div class="paragraph">
+<p>Index entries take the following form:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>"idx":service:startTime description:sender  []    traceId:elapsedTime
+idx:tserver:14f3828f58b startScan:localhost []    63b318de80de96d1:1</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The service and sender are set by the first call of each Accumulo process
+(and instrumented client processes) to <code>DistributedTrace.enable(&#8230;&#8203;)</code>
+(the sender is autodetected if not specified).
+The description is specified in each span.
+Start time and the elapsed time (stop - start, 1 millisecond in the example above)
+are recorded in milliseconds as long values serialized to a string in hex.</p>
+</div>
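+<div class="paragraph">
+<p>These hex strings can be decoded with standard Java. The sketch below parses the sample start time and elapsed time shown above (a standalone illustration, not part of the trace API):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-java hljs" data-lang="java">public class TraceTimeDecode {
+  public static void main(String[] args) {
+    // start time from the index entry row: milliseconds since the epoch, in hex
+    long startMillis = Long.parseLong("14f3828f58b", 16);
+    // elapsed time from the index entry value: milliseconds, in hex
+    long elapsedMillis = Long.parseLong("1", 16);
+    System.out.println(new java.util.Date(startMillis) + " took " + elapsedMillis + "ms");
+  }
+}</code></pre>
+</div>
+</div>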
+<div class="paragraph">
+<p>Start time entries take the following form:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>"start":startTime "id":traceId        []    spanBinaryEncoding
+start:14f3828a351 id:63b318de80de96d1 []    %18;%09;...</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The following classes may be run from <code>$ACCUMULO_HOME</code> while Accumulo is running
+to provide insight into trace statistics. These require
+accumulo-tracer-VERSION.jar to be provided on the Accumulo classpath
+(<code>$ACCUMULO_HOME/lib/ext</code> is fine).</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ bin/accumulo org.apache.accumulo.tracer.TraceTableStats -u username -p password -i instancename
+$ bin/accumulo org.apache.accumulo.tracer.TraceDump -u username -p password -i instancename -r</pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_tracing_from_the_shell">18.10.5. Tracing from the Shell</h4>
+<div class="paragraph">
+<p>You can enable tracing for operations run from the shell by using the
+<code>trace on</code> and <code>trace off</code> commands.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>root@test test&gt; trace on
+
+root@test test&gt; scan
+a b:c []    d
+
+root@test test&gt; trace off
+Waiting for trace information
+Waiting for trace information
+Trace started at 2013/08/26 13:24:08.332
+Time  Start  Service@Location       Name
+ 3628+0      shell@localhost shell:root
+    8+1690     shell@localhost scan
+    7+1691       shell@localhost scan:location
+    6+1692         tserver@localhost startScan
+    5+1692           tserver@localhost tablet read ahead 6</pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_logging">18.11. Logging</h3>
+<div class="paragraph">
+<p>Accumulo processes each write to a set of log files. By default these are found under
+<code>$ACCUMULO_HOME/logs/</code>.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="watcher">18.12. Watcher</h3>
+<div class="paragraph">
+<p>Accumulo includes scripts to automatically restart server processes in the case
+of intermittent failures. To enable this watcher, edit <code>conf/accumulo-env.sh</code>
+to include the following:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre># Should process be automatically restarted
+export ACCUMULO_WATCHER="true"
+
+# What settings should we use for the watcher, if enabled
+export UNEXPECTED_TIMESPAN="3600"
+export UNEXPECTED_RETRIES="2"
+
+export OOM_TIMESPAN="3600"
+export OOM_RETRIES="5"
+
+export ZKLOCK_TIMESPAN="600"
+export ZKLOCK_RETRIES="5"</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>When an Accumulo process dies, the watcher will look at the logs and exit codes
+to determine how the process failed and either restart or fail depending on the
+recent history of failures. The restarting policy for various failure conditions
+is configurable through the <code>*_TIMESPAN</code> and <code>*_RETRIES</code> variables shown above.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_recovery">18.13. Recovery</h3>
+<div class="paragraph">
+<p>In the event of TabletServer failure, or an error while shutting Accumulo down, some
+mutations may not have been minor compacted to HDFS properly. In this case,
+Accumulo will automatically reapply such mutations from the write-ahead log,
+either when the tablets from the failed server are reassigned by the Master (in the
+case of a single TabletServer failure) or the next time Accumulo starts (in the event of
+failure during shutdown).</p>
+</div>
+<div class="paragraph">
+<p>Recovery is performed by asking a tablet server to sort the logs so that tablets can easily find their missing
+updates. The sort status of each file is displayed on the
+Accumulo monitor status page. Once the recovery is complete, any
+tablets involved should return to an &#8220;online&#8221; state. Until then, those tablets will be
+unavailable to clients.</p>
+</div>
+<div class="paragraph">
+<p>The Accumulo client library is configured to retry failed mutations and in many
+cases clients will be able to continue processing after the recovery process without
+throwing an exception.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_migrating_accumulo_from_non_ha_namenode_to_ha_namenode">18.14. Migrating Accumulo from non-HA Namenode to HA Namenode</h3>
+<div class="paragraph">
+<p>The following steps will allow a non-HA instance to be migrated to an HA instance. Consider an HDFS URL
+<code>hdfs://namenode.example.com:8020</code> which is going to be moved to <code>hdfs://nameservice1</code>.</p>
+</div>
+<div class="paragraph">
+<p>Before moving HDFS over to the HA namenode, use <code>$ACCUMULO_HOME/bin/accumulo admin volumes</code> to confirm
+that the only volume displayed is the volume from the current namenode&#8217;s HDFS URL.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>Listing volumes referenced in zookeeper
+        Volume : hdfs://namenode.example.com:8020/accumulo</pre>
+</div>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>Listing volumes referenced in accumulo.root tablets section
+        Volume : hdfs://namenode.example.com:8020/accumulo
+Listing volumes referenced in accumulo.root deletes section (volume replacement occurrs at deletion time)</pre>
+</div>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>Listing volumes referenced in accumulo.metadata tablets section
+        Volume : hdfs://namenode.example.com:8020/accumulo</pre>
+</div>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>Listing volumes referenced in accumulo.metadata deletes section (volume replacement occurrs at deletion time)</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>After verifying the current volume is correct, shut down the cluster and transition HDFS to the HA nameservice.</p>
+</div>
+<div class="paragraph">
+<p>Edit <code>$ACCUMULO_HOME/conf/accumulo-site.xml</code> to notify Accumulo that a volume is being replaced. First,
+add the new nameservice volume to the <code>instance.volumes</code> property. Next, add the
+<code>instance.volumes.replacements</code> property in the form of <code>old new</code>. It&#8217;s important not to include
+the volume being replaced in <code>instance.volumes</code>; otherwise, Accumulo could continue
+to write to that volume.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-xml hljs" data-lang="xml">&lt;!-- instance.dfs.uri and instance.dfs.dir should not be set--&gt;
+&lt;property&gt;
+  &lt;name&gt;instance.volumes&lt;/name&gt;
+  &lt;value&gt;hdfs://nameservice1/accumulo&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;instance.volumes.replacements&lt;/name&gt;
+  &lt;value&gt;hdfs://namenode.example.com:8020/accumulo hdfs://nameservice1/accumulo&lt;/value&gt;
+&lt;/property&gt;</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Run <code>$ACCUMULO_HOME/bin/accumulo init --add-volumes</code> and start up the Accumulo cluster. Verify that the
+new nameservice volume shows up with <code>$ACCUMULO_HOME/bin/accumulo admin volumes</code>.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>Listing volumes referenced in zookeeper
+        Volume : hdfs://namenode.example.com:8020/accumulo
+        Volume : hdfs://nameservice1/accumulo</pre>
+</div>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>Listing volumes referenced in accumulo.root tablets section
+        Volume : hdfs://namenode.example.com:8020/accumulo
+        Volume : hdfs://nameservice1/accumulo
+Listing volumes referenced in accumulo.root deletes section (volume replacement occurrs at deletion time)</pre>
+</div>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>Listing volumes referenced in accumulo.metadata tablets section
+        Volume : hdfs://namenode.example.com:8020/accumulo
+        Volume : hdfs://nameservice1/accumulo
+Listing volumes referenced in accumulo.metadata deletes section (volume replacement occurrs at deletion time)</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Some erroneous GarbageCollector messages may still be seen for a small period while data is transitioning to
+the new volumes. This is expected and can usually be ignored.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_achieving_stability_in_a_vm_environment">18.15. Achieving Stability in a VM Environment</h3>
+<div class="paragraph">
+<p>For testing, demonstration, and even operational uses, Accumulo is often
+installed and run in a virtual machine (VM) environment. The majority of
+long-term operational uses of Accumulo are on bare-metal clusters. However, the
+core design of Accumulo and its dependencies do not preclude running stably for
+long periods within a VM. Many of Accumulo&#8217;s operational robustness features for
+handling failures, like periodic network partitioning in a large cluster, carry
+over well to VM environments. This guide covers general recommendations for
+maximizing stability in a VM environment, including some of the failure
+modes that are more common when running in VMs.</p>
+</div>
+<div class="sect3">
+<h4 id="_known_failure_modes_setup_and_troubleshooting">18.15.1. Known failure modes: Setup and Troubleshooting</h4>
+<div class="paragraph">
+<p>In addition to the general failure modes of running Accumulo, VMs can introduce a
+couple of environmental challenges that can affect process stability. Clock
+drift is more common in VMs, especially when VMs are
+suspended and resumed. Clock drift can cause Accumulo servers to assume that
+they have lost connectivity to the other Accumulo processes and/or to lose their
+locks in ZooKeeper. VM environments also frequently have constrained resources,
+such as CPU, RAM, network, and disk throughput and capacity. Accumulo generally
+deals well with constrained resources from a stability perspective (optimizing
+performance will require additional tuning, which is not covered in this
+section); however, there are some limits.</p>
+</div>
+<div class="sect4">
+<h5 id="_physical_memory">Physical Memory</h5>
+<div class="paragraph">
+<p>One of those limits has to do with the Linux out-of-memory killer. A common
+failure mode in VM environments (and in some bare-metal installations) is when
+the Linux out-of-memory killer decides to kill processes in order to avoid a
+kernel panic when provisioning a memory page. This often happens in VMs due to
+the large number of processes that must run in a small memory footprint. In
+addition to the Linux core processes, a single-node Accumulo setup requires a
+Hadoop Namenode, a Hadoop Secondary Namenode, a Hadoop Datanode, a Zookeeper
+server, an Accumulo Master, an Accumulo GC, and an Accumulo TabletServer.
+Typical setups also include an Accumulo Monitor, an Accumulo Tracer, a Hadoop
+ResourceManager, a Hadoop NodeManager, provisioning software, and client
+applications. Between all of these processes, it is not uncommon to
+over-subscribe the available RAM in a VM. We recommend setting up VMs without
+swap enabled, so that rather than performance grinding to a halt when physical
+memory is exhausted, the kernel will select processes to kill in order
+to free up memory.</p>
+</div>
+<div class="paragraph">
+<p>Calculating the maximum possible memory usage is essential in creating a stable
+Accumulo VM setup. Safely engineering memory allocation for stability is then a
+matter of bringing the calculated maximum memory usage under the physical
+memory by a healthy margin. The margin is to account for operating system-level
+operations, such as managing processes, maintaining virtual memory pages, and
+file system caching. When the Linux out-of-memory killer finds your process, you
+will probably only see evidence of that in <code>/var/log/messages</code>. Out-of-memory
+process kills do not show up in Accumulo or Hadoop logs.</p>
+</div>
+<div class="paragraph">
+<p>To calculate the maximum memory usage of all Java virtual machine (JVM) processes,
+add the maximum heap size (often limited by a <code>-Xmx&#8230;&#8203;</code> argument, such as in
+<code>accumulo-site.xml</code>) and the off-heap memory usage. Off-heap memory usage
+includes the following:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>"Permanent Space", where the JVM stores Classes, Methods, and other code elements. This can be limited by a JVM flag such as <code>-XX:MaxPermSize=100m</code>, and is typically tens of megabytes.</p>
+</li>
+<li>
+<p>Code generation space, where the JVM stores just-in-time compiled code. This is typically small enough to ignore.</p>
+</li>
+<li>
+<p>Socket buffers, where the JVM stores send and receive buffers for each socket.</p>
+</li>
+<li>
+<p>Thread stacks, where the JVM allocates memory to manage each thread.</p>
+</li>
+<li>
+<p>Direct memory space and JNI code, where applications can allocate memory outside of the JVM-managed space. For Accumulo, this includes the native in-memory maps that are allocated with the memory.maps.max parameter in accumulo-site.xml.</p>
+</li>
+<li>
+<p>Garbage collection space, where the JVM stores information used for garbage collection.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>You can assume that each Hadoop and Accumulo process will use ~100-150MB for
+Off-heap memory, plus the in-memory map of the Accumulo TServer process. A
+simple calculation for physical memory requirements follows:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>  Physical memory needed
+    = (per-process off-heap memory) + (heap memory) + (other processes) + (margin)
+    = (number of java processes * 150M + native map) + (sum of -Xmx settings for java process) + (total applications memory, provisioning memory, etc.) + (1G)
+    = (11*150M +500M) + (1G +1G +1G +256M +1G +256M +512M +512M +512M +512M +512M) + (2G) + (1G)
+    = (2150M) + (7G) + (2G) + (1G)
+    = ~12GB</pre>
+</div>
+</div>
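The arithmetic above can be sketched as a short script. This is a minimal sketch only; the process count, native map size, and heap settings are the illustrative values from the example, not measurements from a real cluster.

```python
# Illustrative re-creation of the physical-memory estimate above (values in MB).
JAVA_PROCESSES = 11          # Hadoop + Zookeeper + Accumulo daemons (example count)
OFF_HEAP_PER_PROCESS = 150   # ~100-150MB off-heap per JVM; use the upper bound
NATIVE_MAP = 500             # memory.maps.max for the TabletServer (example value)

heap_mb = [1024, 1024, 1024, 256, 1024, 256, 512, 512, 512, 512, 512]  # -Xmx per JVM
other_mb = 2048              # client applications, provisioning software, etc.
margin_mb = 1024             # headroom for the operating system

off_heap_mb = JAVA_PROCESSES * OFF_HEAP_PER_PROCESS + NATIVE_MAP
total_mb = off_heap_mb + sum(heap_mb) + other_mb + margin_mb
print(f"{total_mb / 1024:.1f} GB needed")  # prints: 12.1 GB needed
```

Substituting your own daemon list and -Xmx settings into `heap_mb` gives a quick sanity check against the physical RAM of the VM.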
+<div class="paragraph">
+<p>These calculations can add up quickly with the large number of processes,
+especially in constrained VM environments. To reduce the physical memory
+requirements, it is a good idea to reduce maximum heap limits and turn off
+unnecessary processes. If you&#8217;re not using YARN in your application, you can
+turn off the ResourceManager and NodeManager. If you&#8217;re not expecting to
+re-provision the cluster frequently you can turn off or reduce provisioning
+processes such as Salt Stack minions and masters.</p>
+</div>
+</div>
+<div class="sect4">
+<h5 id="_disk_space">Disk Space</h5>
+<div class="paragraph">
+<p>Disk space is primarily used for two operations: storing data and storing logs.
+While Accumulo generally stores all of its key/value data in HDFS, Accumulo,
+Hadoop, and Zookeeper all store a significant amount of logs in a directory on
+a local file system. Care should be taken to make sure that (a) limitations to
+the amount of logs generated are in place, and (b) enough space is available to
+host the generated logs on the partitions that they are assigned. When space is
+not available to log, processes will hang. This can cause interruptions in
+availability of Accumulo, as well as cascade into failures of various
+processes.</p>
+</div>
+<div class="paragraph">
+<p>Hadoop, Accumulo, and Zookeeper use log4j as a logging mechanism, and each of
+them has a way of limiting the logs and directing them to a particular
+directory. Logs are generated independently for each process, so when
+considering the total space you need to add up the maximum logs generated by
+each process. Typically, a rolling log setup is used in which each process can
+generate something like ten 100MB files, resulting in a maximum file system
+usage of 1GB per process. Default setups for Hadoop and Zookeeper are often
+unbounded, so it is important to set these limits in the logging configuration
+files for each subsystem. Consult the user manual for each system for
+instructions on how to limit generated logs.</p>
+</div>
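For systems that use log4j 1.x, the shape of such a limit is illustrated below. This is a sketch only; the appender name and file path are placeholders, and each system&#8217;s own manual documents its real configuration file and property names.

```properties
# Illustrative log4j 1.x rolling-file setup: at most ten 100MB files per process.
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=${accumulo.log.dir}/tserver.log
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} [%c] %-5p: %m%n
```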
+</div>
+<div class="sect4">
+<h5 id="_zookeeper_interaction">Zookeeper Interaction</h5>
+<div class="paragraph">
+<p>Accumulo is designed to scale up to thousands of nodes. At that scale,
+intermittent interruptions in network service and other rare failures of
+compute nodes become more common. To limit the impact of node failures on
+overall service availability, Accumulo uses a heartbeat monitoring system that
+leverages Zookeeper&#8217;s ephemeral locks. There are several conditions that can
+occur that cause Accumulo processes to lose their Zookeeper locks, some of which
+are true interruptions to availability and some of which are false positives.
+Several of these conditions become more common in VM environments, where they
+can be exacerbated by resource constraints and clock drift.</p>
+</div>
+<div class="paragraph">
+<p>Accumulo includes a mechanism, known as the <a href="#watcher">Watcher</a>, to limit
+the impact of these false positives. The watcher monitors Accumulo processes and will restart
+them when they fail for certain reasons. The watcher can be configured within
+the accumulo-env.sh file inside of Accumulo&#8217;s configuration directory. We
+recommend using the watcher to monitor Accumulo processes, as it will restore
+the system to full capacity without administrator interaction after many of the
+common failure modes.</p>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_tested_versions">18.15.2. Tested Versions</h4>
+<div class="paragraph">
+<p>Each release of Accumulo is built with a specific version of Apache
+Hadoop, Apache ZooKeeper and Apache Thrift.  We expect Accumulo to
+work with versions that are API compatible with those versions.
+However, this compatibility is not guaranteed because Hadoop, ZooKeeper
+and Thrift may not provide guarantees between their own versions. We
+have also found that certain versions of Accumulo and Hadoop included
+bugs that greatly affected overall stability.  Thrift is particularly
+prone to compatibility changes between versions, and you must use the
+same version that your Accumulo was built with.</p>
+</div>
+<div class="paragraph">
+<p>Please check the release notes for your Accumulo version or use the
+mailing lists at <a href="https://accumulo.apache.org" class="bare">https://accumulo.apache.org</a> for more info.</p>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_multi_volume_installations">19. Multi-Volume Installations</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>This is an advanced configuration setting for very large clusters
+under a lot of write pressure.</p>
+</div>
+<div class="paragraph">
+<p>The HDFS NameNode holds all of the metadata about the files in
+HDFS. For fast performance, all of this information needs to be stored
+in memory.  A single NameNode with 64G of memory can store the
+metadata for tens of millions of files. However, when scaling beyond a
+thousand nodes, an active Accumulo system can generate lots of updates
+to the file system, especially when data is being ingested.  The large
+number of write transactions to the NameNode, and the speed of a
+single edit log, can become the limiting factor for large scale
+Accumulo installations.</p>
+</div>
+<div class="paragraph">
+<p>You can see the effect of slow write transactions when the Accumulo
+Garbage Collector takes a long time (more than 5 minutes) to delete
+the files Accumulo no longer needs.  If your Garbage Collector
+routinely runs in less than a minute, the NameNode is performing well.</p>
+</div>
+<div class="paragraph">
+<p>However, if you do begin to experience slow-down and poor GC
+performance, Accumulo can be configured to use multiple NameNode
+servers.  The configuration &#8220;instance.volumes&#8221; should be set to a
+comma-separated list, using full URI references to different NameNode
+servers:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-xml hljs" data-lang="xml">&lt;property&gt;
+    &lt;name&gt;instance.volumes&lt;/name&gt;
+    &lt;value&gt;hdfs://ns1:9001,hdfs://ns2:9001&lt;/value&gt;
+&lt;/property&gt;</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The introduction of multiple volume support in 1.6 changed the way Accumulo
+stores pointers to files.  It now stores fully qualified URI references to
+files.  Before 1.6, Accumulo stored paths that were relative to a table
+directory.   After an upgrade these relative paths will still exist and are
+resolved using instance.dfs.dir, instance.dfs.uri, and Hadoop configuration in
+the same way they were before 1.6.</p>
+</div>
+<div class="paragraph">
+<p>If the URI for a namenode changes (e.g. the namenode was running on host1 and
+is moved to host2), then Accumulo will no longer function.  Even if Hadoop and
+Accumulo configurations are changed, the fully qualified URIs stored in
+Accumulo will still contain the old URI.  To handle this, Accumulo has the
+following configuration property for replacing URIs stored in its metadata.  The
+example configuration below will replace ns1 with nsA and ns2 with nsB in
+Accumulo metadata.  For this property to take effect, Accumulo will need to be
+restarted.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-xml hljs" data-lang="xml">&lt;property&gt;
+    &lt;name&gt;instance.volumes.replacements&lt;/name&gt;
+    &lt;value&gt;hdfs://ns1:9001 hdfs://nsA:9001, hdfs://ns2:9001 hdfs://nsB:9001&lt;/value&gt;
+&lt;/property&gt;</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Using viewfs or an HA namenode, introduced in Hadoop 2, offers another option
+for managing the fully qualified URIs stored in Accumulo.  Viewfs and HA
+namenodes both introduce a level of indirection in the Hadoop configuration.
+For example, assume viewfs://nn1 maps to hdfs://nn1 in the Hadoop configuration.
+If viewfs://nn1 is used by Accumulo, then it is easy to map viewfs://nn1 to
+hdfs://nnA by changing the Hadoop configuration without doing anything to
+Accumulo.  A production system should probably use an HA namenode.  Viewfs may
+be useful on a test system with a single non-HA namenode.</p>
+</div>
+<div class="paragraph">
+<p>You may also want to configure your cluster to use Federation,
+available in Hadoop 2.0, which allows DataNodes to respond to multiple
+NameNode servers, so you do not have to partition your DataNodes by
+NameNode.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_troubleshooting">20. Troubleshooting</h2>
+<div class="sectionbody">
+<div class="sect2">
+<h3 id="_logs">20.1. Logs</h3>
+<div class="paragraph">
+<p><strong>Q</strong>: The tablet server does not seem to be running!? What happened?</p>
+</div>
+<div class="paragraph">
+<p>Accumulo is a distributed system.  It is supposed to run on remote
+equipment, across hundreds of computers.  Each program that runs on
+these remote computers writes down events as they occur, into a local
+file. By default, this is defined in
+<code>$ACCUMULO_HOME/conf/accumulo-env.sh</code> as <code>ACCUMULO_LOG_DIR</code>.</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Look in the <code>$ACCUMULO_LOG_DIR/tserver*.log</code> file.  Specifically, check the end of the file.</p>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: The tablet server did not start and the debug log does not exist!  What happened?</p>
+</div>
+<div class="paragraph">
+<p>When the individual programs are started, the stdout and stderr output
+of these programs are stored in <code>.out</code> and <code>.err</code> files in
+<code>$ACCUMULO_LOG_DIR</code>.  Often, when there are missing configuration
+options, files or permissions, messages will be left in these files.</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Probably a start-up problem.  Look in <code>$ACCUMULO_LOG_DIR/tserver*.err</code></p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_monitor_2">20.2. Monitor</h3>
+<div class="paragraph">
+<p><strong>Q</strong>: Accumulo is not working, what&#8217;s wrong?</p>
+</div>
+<div class="paragraph">
+<p>There&#8217;s a small web server that collects information about all the
+components that make up a running Accumulo instance. It will highlight
+unusual or unexpected conditions.</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Point your browser to the monitor (typically the master host, on port 9995).  Is anything red or yellow?</p>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: My browser is reporting connection refused, and I cannot get to the monitor</p>
+</div>
+<div class="paragraph">
+<p>The monitor program&#8217;s output is also written to .err and .out files in
+the <code>$ACCUMULO_LOG_DIR</code>. Look for problems in this file if the
+<code>$ACCUMULO_LOG_DIR/monitor*.log</code> file does not exist.</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: The monitor program is probably not running.  Check the log files for errors.</p>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: My browser hangs trying to talk to the monitor.</p>
+</div>
+<div class="paragraph">
+<p>Your browser needs to be able to reach the monitor program.  Often
+large clusters are firewalled, or use a VPN for internal
+communications. You can use SSH to proxy your browser to the cluster,
+or consult with your system administrator to gain access to the server
+from your browser.</p>
+</div>
+<div class="paragraph">
+<p>It is sometimes helpful to use a text-only browser to sanity-check the
+monitor while on the machine running the monitor:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ links http://localhost:9995</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Verify that you are not firewalled from the monitor if it is running on a remote host.</p>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: The monitor responds, but there are no numbers for tservers and tables.  The summary page says the master is down.</p>
+</div>
+<div class="paragraph">
+<p>The monitor program gathers all the details about the master and the
+tablet servers through the master. It will be mostly blank if the
+master is down.</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Check for a running master.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_hdfs">20.3. HDFS</h3>
+<div class="paragraph">
+<p>Accumulo reads and writes to the Hadoop Distributed File System.
+Accumulo needs this file system available at all times for normal operations.</p>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: Accumulo is having problems &#8220;getting a block blk_1234567890123.&#8221; How do I fix it?</p>
+</div>
+<div class="paragraph">
+<p>This troubleshooting guide does not cover HDFS, but in general, you
+want to make sure that all the datanodes are running and an fsck check
+finds the file system clean:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ hadoop fsck /accumulo</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>You can use:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ hadoop fsck /accumulo/path/to/corrupt/file -locations -blocks -files</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>to locate the block references of individual corrupt files and use those
+references to search the name node and individual data node logs to determine which
+servers those blocks have been assigned and then try to fix any underlying file
+system issues on those nodes.</p>
+</div>
+<div class="paragraph">
+<p>On a larger cluster, you may need to increase the number of Xcievers for HDFS DataNodes:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlightjs highlight"><code class="language-xml hljs" data-lang="xml">&lt;property&gt;
+    &lt;name&gt;dfs.datanode.max.xcievers&lt;/name&gt;
+    &lt;value&gt;4096&lt;/value&gt;
+&lt;/property&gt;</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Verify HDFS is healthy, check the datanode logs.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_zookeeper">20.4. Zookeeper</h3>
+<div class="paragraph">
+<p><strong>Q</strong>: <code>accumulo init</code> is hanging.  It says something about talking to zookeeper.</p>
+</div>
+<div class="paragraph">
+<p>Zookeeper is also a distributed service.  You will need to ensure that
+it is up.  You can run the zookeeper command line tool to connect to
+any one of the zookeeper servers:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ zkCli.sh -server zoohost
+...
+[zk: zoohost:2181(CONNECTED) 0]</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>It is important to see the word <code>CONNECTED</code>!  If you only see
+<code>CONNECTING</code> you will need to diagnose zookeeper errors.</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Check to make sure that zookeeper is up, and that
+<code>$ACCUMULO_HOME/conf/accumulo-site.xml</code> has been pointed to
+your zookeeper server(s).</p>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: Zookeeper is running, but it does not say <code>CONNECTED</code></p>
+</div>
+<div class="paragraph">
+<p>Zookeeper processes talk to each other to elect a leader.  All updates
+go through the leader and propagate to a majority of all the other
+nodes.  If a majority of the nodes cannot be reached, zookeeper will
+not allow updates.  Zookeeper also limits the number of connections to a
+server from any other single host.  By default, this limit can be as small as 10
+and can be reached in some everything-on-one-machine test configurations.</p>
+</div>
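If the per-host connection limit is the problem, it can be raised in zookeeper&#8217;s zoo.cfg. The value 100 below is only illustrative; consult the ZooKeeper administrator&#8217;s guide for guidance on an appropriate limit for your deployment.

```properties
# zoo.cfg: maximum concurrent connections from a single host (0 = unlimited)
maxClientCnxns=100
```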
+<div class="paragraph">
+<p>You can check the election status and connection status of clients by
+asking the zookeeper nodes for their status.  You connect to zookeeper
+and ask it with the four-letter <code>stat</code> command:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ nc zoohost 2181
+stat
+Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
+Clients:
+ /127.0.0.1:58289[0](queued=0,recved=1,sent=0)
+ /127.0.0.1:60231[1](queued=0,recved=53910,sent=53915)
+
+Latency min/avg/max: 0/5/3008
+Received: 1561459
+Sent: 1561592
+Connections: 2
+Outstanding: 0
+Zxid: 0x621a3b
+Mode: standalone
+Node count: 22524</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Check zookeeper status, verify that it has a quorum, and has not exceeded maxClientCnxns.</p>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: My tablet server crashed!  The logs say that it lost its zookeeper lock.</p>
+</div>
+<div class="paragraph">
+<p>Tablet servers reserve a lock in zookeeper to maintain their ownership
+over the tablets that have been assigned to them.  Part of their
+responsibility for keeping the lock is to send zookeeper a keep-alive
+message periodically.  If the tablet server fails to send a message in
+a timely fashion, zookeeper will remove the lock and notify the tablet
+server.  If the tablet server does not receive a message from
+zookeeper, it will assume its lock has been lost, too.  If a tablet
+server loses its lock, it kills itself: everything assumes it is dead
+already.</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Investigate why the tablet server did not send a timely message to
+zookeeper.</p>
+</div>
+<div class="sect3">
+<h4 id="_keeping_the_tablet_server_lock">20.4.1. Keeping the tablet server lock</h4>
+<div class="paragraph">
+<p><strong>Q</strong>: My tablet server lost its lock.  Why?</p>
+</div>
+<div class="paragraph">
+<p>The primary reason a tablet server loses its lock is that it has been pushed into swap.</p>
+</div>
+<div class="paragraph">
+<p>A large java program (like the tablet server) may have a large portion
+of its memory image unused.  The operating system will favor pushing
+this allocated, but unused memory into swap so that the memory can be
+re-used as a disk buffer.  When the java virtual machine decides to
+access this memory, the OS will begin flushing disk buffers to return that
+memory to the VM.  This can cause the entire process to block long
+enough for the zookeeper lock to be lost.</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Configure your system to reduce the kernel parameter <em>swappiness</em> from the default (60) to zero.</p>
+</div>
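On most Linux distributions this can be made persistent through sysctl. This is a sketch; the exact configuration file location can vary by distribution, and the setting can also be applied immediately with <code>sysctl -p</code>.

```properties
# /etc/sysctl.conf: discourage the kernel from swapping out allocated memory
vm.swappiness = 0
```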
+<div class="paragraph">
+<p><strong>Q</strong>: My tablet server lost its lock, and I have already set swappiness to
+zero.  Why?</p>
+</div>
+<div class="paragraph">
+<p>Be careful not to over-subscribe memory.  This can be easy to do if
+your accumulo processes run on the same nodes as hadoop&#8217;s map-reduce
+framework.  Remember to add up:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>size of the JVM for the tablet server</p>
+</li>
+<li>
+<p>size of the in-memory map, if using the native map implementation</p>
+</li>
+<li>
+<p>size of the JVM for the data node</p>
+</li>
+<li>
+<p>size of the JVM for the task tracker</p>
+</li>
+<li>
+<p>size of the JVM times the maximum number of mappers and reducers</p>
+</li>
+<li>
+<p>size of the kernel and any support processes</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>If a 16G node can run 2 mappers and 2 reducers, and each can be 2G,
+then there is only 8G for the data node, tserver, task tracker and OS.</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Reduce the memory footprint of each component until it fits comfortably.</p>
+</div>
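Auditing that budget is simple arithmetic. The sketch below uses the hypothetical 16G node above; the daemon heap sizes are invented for illustration, not recommended settings.

```python
# Hypothetical memory budget for one 16GB node (all values in GB).
node_ram = 16
mapreduce = 2 * 2 + 2 * 2     # 2 mappers and 2 reducers at 2G each
tserver_heap = 1              # illustrative -Xmx for the tablet server
native_map = 1                # in-memory map, if using the native implementation
datanode_heap = 1             # illustrative -Xmx for the data node
tasktracker_heap = 1          # illustrative -Xmx for the task tracker

committed = mapreduce + tserver_heap + native_map + datanode_heap + tasktracker_heap
headroom = node_ram - committed   # left over for the kernel and support processes
print(f"{headroom} GB of headroom")
```

If `headroom` comes out near zero or negative, the node is over-subscribed and a tablet server on it is at risk of being swapped out.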
+<div class="paragraph">
+<p><strong>Q</strong>: My tablet server lost its lock, swappiness is zero, and my node has lots of unused memory!</p>
+</div>
+<div class="paragraph">
+<p>The JVM memory garbage collector may fall behind and cause a
+"stop-the-world" garbage collection. On a large memory virtual
+machine, this collection can take a long time.  This happens more
+frequently when the JVM is getting low on free memory.  Check the logs
+of the tablet server.  You will see lines like this:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>2013-06-20 13:43:20,607 [tabletserver.TabletServer] DEBUG: gc ParNew=0.00(+0.00) secs
+    ConcurrentMarkSweep=0.00(+0.00) secs freemem=1,868,325,952(+1,868,325,952) totalmem=2,040,135,680</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>When <code>freemem</code> becomes small relative to the amount of memory
+needed, the JVM will spend more time finding free memory than
+performing work.  This can cause long delays in sending keep-alive
+messages to zookeeper.</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Ensure the tablet server JVM is not running low on memory.</p>
+</div>
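Such lines can also be checked mechanically. The sketch below is hypothetical: its regular expression is written against the example debug line above, not a guaranteed log format, so treat it as a starting point.

```python
import re

# Matches the freemem/totalmem fields of the gc DEBUG line shown above.
GC_LINE = re.compile(r"freemem=([\d,]+)\(.*?\) totalmem=([\d,]+)")

def free_ratio(line):
    """Return free/total heap as a float, or None if the line doesn't match."""
    m = GC_LINE.search(line)
    if not m:
        return None
    free, total = (int(g.replace(",", "")) for g in m.groups())
    return free / total

sample = ("2013-06-20 13:43:20,607 [tabletserver.TabletServer] DEBUG: gc "
          "ParNew=0.00(+0.00) secs ConcurrentMarkSweep=0.00(+0.00) secs "
          "freemem=1,868,325,952(+1,868,325,952) totalmem=2,040,135,680")
ratio = free_ratio(sample)
if ratio is not None and ratio < 0.10:
    print("tserver heap is nearly full; expect long GC pauses")
```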
+<div class="paragraph">
+<p><strong>Q</strong>: I&#8217;m seeing errors in tablet server logs that include the words "MutationsRejectedException" and "# constraint violations: 1". Moments after that the server died.</p>
+</div>
+<div class="paragraph">
+<p>The error you are seeing is part of a failing tablet server scenario.
+This is a bit complicated, so let&#8217;s name two of your tablet servers A and B.</p>
+</div>
+<div class="paragraph">
+<p>Tablet server A is hosting a tablet, let&#8217;s call it a-tablet.</p>
+</div>
+<div class="paragraph">
+<p>Tablet server B is hosting a metadata tablet, let&#8217;s call it m-tablet.</p>
+</div>
+<div class="paragraph">
+<p>m-tablet records the information about a-tablet, for example, the names of the files it is using to store data.</p>
+</div>
+<div class="paragraph">
+<p>When A ingests some data, it eventually flushes the updates from memory to a file.</p>
+</div>
+<div class="paragraph">
+<p>Tablet server A then writes this new information to m-tablet, on Tablet server B.</p>
+</div>
+<div class="paragraph">
+<p>Here&#8217;s a likely failure scenario:</p>
+</div>
+<div class="paragraph">
+<p>Tablet server A does not have enough memory for all the processes running on it.
+The operating system sees a large chunk of the tablet server being unused, and swaps it out to disk to make room for other processes.
+Tablet server A does a java memory garbage collection, which causes it to start using all the memory allocated to it.
+As the server starts pulling data from swap, it runs very slowly.
+It fails to send the keep-alive messages to zookeeper in a timely fashion, and it loses its zookeeper session.</p>
+</div>
+<div class="paragraph">
+<p>But, it&#8217;s running so slowly, that it takes a moment to realize it should no longer be hosting tablets.</p>
+</div>
+<div class="paragraph">
+<p>The thread that is flushing a-tablet memory attempts to update m-tablet with the new file information.</p>
+</div>
+<div class="paragraph">
+<p>Fortunately there&#8217;s a constraint on m-tablet.
+Mutations to the metadata table must contain a valid zookeeper session.
+This prevents tablet server A from making updates to m-tablet when it no longer has the right to host the tablet.</p>
+</div>
+<div class="paragraph">
+<p>The "MutationsRejectedException" error is from tablet server A making an update to tablet server B&#8217;s m-tablet.
+It&#8217;s getting a constraint violation: tablet server A has lost its zookeeper session, and will fail momentarily.</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Ensure that memory is not over-allocated.  Monitor swap usage, or turn swap off.</p>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: My accumulo client is getting a MutationsRejectedException. The monitor is displaying "No Such SessionID" errors.</p>
+</div>
+<div class="paragraph">
+<p>When your client starts sending mutations to accumulo, it creates a session. Once the session is created,
+mutations are streamed to accumulo, without acknowledgement, against this session.  Once the client is done,
+it will close the session, and get an acknowledgement.</p>
+</div>
+<div class="paragraph">
+<p>If the client fails to communicate with accumulo, the server will release the session, assuming that the client has died.
+If the client then attempts to send more mutations against the session, you will see "No Such SessionID" errors on
+the server, and MutationsRejectedExceptions in the client.</p>
+</div>
+<div class="paragraph">
+<p>The client library should be either actively using the connection to the tablet servers,
+or closing the connection and sessions. If the session times out, something is causing your client
+to pause.</p>
+</div>
+<div class="paragraph">
+<p>The most frequent source of these pauses are java garbage collection pauses
+due to the JVM running out of memory, or being swapped out to disk.</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Ensure your client has adequate memory and is not being swapped out to disk.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_tools">20.5. Tools</h3>
+<div class="paragraph">
+<p>The accumulo script can be used to run various tools and classes from the command line.
+This section shows how a few of the utilities work, but there are many
+more.</p>
+</div>
+<div class="paragraph">
+<p>There&#8217;s a command, <code>rfile-info</code>, that will examine an accumulo storage file and print
+out basic metadata.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ ./bin/accumulo rfile-info /accumulo/tables/1/default_tablet/A000000n.rf
+2013-07-16 08:17:14,778 [util.NativeCodeLoader] INFO : Loaded the native-hadoop library
+Locality group         : &lt;DEFAULT&gt;
+        Start block          : 0
+        Num   blocks         : 1
+        Index level 0        : 62 bytes  1 blocks
+        First key            : 288be9ab4052fe9e span:34078a86a723e5d3:3da450f02108ced5 [] 1373373521623 false
+        Last key             : start:13fc375709e id:615f5ee2dd822d7a [] 1373373821660 false
+        Num entries          : 466
+        Column families      : [waitForCommits, start, md major compactor 1, md major compactor 2, md major compactor 3,
+                                 bringOnline, prep, md major compactor 4, md major compactor 5, md root major compactor 3,
+                                 minorCompaction, wal, compactFiles, md root major compactor 4, md root major compactor 1,
+                                 md root major compactor 2, compact, id, client:update, span, update, commit, write,
+                                 majorCompaction]
+
+Meta block     : BCFile.index
+      Raw size             : 4 bytes
+      Compressed size      : 12 bytes
+      Compression type     : gz
+
+Meta block     : RFile.index
+      Raw size             : 780 bytes
+      Compressed size      : 344 bytes
+      Compression type     : gz</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>When trying to diagnose problems related to key size, the <code>rfile-info</code> command can provide a histogram of the individual key sizes:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ ./bin/accumulo rfile-info --histogram /accumulo/tables/1/default_tablet/A000000n.rf
+...
+Up to size      count      %-age
+         10 :        222  28.23%
+        100 :        244  71.77%
+       1000 :          0   0.00%
+      10000 :          0   0.00%
+     100000 :          0   0.00%
+    1000000 :          0   0.00%
+   10000000 :          0   0.00%
+  100000000 :          0   0.00%
+ 1000000000 :          0   0.00%
+10000000000 :          0   0.00%</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Likewise, <code>rfile-info</code> will dump the key-value pairs and show you the contents of the RFile:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ ./bin/accumulo rfile-info --dump /accumulo/tables/1/default_tablet/A000000n.rf
+row columnFamily:columnQualifier [visibility] timestamp deleteFlag -&gt; Value
+...</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: Accumulo is not showing me any data!</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Do you have your auths set so that it matches your visibilities?</p>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: What are my visibilities?</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Use <code>rfile-info</code> on a representative file to get some idea of the visibilities in the underlying data.</p>
+</div>
+<div class="paragraph">
+<p>Note that <code>rfile-info</code> is an administrative tool and can only
+be used by someone who can access the underlying Accumulo data. It
+does not enforce Accumulo&#8217;s normal access controls.</p>
+</div>
+<div class="paragraph">
+<p>If you would like to backup, or otherwise examine the contents of Zookeeper, there are commands to dump and load to/from XML.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ ./bin/accumulo org.apache.accumulo.server.util.DumpZookeeper --root /accumulo &gt;dump.xml
+$ ./bin/accumulo org.apache.accumulo.server.util.RestoreZookeeper --overwrite &lt; dump.xml</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: How can I get the information in the monitor page for my cluster monitoring system?</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Use GetMasterStats:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ ./bin/accumulo org.apache.accumulo.test.GetMasterStats | grep Load
+ OS Load Average: 0.27</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: The monitor page is showing an offline tablet.  How can I find out which tablet it is?</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Use FindOfflineTablets:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ ./bin/accumulo org.apache.accumulo.server.util.FindOfflineTablets
+2&lt;&lt;@(null,null,localhost:9997) is UNASSIGNED  #walogs:2</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Here&#8217;s what the output means:</p>
+</div>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><code>2&lt;&lt;</code></dt>
+<dd>
+<p>This is the tablet from (-inf, +inf) for the
+table with id 2.  The command <code>tables -l</code> in the shell will show table ids for
+tables.</p>
+</dd>
+<dt class="hdlist1"><code>@(null, null, localhost:9997)</code></dt>
+<dd>
+<p>Location information.  The
+format is <code>@(assigned, hosted, last)</code>.  In this case, the
+tablet has not been assigned, is not hosted anywhere, and was once
+hosted on localhost.</p>
+</dd>
+<dt class="hdlist1"><code>#walogs:2</code></dt>
+<dd>
+<p>The number of write-ahead logs that this tablet requires for recovery.</p>
+</dd>
+</dl>
+</div>
+<div class="paragraph">
+<p>An unassigned tablet with write-ahead logs is probably waiting for
+logs to be sorted for efficient recovery.</p>
+</div>
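+<div class="paragraph">
+<p>As an illustration, the line format above can be decoded mechanically. The following is a hypothetical parser sketch, not part of Accumulo:</p>
+</div>

```python
import re

# Decode one line of FindOfflineTablets output into its parts:
# the tablet extent, the @(assigned, hosted, last) location triple,
# the state, and the number of write-ahead logs needed for recovery.
LINE_RE = re.compile(
    r'^(?P<extent>[^@]+)'           # e.g. "2<<": table id 2, range (-inf, +inf)
    r'@\((?P<locs>[^)]*)\)'         # assigned, hosted, last locations
    r'\s+is\s+(?P<state>\S+)'       # e.g. UNASSIGNED
    r'\s+#walogs:(?P<walogs>\d+)$'  # WALs required for recovery
)

def parse_offline_tablet(line):
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError('unrecognized line: ' + line)
    assigned, hosted, last = [None if p.strip() == 'null' else p.strip()
                              for p in m.group('locs').split(',')]
    return {'extent': m.group('extent'), 'assigned': assigned,
            'hosted': hosted, 'last': last,
            'state': m.group('state'), 'walogs': int(m.group('walogs'))}
```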
+<div class="paragraph">
+<p><strong>Q</strong>: How can I be sure that the metadata tables are up and consistent?</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: <code>CheckForMetadataProblems</code> will verify that the start/end rows of
+every tablet match up, and that the first start row and last end row for each table are empty:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ ./bin/accumulo org.apache.accumulo.server.util.CheckForMetadataProblems -u root --password
+Enter the connection password:
+All is well for table !0
+All is well for table 1</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: My hadoop cluster has lost a file due to a NameNode failure.  How can I remove the file?</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: There&#8217;s a utility that will check every file reference and ensure
+that the file exists in HDFS.  Optionally, it will remove the
+reference:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ ./bin/accumulo org.apache.accumulo.server.util.RemoveEntriesForMissingFiles -u root --password
+Enter the connection password:
+2013-07-16 13:10:57,293 [util.RemoveEntriesForMissingFiles] INFO : File /accumulo/tables/2/default_tablet/F0000005.rf
+ is missing
+2013-07-16 13:10:57,296 [util.RemoveEntriesForMissingFiles] INFO : 1 files of 3 missing</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: I have many entries in zookeeper for old instances I no longer need.  How can I remove them?</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Use CleanZookeeper:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ ./bin/accumulo org.apache.accumulo.server.util.CleanZookeeper</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This command will not delete the instance pointed to by the local <code>conf/accumulo-site.xml</code> file.</p>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: I need to decommission a node.  How do I stop the tablet server on it?</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Use the admin command:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ ./bin/accumulo admin stop hostname:9997
+2013-07-16 13:15:38,403 [util.Admin] INFO : Stopping server 12.34.56.78:9997</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: I cannot login to a tablet server host, and the tablet server will not shut down.  How can I kill the server?</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Sometimes you can kill a "stuck" tablet server by deleting its lock in zookeeper:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ ./bin/accumulo org.apache.accumulo.server.util.TabletServerLocks --list
+                  127.0.0.1:9997 TSERV_CLIENT=127.0.0.1:9997
+$ ./bin/accumulo org.apache.accumulo.server.util.TabletServerLocks -delete 127.0.0.1:9997
+$ ./bin/accumulo org.apache.accumulo.server.util.TabletServerLocks -list
+                  127.0.0.1:9997             null</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>You can find the master and instance id for any accumulo instances using the same zookeeper instance:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ ./bin/accumulo org.apache.accumulo.server.util.ListInstances
+INFO : Using ZooKeepers localhost:2181
+
+ Instance Name       | Instance ID                          | Master
+---------------------+--------------------------------------+-------------------------------
+              "test" | 6140b72e-edd8-4126-b2f5-e74a8bbe323b |                127.0.0.1:9999</pre>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="metadata">20.6. System Metadata Tables</h3>
+<div class="paragraph">
+<p>Accumulo tracks information about tables in metadata tables. The metadata for
+most tables is contained within the metadata table in the accumulo namespace,
+while the metadata for the metadata table itself is contained in the root table
+in the accumulo namespace. The root table is composed of a single tablet, which
+does not split, so it is also called the root tablet. Information about the root
+table, such as its location and write-ahead logs, is stored in ZooKeeper.</p>
+</div>
+<div class="paragraph">
+<p>Let&#8217;s create a table and put some data into it:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>shell&gt; createtable test
+
+shell&gt; tables -l
+accumulo.metadata    =&gt;        !0
+accumulo.root        =&gt;        +r
+test                 =&gt;         2
+trace                =&gt;         1
+
+shell&gt; insert a b c d
+
+shell&gt; flush -w</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Now let&#8217;s take a look at the metadata for this table:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>shell&gt; table accumulo.metadata
+shell&gt; scan -b 3; -e 3&lt;
+3&lt; file:/default_tablet/F000009y.rf []    186,1
+3&lt; last:13fe86cd27101e5 []    127.0.0.1:9997
+3&lt; loc:13fe86cd27101e5 []    127.0.0.1:9997
+3&lt; log:127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995 []    127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995|6
+3&lt; srv:dir []    /default_tablet
+3&lt; srv:flush []    1
+3&lt; srv:lock []    tservers/127.0.0.1:9997/zlock-0000000001$13fe86cd27101e5
+3&lt; srv:time []    M1373998392323
+3&lt; ~tab:~pr []    \x00</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Let&#8217;s decode this little session:</p>
+</div>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><code>scan -b 3; -e 3&lt;</code></dt>
+<dd>
+<p>Every tablet gets its own row. Every row starts with the table id followed by
+<code>;</code> or <code>&lt;</code>, and followed by the end row split point for that tablet.</p>
+</dd>
+<dt class="hdlist1"><code>file:/default_tablet/F000009y.rf [] 186,1</code></dt>
+<dd>
+<p>File entry for this tablet.  This tablet contains a single file reference. The
+file is <code>/accumulo/tables/3/default_tablet/F000009y.rf</code>.  It contains 1
+key/value pair, and is 186 bytes long.</p>
+</dd>
+<dt class="hdlist1"><code>last:13fe86cd27101e5 []    127.0.0.1:9997</code></dt>
+<dd>
+<p>Last location for this tablet.  It was last held on 127.0.0.1:9997, and the
+unique tablet server lock data was <code>13fe86cd27101e5</code>. The default balancer
+will tend to put tablets back on their last location.</p>
+</dd>
+<dt class="hdlist1"><code>loc:13fe86cd27101e5 []    127.0.0.1:9997</code></dt>
+<dd>
+<p>The current location of this tablet.</p>
+</dd>
+<dt class="hdlist1"><code>log:127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995 []    127.0. &#8230;&#8203;</code></dt>
+<dd>
+<p>This tablet has a reference to a single write-ahead log. This file can be found in
+<code>/accumulo/wal/127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995</code>. The value
+of this entry could refer to multiple files. This tablet&#8217;s data is encoded as
+<code>6</code> within the log.</p>
+</dd>
+<dt class="hdlist1"><code>srv:dir []    /default_tablet</code></dt>
+<dd>
+<p>Files written for this tablet will be placed into
+<code>/accumulo/tables/3/default_tablet</code>.</p>
+</dd>
+<dt class="hdlist1"><code>srv:flush []    1</code></dt>
+<dd>
+<p>Flush id.  This table has successfully completed the flush with the id of <code>1</code>.</p>
+</dd>
+<dt class="hdlist1"><code>srv:lock []    tservers/127.0.0.1:9997/zlock-0000000001\$13fe86cd27101e5</code></dt>
+<dd>
+<p>This is the lock information for the tablet holding the present lock.  This
+information is checked against zookeeper whenever this is updated, which
+prevents a metadata update from a tablet server that no longer holds its
+lock.</p>
+</dd>
+<dt class="hdlist1"><code>srv:time []    M1373998392323</code></dt>
+<dd>
+<p>This indicates the time type (<code>M</code> for milliseconds or <code>L</code> for logical) and the timestamp of the most recently written key in this tablet.  It is used to ensure automatically assigned key timestamps are strictly increasing for the tablet, regardless of the tablet server&#8217;s system time.</p>
+</dd>
+<dt class="hdlist1"><code>~tab:~pr []    \x00</code></dt>
+<dd>
+<p>The end-row marker for the previous tablet (prev-row).  The first byte
+indicates the presence of a prev-row.  This tablet has the range (-inf, +inf),
+so it has no prev-row (or end row).</p>
+</dd>
+</dl>
+</div>
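+<div class="paragraph">
+<p>The row-key encoding described above (table id plus <code>;</code> and the end row, or <code>&lt;</code> alone for the last tablet) can be sketched as follows; this is an illustration, not Accumulo&#8217;s code:</p>
+</div>

```python
# Metadata row keys: the table id, then ';' plus the tablet's end row,
# or '<' alone for the last tablet of the table (end row of +inf).
def metadata_row(table_id, end_row=None):
    if end_row is None:
        return table_id + '<'
    return table_id + ';' + end_row

# Scanning from "<id>;" to "<id><" covers every tablet of the table,
# which is what `scan -b 3; -e 3<` does ('<' sorts after ';' in ASCII).
def table_scan_range(table_id):
    return metadata_row(table_id, ''), metadata_row(table_id)
```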
+<div class="paragraph">
+<p>Besides these columns, you may see:</p>
+</div>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><code>rowId future:zooKeeperID location</code></dt>
+<dd>
+<p>The tablet has been assigned to a tablet server, but not yet loaded.</p>
+</dd>
+<dt class="hdlist1"><code>~del:filename</code></dt>
+<dd>
+<p>When a tablet server is done using a file, it will create a delete marker in the appropriate metadata table, unassociated with any tablet.  The garbage collector will remove the marker, and the file, when no other reference to the file exists.</p>
+</dd>
+<dt class="hdlist1"><code>~blip:txid</code></dt>
+<dd>
+<p>Bulk-Load In Progress marker.</p>
+</dd>
+<dt class="hdlist1"><code>rowId loaded:filename</code></dt>
+<dd>
+<p>A file has been bulk-loaded into this tablet, however the bulk load has not yet completed on other tablets, so this marker prevents the file from being loaded multiple times.</p>
+</dd>
+<dt class="hdlist1"><code>rowId !cloned</code></dt>
+<dd>
+<p>A marker that indicates that this tablet has been successfully cloned.</p>
+</dd>
+<dt class="hdlist1"><code>rowId splitRatio:ratio</code></dt>
+<dd>
+<p>A marker that indicates a split is in progress, and the files are being split at the given ratio.</p>
+</dd>
+<dt class="hdlist1"><code>rowId chopped</code></dt>
+<dd>
+<p>A marker that indicates that the files in the tablet do not contain keys outside the range of the tablet.</p>
+</dd>
+<dt class="hdlist1"><code>rowId scan</code></dt>
+<dd>
+<p>A marker that prevents a file from being removed while there are still active scans using it.</p>
+</dd>
+</dl>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_simple_system_recovery">20.7. Simple System Recovery</h3>
+<div class="paragraph">
+<p><strong>Q</strong>: One of my Accumulo processes died. How do I bring it back?</p>
+</div>
+<div class="paragraph">
+<p>The easiest way to bring all services online for an Accumulo instance is to run the <code>start-all.sh</code> script.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ bin/start-all.sh</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This process will check the process listing, using <code>jps</code>, on each host before attempting to restart a service on that host.
+Typically, this check is sufficient except in the face of a hung/zombie process. For large clusters, it may be
+undesirable to ssh to every node in the cluster to ensure that all hosts are running the appropriate processes, and <code>start-here.sh</code> may be of use.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ ssh host_with_dead_process
+$ bin/start-here.sh</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><code>start-here.sh</code> should be invoked on the host which is missing a given process. Like start-all.sh, it will start all
+necessary processes that are not currently running, but only on the current host and not cluster-wide. Tools such as <code>pssh</code> or
+<code>pdsh</code> can be used to automate this process.</p>
+</div>
+<div class="paragraph">
+<p><code>start-server.sh</code> can also be used to start a process on a given host; however, it is not generally recommended for
+users to issue this directly as the <code>start-all.sh</code> and <code>start-here.sh</code> scripts provide the same functionality with
+more automation and are less prone to user error.</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Use <code>start-all.sh</code> or <code>start-here.sh</code>.</p>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: My process died again. Should I restart it via <code>cron</code> or tools like <code>supervisord</code>?</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: A repeatedly dying Accumulo process is a sign of a larger problem. Typically these problems are due to a
+misconfiguration of Accumulo or over-saturation of resources. Blind automation of any service restart inside of Accumulo
+is generally an undesirable situation as it is indicative of a problem that is being masked and ignored. Accumulo
+processes should be stable on the order of months and not require frequent restart.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_advanced_system_recovery">20.8. Advanced System Recovery</h3>
+<div class="sect3">
+<h4 id="_hdfs_failure">20.8.1. HDFS Failure</h4>
+<div class="paragraph">
+<p><strong>Q</strong>: I had a disastrous HDFS failure.  After bringing everything back up, several tablets refuse to go online.</p>
+</div>
+<div class="paragraph">
+<p>Data written to tablets is written into memory before being written into indexed files.  In case the server
+is lost before the data is saved into an indexed file, all data stored in memory is first written into a
+write-ahead log (WAL).  When a tablet is re-assigned to a new tablet server, the write-ahead logs are read to
+recover any mutations that were in memory when the tablet was last hosted.</p>
+</div>
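+<div class="paragraph">
+<p>The write path described above &#45;&#45; log first, then memory, then replay on re-assignment &#45;&#45; can be sketched generically. This is not Accumulo&#8217;s WAL format or code, just the idea:</p>
+</div>

```python
import os

# A toy write-ahead log: every mutation is persisted before it is
# acknowledged, so an in-memory map can be rebuilt after a crash.
class TinyWAL:
    def __init__(self, path):
        self.path = path

    def append(self, key, value):
        # Persist the mutation first; only then is the write durable.
        with open(self.path, 'a') as f:
            f.write('%s\t%s\n' % (key, value))

    def replay(self):
        # Recovery: rebuild the in-memory state from the log.
        memory = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                for line in f:
                    k, v = line.rstrip('\n').split('\t')
                    memory[k] = v
        return memory
```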
+<div class="paragraph">
+<p>If a write-ahead log cannot be read, then the tablet is not re-assigned.  All it takes is for one of
+the blocks in the write-ahead log to be missing.  This is unlikely unless multiple data nodes in HDFS have been
+lost.</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Get the WAL files online and healthy.  Restore any data nodes that may be down.</p>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: How do I find out which tablets are offline?</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: Use <code>accumulo admin checkTablets</code></p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ bin/accumulo admin checkTablets</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: I lost three data nodes, and I&#8217;m missing blocks in a WAL.  I don&#8217;t care about data loss, how
+can I get those tablets online?</p>
+</div>
+<div class="paragraph">
+<p>See the discussion in <a href="#metadata">System Metadata Tables</a>, which shows a typical metadata table listing.
+The entries with a column family of <code>log</code> are references to the WAL for that tablet.
+If you know what WAL is bad, you can find all the references with a grep in the shell:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>shell&gt; grep 0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995
+3&lt; log:127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995 []    127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995|6</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: You can remove the WAL references in the metadata table.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>shell&gt; grant -u root Table.WRITE -t accumulo.metadata
+shell&gt; delete 3&lt; log 127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Note: the colon (<code>:</code>) is omitted when specifying the <em>row cf cq</em> for the delete command.</p>
+</div>
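+<div class="paragraph">
+<p>A small hypothetical helper makes that syntax explicit: the <code>cf:cq</code> column as printed by scan becomes space-separated <em>cf cq</em> in the delete command:</p>
+</div>

```python
# Turn a row and a "cf:cq" column (as printed by scan) into the
# shell delete command, dropping the colon between family and qualifier.
def delete_command(row, column):
    cf, cq = column.split(':', 1)
    return 'delete %s %s %s' % (row, cf, cq)
```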
+<div class="paragraph">
+<p>The master will automatically discover the tablet no longer has a bad WAL reference and will
+assign the tablet.  You will need to remove the reference from all the tablets to get them
+online.</p>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: The metadata (or root) table has references to a corrupt WAL.</p>
+</div>
+<div class="paragraph">
+<p>This is a much more serious state, since losing updates to the metadata table will result
+in references to old files which may not exist, or lost references to new files, resulting
+in tablets that cannot be read, or large amounts of data loss.</p>
+</div>
+<div class="paragraph">
+<p>The best hope is to restore the WAL by fixing HDFS data nodes and bringing the data back online.
+If this is not possible, the best approach is to re-create the instance and bulk import all files from
+the old instance into new tables.</p>
+</div>
+<div class="paragraph">
+<p>A complete set of instructions for doing this is outside the scope of this guide,
+but the basic approach is:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Use <code>tables -l</code> in the shell to discover the table name to table id mapping</p>
+</li>
+<li>
+<p>Stop all accumulo processes on all nodes</p>
+</li>
+<li>
+<p>Move the accumulo directory in HDFS out of the way:
+$ hadoop fs -mv /accumulo /corrupt</p>
+</li>
+<li>
+<p>Re-initialize accumulo</p>
+</li>
+<li>
+<p>Recreate tables, users and permissions</p>
+</li>
+<li>
+<p>Import the directories under <code>/corrupt/tables/&lt;id&gt;</code> into the new instance</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p><strong>Q</strong>: One or more HDFS Files under /accumulo/tables are corrupt</p>
+</div>
+<div class="paragraph">
+<p>Accumulo maintains multiple references to the tablet files in the metadata
+tables and within the tablet server hosting the file, which makes it difficult to
+reliably remove those references by hand.</p>
+</div>
+<div class="paragraph">
+<p>The directory structure in HDFS for tables will follow the general structure:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>/accumulo
+/accumulo/tables/
+/accumulo/tables/!0
+/accumulo/tables/!0/default_tablet/A000001.rf
+/accumulo/tables/!0/t-00001/A000002.rf
+/accumulo/tables/1
+/accumulo/tables/1/default_tablet/A000003.rf
+/accumulo/tables/1/t-00001/A000004.rf
+/accumulo/tables/1/t-00001/A000005.rf
+/accumulo/tables/2/default_tablet/A000006.rf
+/accumulo/tables/2/t-00001/A000007.rf</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>If files under <code>/accumulo/tables</code> are corrupt, the best course of action is to
+recover those files in HDFS (see the section on HDFS). Once these recovery efforts
+have been exhausted, the next step depends on where the missing file(s) are
+located. Different actions are required depending on whether the bad files are
+Accumulo data files or metadata table files.</p>
+</div>
+<div class="paragraph">
+<p><strong>Data File Corruption</strong></p>
+</div>
+<div class="paragraph">
+<p>When an Accumulo data file is corrupt, the most reliable way to restore Accumulo
+operations is to replace the missing file with an &#8220;empty&#8221; file so that
+references to the file in the METADATA table and within the tablet server
+hosting the file can be resolved by Accumulo. An empty file can be created using
+the CreateEmpty utility:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ accumulo org.apache.accumulo.core.file.rfile.CreateEmpty /path/to/empty/file/empty.rf</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The process is to delete the corrupt file and then move the empty file into its
+place. (The generated empty file can be copied and used multiple times if necessary; it does not need
+to be regenerated each time.)</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ hadoop fs -rm /accumulo/tables/corrupt/file/thename.rf; \
+hadoop fs -mv /path/to/empty/file/empty.rf /accumulo/tables/corrupt/file/thename.rf</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><strong>Metadata File Corruption</strong></p>
+</div>
+<div class="paragraph">
+<p>If the corrupt files are metadata table files (stored under the path
+<code>/accumulo/tables/!0</code>; see <a href="#metadata">System Metadata Tables</a>), then you will need to rebuild
+the metadata table by initializing a new instance of Accumulo and then importing
+all of the existing data into the new instance.  This is the same procedure as
+recovering from a zookeeper failure (see <a href="#zookeeper_failure">ZooKeeper Failure</a>), except that
+you will have the benefit of having the existing user and table authorizations
+that are maintained in zookeeper.</p>
+</div>
+<div class="paragraph">
+<p>You can use the DumpZookeeper utility to save this information for reference
+before creating the new instance.  You will not be able to use RestoreZookeeper
+because the table names and references are likely to be different between the
+original and the new instances, but it can serve as a reference.</p>
+</div>
+<div class="paragraph">
+<p><strong>A</strong>: If the files cannot be recovered, replace corrupt data files with empty
+rfiles to allow references in the metadata table and in the tablet servers to be
+resolved. Rebuild the metadata table if the corrupt files are metadata files.</p>
+</div>
+<div class="paragraph">
+<p><strong>Write-Ahead Log (WAL) File Corruption</strong></p>
+</div>
+<div class="paragraph">
+<p>In certain versions of Accumulo, a corrupt WAL file (caused by HDFS corruption
+or a bug in Accumulo that created the file) can block the successful recovery
+of one or many tablets. Accumulo can be stuck in a loop trying to recover the
+WAL file, never succeeding.</p>
+</div>
+<div class="paragraph">
+<p>In the cases where the WAL file&#8217;s original contents are unrecoverable or some degree
+of data loss is acceptable (beware if the WAL file contains updates to the Accumulo
+metadata table!), the following process can be followed to create a valid, empty
+WAL file. Run the following commands as the Accumulo unix user (to ensure
+the proper file permissions in HDFS):</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ echo -n -e '--- Log File Header (v2) ---\x00\x00\x00\x00' &gt; empty.wal</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The above creates a file with the text "--- Log File Header (v2) ---" followed by
+four null bytes. You should verify the contents of the file with a hexdump tool.</p>
+</div>
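+<div class="paragraph">
+<p>For example, the same 32 bytes can be produced and verified programmatically; a sketch:</p>
+</div>

```python
# The empty WAL header: the marker text plus four null bytes (32 bytes total).
HEADER = b'--- Log File Header (v2) ---' + b'\x00\x00\x00\x00'

def write_empty_wal(path):
    with open(path, 'wb') as f:
        f.write(HEADER)

def looks_like_empty_wal(path):
    # What a hexdump inspection should confirm: exactly the 32 header bytes.
    with open(path, 'rb') as f:
        return f.read() == HEADER
```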
+<div class="paragraph">
+<p>Then, place this empty WAL in HDFS and then replace the corrupt WAL file in HDFS
+with the empty WAL.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>$ hdfs dfs -moveFromLocal empty.wal /user/accumulo/empty.wal
+$ hdfs dfs -mv /user/accumulo/empty.wal /accumulo/wal/tserver-4.example.com+10011/26abec5b-63e7-40dd-9fa1-b8ad2436606e</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>After the corrupt WAL file has been replaced, the system should automatically recover.
+It may be necessary to restart the Accumulo Master process, as an exponential
+backoff policy is used which could lead to a long wait before Accumulo will
+try to re-load the WAL file.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="zookeeper_failure">20.8.2. ZooKeeper Failure</h4>
+<div class="paragraph">
+<p><strong>Q</strong>: I lost my ZooKeeper quorum (hardware failure), but HDFS is still intact. How can I recover my Accumulo instance?</p>
+</div>
+<div class="paragraph">
+<p>ZooKeeper, in addition to its lock-service capabilities, also serves to bootstrap an Accumulo
+instance from some location in HDFS. It contains the pointers to the root tablet in HDFS which
+is then used to load the Accumulo metadata tablets, which then loads all user tables. ZooKeeper
+also stores all namespace and table configuration, the user database, the mapping of table IDs to
+table names, and more across Accumulo restarts.</p>
+</div>
+<div class="paragraph">
+<p>Presently, the only way to recover such an instance is to initialize a new instance and import all
+of the old data into the new instance. The easiest way to tackle this problem is to first recreate
+the mapping of table ID to table name and then recreate each of those tables in the new instance.
+Set any necessary configuration on the new tables and add split points to the new tables to
+approximate the splits that the old tables had.</p>
+</div>
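+<div class="paragraph">
+<p>If the old tables&#8217; split points are not known, one hedged approach is to generate evenly spaced splits over the expected key alphabet. A sketch, under the assumption of lowercase row keys:</p>
+</div>

```python
import string

# Generate n roughly evenly spaced single-character split points.
# Splits taken from the old table's tablets are preferable when known.
def even_splits(n, alphabet=string.ascii_lowercase):
    step = len(alphabet) / float(n + 1)
    return [alphabet[int(step * i)] for i in range(1, n + 1)]
```

+<div class="paragraph">
+<p>The generated points could then be fed to the shell&#8217;s <code>addsplits</code> command.</p>
+</div>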
+<div class="paragraph">
+<p>The directory structure in HDFS for tables will follow the general structure:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>/accumulo
+/accumulo/tables/
+/accumulo/tables/1
+/accumulo/tables/1/default_tablet/A000001.rf
+/accumulo/tables/1/t-00001/A000002.rf
+/accumulo/tables/1/t-00001/A000003.rf
+/accumulo/tables/2/default_tablet/A000004.rf
+/accumulo/tables/2/t-00001/A000005.rf</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>For each table, make a new directory that you can move (or copy if you have the HDFS space to do so)
+all of the rfiles for a given table into. For example, to process the table with an ID of <code>1</code>, make a new directory,
+say <code>/new-table-1</code> and then copy all files from <code>/accumulo/tables/1/*/*.rf</code> into that directory. Additionally,
+make a directory, <code>/new-table-1-failures</code>, for any failures during the import process. Then, issue the import
+command using the Accumulo shell into the new table, telling Accumulo to not re-set the timestamp:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>user@instance new_table&gt; importdirectory /new-table-1 /new-table-1-failures false</pre>
+</div>
+</div>
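+<div class="paragraph">
+<p>The staging step described above could be sketched as follows. The paths and the collision-avoidance renaming are illustrative assumptions, not part of Accumulo:</p>
+</div>

```python
import glob, os, shutil

# Gather every rfile for one table id out of the old instance's
# directory tree into a flat import directory.
def stage_table_rfiles(old_root, table_id, import_dir):
    os.makedirs(import_dir)
    staged = []
    pattern = os.path.join(old_root, 'tables', table_id, '*', '*.rf')
    for src in sorted(glob.glob(pattern)):
        tablet = os.path.basename(os.path.dirname(src))
        # Prefix with the tablet directory name so same-named files
        # from different tablet directories cannot collide.
        dest = os.path.join(import_dir, tablet + '_' + os.path.basename(src))
        shutil.copy(src, dest)
        staged.append(dest)
    return staged
```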
+<div class="paragraph">
+<p>Any RFiles which failed to be loaded will be placed in <code>/new-table-1-failures</code>. RFiles that were successfully
+imported will no longer exist in <code>/new-table-1</code>. For failures, move them back to the import directory and retry
+the <code>importdirectory</code> command.</p>
+</div>
+<div class="paragraph">
+<p>It is <strong>extremely</strong> important to note that this approach may introduce stale data back into
+the tables. For a few reasons, RFiles may exist in the table directory which are candidates for deletion but have
+not yet been deleted. Additionally, deleted data which was not compacted away, but still exists in write-ahead logs if
+the original instance was somehow recoverable, will be re-introduced in the new instance. Table splits and merges
+(which also include the deleteRows API call on TableOperations) are also vulnerable to this problem. This process should
+<strong>not</strong> be used if these are unacceptable risks. It is possible to try to re-create a view of the <code>accumulo.metadata</code>
+table to prune out files that are candidates for deletion, but this is a difficult task that also may not be entirely accurate.</p>