To: commits@accumulo.apache.org
From: ctubbsii@apache.org
Reply-To: dev@accumulo.apache.org
Subject: svn commit: r1680432 - /accumulo/site/trunk/content/release_notes/1.7.0.mdtext
Date: Wed, 20 May 2015 00:33:42 -0000
Message-Id: <20150520003342.DD960AC012B@hades.apache.org>

Author: ctubbsii
Date: Wed May 20 00:33:42 2015
New Revision: 1680432

URL: http://svn.apache.org/r1680432
Log:
Numerous minor edits to release notes.

* Fix many small grammar, punctuation, and spelling errors.
* Standardize the section headers.
* Reorder sections to improve flow. ("Other" is now near the end, after more specific sections.)
* Put requirements changes near the top, because they will affect everybody, regardless of interest in other changes.
* Use https for links to sites which support it.
* Reformat file (paragraph width) to make it easier to merge my conflicting edits with elserj's edits.

Modified:
    accumulo/site/trunk/content/release_notes/1.7.0.mdtext

Modified: accumulo/site/trunk/content/release_notes/1.7.0.mdtext
URL: http://svn.apache.org/viewvc/accumulo/site/trunk/content/release_notes/1.7.0.mdtext?rev=1680432&r1=1680431&r2=1680432&view=diff
==============================================================================
--- accumulo/site/trunk/content/release_notes/1.7.0.mdtext (original)
+++ accumulo/site/trunk/content/release_notes/1.7.0.mdtext Wed May 20 00:33:42 2015
@@ -16,326 +16,362 @@ Notice: Licensed to the Apache Softwa
     specific language governing permissions and limitations under the License.
 
-Apache Accumulo 1.7.0 is a major release which includes a number of important milestone features
-that expand on the core functionality of Accumulo. These features range from security to availability
-to extendability. Nearly 700 JIRA issues were resolved with the release of this version: approximately
-two-thirds of which were bugs and one third were improvements.
-
-In the context of Accumulo's [Semantic Versioning][semver] [guidelines][api], this is a "minor version"
-which means that new APIs have been created, some deprecations may have been added, but no deprecated APIs
-have been removed. Code written against
-1.6.x should work against 1.7.0, likely binary-compatible but definitely source-compatible.
As always, the Accumulo -developers take API compatibility very seriously and have invested significant time and effort in ensuring that -we meet the promises set forward to our users. - -## Major Changes - -### Client Authentication with Kerberos - -Kerberos is far and away the de-facto means to provide strong authentication across Hadoop -and other related components. Kerberos requires a centralized key distribution center -to authentication users who have credentials provided by an administrator. When Hadoop is -configured for use with Kerberos, all users must provide Kerberos credentials to interact -with the filesystem, launch YARN jobs, or even view certain web pages. - -While Accumulo has long supported operation on Kerberos-enabled HDFS, it still required -Accumulo users to use password-based authentication. [ACCUMULO-2815][ACCUMULO-2815] -added support that allows Accumulo clients to use their existing Kerberos -credentials to interact with Accumulo and all other Hadoop components instead of -a separate username and password for Accumulo. - -This authentication leverages the [Simple Authentication and Security Layer (SASL)][SASL] -and [GSSAPI][GSSAPI] interface to support Kerberos authentication over the existing Thrift-based -RPC infrastructure that Accumulo uses. - -These additions represent a significant forward step for Accumulo, bringing its client-authentication -up to speed with the rest of the Hadoop ecosystem. This results in a much more cohesive -authetication story for Accumulo that resonates with the battle-tested cell-level security -and authorization components Accumulo users are very familiar with already. - -More information on configuration, administration and application of Kerberos client -authentication can be found in the [Kerberos chapter][kerberos] of the Accumulo -User Manual. - - -### Data-Center Replication - -In previous releases, Accumulo only operated within the constraints of a single installation. -Because single instances of Accumulo often consist of many nodes and Accumulo's design scales -(near) linearly across many nodes, it is typical that one Accumulo is run per physical installation -or data-center. [ACCUMULO-378][ACCUMULO-378] introduces support in Accumulo to automatically -copy data from one Accumulo instance to another. - -This data-center replication feature is primarily applicable to users wishing to implement -a disaster recovery strategy. Data can be automatically copied from a primary instance to one -or more secondary Accumulo instances. Where normal Accumulo ingest and -queries are strongly consistent, data-center replication is a lazy, eventually consistent operation. This -is desirable for replication as it prevents additional latency for ingest operations on the -primary instance. Additionally, network outages between the primary instance and replicas can sustain -prolonged outages without any administrative overhead. - -The Accumulo User Manual contains a [new chapter on replication][replication] which covers -in great detail the design and implementation of the feature, how users can configure replication -and special cases to consider when choosing to integrate the feature into user applications. - - -### User Initiated Compaction Strategies - -Per table compaction strategies were added in 1.6.0 to provide custom logic in choosing which -files are chosen for a major compaction. In 1.7.0, the ability to -specify a compaction strategy for a user-initiated compaction was added in -[ACCUMULO-1798][ACCUMULO-1798]. 
This allows surgical compactions on a subset -of tablets files. Previously a user initiated compaction would compact all -files in a tablet. - -In the Java API, this new feature can be accessed in the following way : - - Connection conn = ... - CompactionStrategyConfig csConfig = new CompactionStrategyConfig(strategyClassName).setOptions(strategyOpts); - CompactionConfig compactionConfig = new CompactionConfig().setCompactionStrategy(csConfig); - connector.tableOperations().compact(tableName, compactionConfig) - -In [ACCUMULO-3134][ACCUMULO-3134] the shell's compact command was modified to -enable selecting which files to compact based on size, name, and path. Options -were also added to the shell's compaction command to allow setting RFile options -for the compaction output. Setting the output options could be useful for -testing. For example, one tablet to be compacted using snappy. - -The following is an example shell command that compacts all files less than -10MB, if the tablet has at least two files that meet this criteria. If a -tablet had a 100MB, 50MB, 7MB, and 5MB file then the 7MB and 5MB files would be -compacted. If a tablet had a 100MB and 5MB file, then nothing would be done -because there are not at least two files meeting the selection criteria. - - - compact -t foo --min-files 2 --sf-lt-esize 10M +Apache Accumulo 1.7.0 is a significant release which includes many important +milestone features to expand the functionality of Accumulo. These include +features related to security, availability, and extensibility. Nearly 700 JIRA +issues were resolved in this version. Approximately two-thirds were bugs and +one-third were improvements. + +In the context of Accumulo's [Semantic Versioning][semver] [guidelines][api], +this is a "minor version". This means that new APIs have been created, but no +deprecated APIs have been removed. Code written against 1.6.x should work +against 1.7.0 (though it may require re-compilation). As always, the Accumulo +developers take API compatibility very seriously and have invested much time +to ensure that we meet the promises set forth to our users. +# Major Changes # -The following is an example shell command that compacts all bulk imported files -in a table. +## Updated Minimum Requirements ## - - compact -t foo --sf-ename I.* - -These options in the shell to select files use a custom compaction strategy. Options -were also added to the shell to specify an arbitrary compaction strategy. The option to -specify an arbitraty compaction strategy is mutually exclusive with the file selection -options and file creation options. - -### API Clarification - -The declared API in 1.6.x was incomplete. Some important classes like ColumnVisibility were not declared as Accumulo API. Significant -work was done under [ACCUMULO-3657][ACCUMULO-3657] to correct the API statement and clean up the API to be representative of -all classes which users are intended to interact with. The expanded and simplified API statement is in the [README][api]. - -In some places in the API, non-API types were used. Ideally, public API members would only use public API types. A tool called -[APILyzer][apilyzer] was created to find all API members that used non-API types. Many of the violations found by this tool were -deprecated to clearly communicate that a non API type was used. One example is a public API method that returned a class called -`KeyExtent`. `KeyExtent` was never intended to be in the public API because it contains code related to Accumulo internals. 
KeyExtent -and the API methods returning it have since been deprecated. These were replaced with a new class for identifying tablets that does not expose -the internals like `KeyExtent` did. Deprecating a type like this from the API makes the API more stable while also easier for contributors to change -Accumulo internals w/o impacting the API. - -The changes in [ACCUMULO-3657][ACCUMULO-3657] also included an Accumulo API regular expression for use with checkstyle. Starting -with 1.7.0, projects building on Accumulo can use this checkstyle rule to ensure they are only using Accumulo's public API. -The regular expression can be found in the [README][api]. - -### Updated Minimum Versions - -Apache Accumulo 1.7.0 comes with an updated set of minimum dependencies. +Apache Accumulo 1.7.0 comes with an updated set of minimum requirements. * Java7 is required. Java6 support is dropped. - * Hadoop 1 support is dropped, at least Hadoop 2.2.0 is required + * Hadoop 1 support is dropped. At least Hadoop 2.2.0 is required. * ZooKeeper 3.4.x or greater is required. -## Other improvements +## Client Authentication with Kerberos ## + +Kerberos is the de-facto means to provide strong authentication across Hadoop +and other related components. Kerberos requires a centralized key distribution +center to authentication users who have credentials provided by an +administrator. When Hadoop is configured for use with Kerberos, all users must +provide Kerberos credentials to interact with the filesystem, launch YARN +jobs, or even view certain web pages. + +While Accumulo has long supported operating on Kerberos-enabled HDFS, it still +required Accumulo users to use password-based authentication to authenticate +with Accumulo. [ACCUMULO-2815][ACCUMULO-2815] added support for allowing +Accumulo clients to use the same Kerberos credentials to authenticate to +Accumulo that they would use to authenticate to other Hadoop components, +instead of a separate user name and password just for Accumulo. + +This authentication leverages [Simple Authentication and Security Layer +(SASL)][SASL] and [GSSAPI][GSSAPI] to support Kerberos authentication over the +existing Thrift-based RPC infrastructure that Accumulo employs. + +These additions represent a significant forward step for Accumulo, bringing +its client-authentication up to speed with the rest of the Hadoop ecosystem. +This results in a much more cohesive authentication story for Accumulo that +resonates with the battle-tested cell-level security and authorization model +already familiar to Accumulo users. + +More information on configuration, administration, and application of Kerberos +client authentication can be found in the [Kerberos chapter][kerberos] of the +Accumulo User Manual. + +## Data-Center Replication ## + +In previous releases, Accumulo only operated within the constraints of a +single installation. Because single instances of Accumulo often consist of +many nodes and Accumulo's design scales (near) linearly across many nodes, it +is typical that one Accumulo is run per physical installation or data-center. +[ACCUMULO-378][ACCUMULO-378] introduces support in Accumulo to automatically +copy data from one Accumulo instance to another. -### Balancing Groups of Tablets +This data-center replication feature is primarily applicable to users wishing +to implement a disaster recovery strategy. Data can be automatically copied +from a primary instance to one or more other Accumulo instances. 
In contrast +to normal Accumulo operation, in which ingest and query are strongly +consistent, data-center replication is a lazy, eventually consistent +operation. This is desirable for replication, as it prevents additional +latency for ingest operations on the primary instance. Additionally, the +implementation of this feature can sustain prolonged outages between the +primary instance and replicas without any administrative overhead. + +The Accumulo User Manual contains a [new chapter on replication][replication] +which details the design and implementation of the feature, explains how users +can configure replication, and describes special cases to consider when +choosing to integrate the feature into a user application. + +## User-Initiated Compaction Strategies ## + +Per-table compaction strategies were added in 1.6.0 to provide custom logic to +decide which files are involved in a major compaction. In 1.7.0, the ability +to specify a compaction strategy for a user-initiated compaction was added in +[ACCUMULO-1798][ACCUMULO-1798]. This allows surgical compactions on a subset +of tablet files. Previously, a user-initiated compaction would compact all +files in a tablet. -By default, Accumulo evenly spreads each tables tablets across a cluster. In some -situations, it's advantageous for query or ingest to evenly spreads groups of tablets -within a table. For [ACCUMULO-3439][ACCUMULO-3439], a new balancer was added to evenly -spread groups of tablets for the purposes of optimizing performance. This -[blog post][group_balancer] provides more details about when and why users may desire -to leverage this feature.. - -### User-specified Durability - -Accumulo constantly tries to balance durability with performance. These are difficult problems -because guaranteeing durability of every write to Accumulo is very difficult in a massively-concurrent -environment that requires high throughput. One common area to focus this attention is the write-ahead log -as it must eventually call `fsync` on the local to guarantee that data written to is durable in the face -of unexpected power failures. In some cases where durability can be sacrificed, either due to the nature -of the data itself or redundant power supplies, ingest performance improvements can be attained. - -Prior to 1.7, a user could configure the level of durability for individual tables. With the implementation of -[ACCUMULO-1957][ACCUMULO-1957], durability is a first-class member on the `BatchWriter`. All `Mutations` written -using that `BatchWriter` will be written with the provided durability. This can result in substantially faster -ingest rates when the durability can be relaxed. - -### waitForBalance API - -When creating a new Accumulo table, the next step is typically adding splits to that -table before starting ingest. This can be extremely important as a table without -any splits will only be hosted on a single TabletServer and create a ingest bottleneck -until the table begins to naturally split. Adding many splits before ingesting will -ensure that a table is distributed across many servers and result in high throughput -when ingest first starts. - -Adding splits to a table has long been a synchronous operation, but the assignment -of those splits was asynchronous. A large number of splits could be processed, but -it was not guaranteed that they would be evenly distributed resulting in the same problem -as having an insufficient number of splits. 
[ACCUMULO-2998][ACCUMULO-2998] adds a new method
-to `InstanceOperations` which allows users to wait for all tablets to be balanced.
-This method lets users wait until tablets are appropriately distributed so that
-ingest can be run at full-bore immediately.
-
-### Hadoop Metrics2 Support
-
-Accumulo has long had its own metrics system implemented using Java MBeans. This
-enabled metrics to be reported by Accumulo services, but consumption by other systems
-often required use of an additional tool like jmxtrans to read the metrics from the
-MBeans and send them to some other system.
+In the Java API, this new feature can be accessed in the following way:
-[ACCUMULO-1817][ACCUMULO-1817] replaces this custom metrics system Accumulo
-with Hadoop Metrics2. Metrics2 has a number of benefits, the most common of which
-is invalidating the need for an additional process to send metrics to common metrics
-storage and visualization tools. With Metrics2 support, Accumulo can send its
-metrics to common tools like Ganglia and Graphite.
+    Connector conn = ...
+    CompactionStrategyConfig csConfig = new CompactionStrategyConfig(strategyClassName).setOptions(strategyOpts);
+    CompactionConfig compactionConfig = new CompactionConfig().setCompactionStrategy(csConfig);
+    conn.tableOperations().compact(tableName, compactionConfig);
+
+In [ACCUMULO-3134][ACCUMULO-3134], the shell's `compact` command was modified
+to enable selecting which files to compact based on size, name, and path.
+Options were also added to the shell's compaction command to allow setting
+RFile options for the compaction output. Setting the output options could be
+useful for testing. For example, a single tablet could be compacted using
+snappy compression.
-For more information on enabling Hadoop Metrics2, see the [Metrics Chapter][metrics]
-in the Accumulo User Manual.
+The following is an example shell command that compacts all files less than
+10MB, if the tablet has at least two files that meet this criterion. If a
+tablet had a 100MB, 50MB, 7MB, and 5MB file then the 7MB and 5MB files would
+be compacted. If a tablet had a 100MB and 5MB file, then nothing would be done
+because there are not at least two files meeting the selection criteria.
-### Distributed Tracing with HTrace
+    compact -t foo --min-files 2 --sf-lt-esize 10M
-HTrace has recently started gaining traction as a standlone-project, especially
-with its adoption in HDFS. Accumulo has long had distributed tracing support
-via its own "Cloudtrace" library, but this wasn't intended for use outside of Accumulo.
+The following is an example shell command that compacts all bulk imported
+files in a table.
-[ACCUMULO-898][ACCUMULO-898] replaces Accumulo's Cloudtrace code with HTrace. This
-has the benefit of adding timings (spans) from HDFS into Accumulo spans automatically.
+    compact -t foo --sf-ename I.*
-Users who inspect traces via the Accumulo Monitor (or another system) will begin to
-see timings from HDFS during operations like Major and Minor compactions when running
-with at least Apache Hadoop 2.6.0.
+These convenience options for selecting files execute using a specialized
+compaction strategy. Options were also added to the shell to specify an
+arbitrary compaction strategy. The option to specify an arbitrary compaction
+strategy is mutually exclusive with the file selection and file creation
+options, since those options are unique to the specialized compaction strategy
+provided. See `compact --help` in the shell for the available options.
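+
+As a fuller sketch of the snippet above, assuming the 1.7.0 Java client API
+(the instance name, ZooKeeper hosts, credentials, table name, and strategy
+class are only placeholders, and exception handling is omitted), a
+user-initiated compaction with a custom strategy might be requested like this:
+
+    // Client classes live in org.apache.accumulo.core.client (and its
+    // subpackages) and org.apache.accumulo.core.client.admin.
+    Instance instance = new ZooKeeperInstance("myInstance", "zkhost1:2181,zkhost2:2181");
+    Connector conn = instance.getConnector("username", new PasswordToken("password"));
+
+    // Options understood by the user-supplied compaction strategy class.
+    Map<String,String> strategyOpts = new HashMap<>();
+    CompactionStrategyConfig csConfig =
+        new CompactionStrategyConfig("com.example.MyCompactionStrategy").setOptions(strategyOpts);
+
+    // Compact only the files chosen by the custom strategy.
+    CompactionConfig compactionConfig = new CompactionConfig().setCompactionStrategy(csConfig);
+    conn.tableOperations().compact("mytable", compactionConfig);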
+ +## API Clarification ## + +The declared API in 1.6.x was incomplete. Some important classes like +ColumnVisibility were not declared as Accumulo API. Significant work was done +under [ACCUMULO-3657][ACCUMULO-3657] to correct the API statement and clean up +the API to be representative of all classes which users are intended to +interact with. The expanded and simplified API statement is in the +[README][api]. + +In some places in the API, non-API types were used. Ideally, public API +members would only use public API types. A tool called [APILyzer][apilyzer] +was created to find all API members that used non-API types. Many of the +violations found by this tool were deprecated to clearly communicate that a +non-API type was used. One example is a public API method that returned a +class called `KeyExtent`. `KeyExtent` was never intended to be in the public +API because it contains code related to Accumulo internals. `KeyExtent` and +the API methods returning it have since been deprecated. These were replaced +with a new class for identifying tablets that does not expose internals. +Deprecating a type like this from the API makes the API more stable while also +making it easier for contributors to change Accumulo internals without +impacting the API. + +The changes in [ACCUMULO-3657][ACCUMULO-3657] also included an Accumulo API +regular expression for use with checkstyle. Starting with 1.7.0, projects +building on Accumulo can use this checkstyle rule to ensure they are only +using Accumulo's public API. The regular expression can be found in the +[README][api]. -## Performance Improvements +# Performance Improvements # -### Configurable Threadpool Size for Assignments +## Configurable Threadpool Size for Assignments ## One of the primary tasks that the Accumulo Master is responsible for is the -assignment of Tablets to TabletServers. Before a Tablet can be brought online, -the tablet must not have any outstanding logs as this represents a need to perform -recovery (the tablet was not unloaded cleanly). This process can take some time for -large write-ahead log files and is performed on a TabletServer to keep the Master -light and agile. - -Assignment of Tablets, whether those Tablets need to perform recovery or not, share the same -threadpool in the Master. This means that when a large number of TabletServers are -available, too few threads dedicated to assignment can restrict the speed at which -assignments can be performed. [ACCUMULO-1085][ACCUMULO-1085] allows the size of the -threadpool used in the Master for assignments to be configurable which can be -dynamically altered to remove the limitation when sufficient servers are available. - -### Group-Commit Threshold as a Factor of Data Size - -When ingesting data into Accumulo, the majority of time is spent in the write-ahead -log. As such, this is a common place that optimizations are added. One optimization -is the notion of "group-commit". When multiple clients are writing data to the same -Accumulo Tablet, it is not efficient for each of them to synchronize the WAL, flush their -updates to disk for durability, and then release the lock. The idea of group-commit -is that multiple writers can queue their write their mutations to the WAL and -then wait for a sync that will satisfy the durability constraints of their batch of -updates. This has a drastic improvement on performance as many threads writing batches +assignment of tablets to tablet servers. 
Before a tablet can be brought +online, it must not have any outstanding logs because this represents a need +to perform recovery (the tablet was not unloaded cleanly). This process can +take some time for large write-ahead log files, so it is performed on a tablet +server to keep the Master light and agile. + +Assignments, whether the tablets need to perform recovery or not, share the +same threadpool in the Master. This means that when a large number of tablet +servers are available, too few threads dedicated to assignment can restrict +the speed at which assignments can be performed. +[ACCUMULO-1085][ACCUMULO-1085] allows the size of the threadpool used in the +Master for assignments to be configurable which can be dynamically altered to +remove the limitation when sufficient servers are available. + +## Group-Commit Threshold as a Factor of Data Size ## + +When ingesting data into Accumulo, the majority of time is spent in the +write-ahead log. As such, this is a common place that optimizations are added. +One optimization is the notion of "group-commit". When multiple clients are +writing data to the same Accumulo tablet, it is not efficient for each of them +to synchronize the WAL, flush their updates to disk for durability, and then +release the lock. The idea of group-commit is that multiple writers can queue +the write for their mutations to the WAL and then wait for a sync that will +satisfy the durability constraints of their batch of updates. This has a +drastic improvement on performance, since many threads writing batches concurrently can "share" the same `fsync`. -In previous versions, Accumulo controlled the frequency in which this group-commit -sync was performed as a factor of the number of clients writing to Accumulo. This was both confusing -to correctly configure and also encouraged sub-par performance with few write threads. -[ACCUMULO-1950][ACCUMULO-1950] introduced a new configuration property `tserver.total.mutation.queue.max` -which defines the amount of data that is queued before a group-commit is performed -in such a way that is agnostic of the number of writers. This new configuration property -is much easier to reason about than the previous (now deprecated) `tserver.mutation.queue.max`. -Users who have altered `tserver.mutation.queue.max` in the past are encouraged to start -using the new `tserver.total.mutation.queue.max` property. - -## Notable Bug Fixes - -### SourceSwitchingIterator Deadlock - -An instance of SourceSwitchingIterator, the Accumulo iterator which transparently -manages whether data for a Tablet read from memory (the in-memory map) or disk (HDFS -after a minor compaction), was found deadlocked in a production system. - -This deadlock prevented the scan and the minor compaction from ever successfully -completing without restarting the TabletServer. [ACCUMULO-3745][ACCUMULO-3745] -fixes the inconsistent synchronization inside of the SourceSwitchingIterator -to prevent this deadlock from happening in the future. - -The only mitigation of this bug is to restart the TabletServer that is deadlocked. - - -### Table flush blocked indefinitely - -While running the Accumulo Randomwalk distributed test, it was observed -that all activity in Accumulo had stopped and there was an offline -Accumulo Metadata table tablet. The system first tried to flush a user -tablet, but the metadata table was not online (likely due to the agitation -process which stops and starts Accumulo processes during the test). 
After
-this call, a call to load the metadata tablet was queued but could not
-complete until the previous flush call. Thus, a deadlock occurred.
-
-This deadlock happened because the synchronous flush call could not complete
-before the load tablet call completed, but the load tablet call couldn't
-run because of connection caching we perform in Accumulo's RPC layer
-to reduce the quantity of sockets we need to create to send data.
-[ACCUMULO-3597][ACCUMULO-3597] prevents this deadlock by forcing the use of a
-non-cached connection for the RPCs requesting a load of a metadata tablet. While
-this feature does result in additional network resources to be used, the concern is minimal
-because the number of metadata tablets is typically very small with respect to the
-total number of tablets in the system.
-
-The only mitigation of this bug is to restart the TabletServer that is hung.
+In previous versions, Accumulo controlled the frequency with which this
+group-commit sync was performed as a factor of the number of clients writing
+to Accumulo. This was confusing to configure correctly and also
+encouraged sub-par performance with few write threads.
+[ACCUMULO-1950][ACCUMULO-1950] introduced a new configuration property
+`tserver.total.mutation.queue.max` which defines the amount of data that is
+queued before a group-commit is performed in such a way that is agnostic of
+the number of writers. This new configuration property is much easier to
+reason about than the previous (now deprecated) `tserver.mutation.queue.max`.
+Users who have altered `tserver.mutation.queue.max` in the past are encouraged
+to start using the new `tserver.total.mutation.queue.max` property.
+
+# Other improvements #
+
+## Balancing Groups of Tablets ##
+
+By default, Accumulo evenly spreads each table's tablets across a cluster. In
+some situations, it is advantageous for query or ingest to evenly spread
+groups of tablets within a table. In [ACCUMULO-3439][ACCUMULO-3439], a new
+balancer was added to evenly spread groups of tablets to optimize performance.
+This [blog post][group_balancer] provides more details about when and why
+users may desire to leverage this feature.
+
+## User-specified Durability ##
+
+Accumulo constantly tries to balance durability with performance. Guaranteeing
+durability of every write to Accumulo is very difficult in a
+massively-concurrent environment that requires high throughput. One common
+area of focus is the write-ahead log, since it must eventually call `fsync` on
+the local filesystem to guarantee that data written is durable in the face of
+unexpected power failures. In some cases where durability can be sacrificed,
+either due to the nature of the data itself or redundant power supplies,
+ingest performance improvements can be attained.
+
+Prior to 1.7, a user could only configure the level of durability for
+individual tables. With the implementation of [ACCUMULO-1957][ACCUMULO-1957],
+the durability can be specified by the user when creating a `BatchWriter`,
+giving users control over durability at the level of individual writes.
+Every `Mutation` written using that `BatchWriter` will be written with the
+provided durability. This can result in substantially faster ingest rates when
+the durability can be relaxed.
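+
+As a sketch, assuming the 1.7.0 client API and an existing `Connector` named
+`conn` (the table name, row, and values below are placeholders, and exception
+handling is omitted), relaxing the durability for a single writer might look
+like this:
+
+    // Durability and BatchWriterConfig are in org.apache.accumulo.core.client.
+    // FLUSH requests a WAL flush rather than a full fsync, for this writer only.
+    BatchWriterConfig config = new BatchWriterConfig().setDurability(Durability.FLUSH);
+    BatchWriter writer = conn.createBatchWriter("ingest_table", config);
+
+    Mutation m = new Mutation("row1");
+    m.put("cf", "cq", new Value("value".getBytes()));
+    writer.addMutation(m);
+    writer.close();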
+
+## waitForBalance API ##
+
+When creating a new Accumulo table, the next step is typically adding splits
+to that table before starting ingest. This can be extremely important since a
+table without any splits will only be hosted on a single tablet server and
+create an ingest bottleneck until the table begins to naturally split. Adding
+many splits before ingesting will ensure that a table is distributed across
+many servers and result in high throughput when ingest first starts.
+
+Adding splits to a table has long been a synchronous operation, but the
+assignment of those splits was asynchronous. A large number of splits could be
+processed, but it was not guaranteed that they would be evenly distributed,
+resulting in the same problem as having an insufficient number of splits.
+[ACCUMULO-2998][ACCUMULO-2998] adds a new method to `InstanceOperations` which
+allows users to wait for all tablets to be balanced. This method lets users
+wait until tablets are appropriately distributed so that ingest can be run at
+full-bore immediately.
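+
+A sketch of how this might be combined with pre-splitting (the table name and
+split points are only illustrative, the split points use
+`org.apache.hadoop.io.Text`, exception handling is omitted, and an existing
+`Connector` named `conn` is assumed):
+
+    // Add splits up front, then block until the tablets have been balanced
+    // across the tablet servers before starting heavy ingest.
+    SortedSet<Text> splits = new TreeSet<>();
+    splits.add(new Text("m"));
+    splits.add(new Text("t"));
+    conn.tableOperations().addSplits("ingest_table", splits);
+    conn.instanceOperations().waitForBalance();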
+
+## Hadoop Metrics2 Support ##
+
+Accumulo has long had its own metrics system implemented using Java MBeans.
+This enabled metrics to be reported by Accumulo services, but consumption by
+other systems often required use of an additional tool like jmxtrans to read
+the metrics from the MBeans and send them to some other system.
-## Other changes
+[ACCUMULO-1817][ACCUMULO-1817] replaces this custom metrics system in Accumulo
+with Hadoop Metrics2. Metrics2 has a number of benefits, the most common of
+which is obviating the need for an additional process to send metrics to
+common metrics storage and visualization tools. With Metrics2 support,
+Accumulo can send its metrics to common tools like Ganglia and Graphite.
+
+For more information on enabling Hadoop Metrics2, see the [Metrics
+Chapter][metrics] in the Accumulo User Manual.
+
+## Distributed Tracing with HTrace ##
+
+HTrace has recently started gaining traction as a standalone project,
+especially with its adoption in HDFS. Accumulo has long had distributed
+tracing support via its own "Cloudtrace" library, but this wasn't intended for
+use outside of Accumulo.
+
+[ACCUMULO-898][ACCUMULO-898] replaces Accumulo's Cloudtrace code with HTrace.
+This has the benefit of adding timings (spans) from HDFS into Accumulo spans
+automatically.
+
+Users who inspect traces via the Accumulo Monitor (or another system) will begin
+to see timings from HDFS during operations like Major and Minor compactions when
+running with at least Apache Hadoop 2.6.0.
-### VERSIONS file present in binary distribution
+## VERSIONS file present in binary distribution ##
 
 In the pre-built binary distribution or distributions built by users from the
-official source release, users will now see a `VERSIONS` file present in the lib
-directory alongside the Accumulo server-side jars. Because the created tarball
-strips off versions from the jar file names, it can require extra work to actually
-find what the version of each dependent jar.
+official source release, users will now see a `VERSIONS` file present in the
+`lib/` directory alongside the Accumulo server-side jars. Because the created
+tarball strips off versions from the jar file names, it can require extra work
+to actually find the version of each dependent jar (typically by inspecting
+the jar's manifest).
 
 [ACCUMULO-2863][ACCUMULO-2863] adds a `VERSIONS` file to the `lib/` directory
 which contains the Maven groupId, artifactId, and version (GAV) information
 for each jar file included in the distribution.
 
-### Per-Table Volume Chooser
+## Per-Table Volume Chooser ##
 
 The `VolumeChooser` interface is a server-side extension point that allows user
-tables to provide custom logic in choosing where its files are written when multiple
-HDFS instances are available. By default, a randomized volume chooser implementation
-is used to evenly balance files across all HDFS instances.
+tables to provide custom logic in choosing where their files are written when
+multiple HDFS instances are available. By default, a randomized volume chooser
+implementation is used to evenly balance files across all HDFS instances.
 
 Previously, this VolumeChooser logic was instance-wide which meant that it would
 affect all tables. This is potentially undesirable as it might unintentionally
-impact other users in a multi-tenant system. [ACCUMULO-3177][ACCUMULO-3177] introduces
-a new per-table property which supports configuration of a `VolumeChooser`. This
-ensures that the implementation to choose how HDFS utilization happens when multiple
-are available is limited to the expected subset of all tables.
-
-## Testing
-
-Each unit and functional test only runs on a single node, while the RandomWalk and Continuous Ingest tests run
-on any number of nodes. *Agitation* refers to randomly restarting Accumulo processes and Hadoop DataNode processes,
-and, in HDFS High-Availability instances, forcing NameNode failover.
-
-During testing, multiple Accumulo developers noticed some stability issues with HDFS using Apache Hadoop 2.6.0
-when restarting Accumulo processes and HDFS datanodes. The developers investigated these issues as a part
-of the normal release testing procedures, but were unable to find a definitive cause of these failures. Users
-are encouraged to follow [ACCUMULO-2388][ACCUMULO-2388] if they wish to follow any future developments. One
-possible workaround is to increase the `general.rpc.timeout` in the Accumulo configuration from `120s` to `240s`.
+impact other users in a multi-tenant system. [ACCUMULO-3177][ACCUMULO-3177]
+introduces a new per-table property which supports configuration of a
+`VolumeChooser`. This ensures that the logic which chooses how HDFS volumes
+are used, when multiple are available, is limited to the expected subset of
+all tables.
+
+# Notable Bug Fixes #
+
+## SourceSwitchingIterator Deadlock ##
+
+An instance of SourceSwitchingIterator, the Accumulo iterator which
+transparently manages whether data for a tablet is read from memory (the
+in-memory map) or from disk (HDFS after a minor compaction), was found
+deadlocked in a production system.
+
+This deadlock prevented the scan and the minor compaction from ever
+successfully completing without restarting the tablet server.
+[ACCUMULO-3745][ACCUMULO-3745] fixes the inconsistent synchronization inside
+of the SourceSwitchingIterator to prevent this deadlock from happening in the
+future.
+
+The only mitigation of this bug was to restart the tablet server that was
+deadlocked.
+
+## Table flush blocked indefinitely ##
+
+While running the Accumulo RandomWalk distributed test, it was observed that
+all activity in Accumulo had stopped and there was an offline Accumulo
+metadata table tablet. The system first tried to flush a user tablet, but the
+metadata table was not online (likely due to the agitation process which stops
+and starts Accumulo processes during the test). After this call, a call to
+load the metadata tablet was queued but could not complete until the previous
+flush call finished. Thus, a deadlock occurred.
+ +This deadlock happened because the synchronous flush call could not complete +before the load tablet call completed, but the load tablet call couldn't run +because of connection caching we perform in Accumulo's RPC layer to reduce the +quantity of sockets we need to create to send data. +[ACCUMULO-3597][ACCUMULO-3597] prevents this deadlock by forcing the use of a +non-cached connection for the RPC message requesting a metadata tablet to be +loaded. + +While this feature does result in additional network resources to be used, the +concern is minimal because the number of metadata tablets is typically very +small with respect to the total number of tablets in the system. + +The only mitigation of this bug was to restart the tablet server that is hung. + +# Testing # + +Each unit and functional test only runs on a single node, while the RandomWalk +and Continuous Ingest tests run on any number of nodes. *Agitation* refers to +randomly restarting Accumulo processes and Hadoop DataNode processes, and, in +HDFS High-Availability instances, forcing NameNode fail-over. + +During testing, multiple Accumulo developers noticed some stability issues +with HDFS using Apache Hadoop 2.6.0 when restarting Accumulo processes and +HDFS datanodes. The developers investigated these issues as a part of the +normal release testing procedures, but were unable to find a definitive cause +of these failures. Users are encouraged to follow +[ACCUMULO-2388][ACCUMULO-2388] if they wish to follow any future developments. +One possible workaround is to increase the `general.rpc.timeout` in the +Accumulo configuration from `120s` to `240s`. @@ -380,7 +416,8 @@ possible workaround is to increase the `
-
+[ACCUMULO-378]: https://issues.apache.org/jira/browse/ACCUMULO-378
+[ACCUMULO-898]: https://issues.apache.org/jira/browse/ACCUMULO-898
 [ACCUMULO-1085]: https://issues.apache.org/jira/browse/ACCUMULO-1085
 [ACCUMULO-1798]: https://issues.apache.org/jira/browse/ACCUMULO-1798
 [ACCUMULO-1817]: https://issues.apache.org/jira/browse/ACCUMULO-1817
@@ -396,15 +433,13 @@ possible workaround is to increase the `
 [ACCUMULO-3597]: https://issues.apache.org/jira/browse/ACCUMULO-3597
 [ACCUMULO-3657]: https://issues.apache.org/jira/browse/ACCUMULO-3657
 [ACCUMULO-3745]: https://issues.apache.org/jira/browse/ACCUMULO-3745
-[ACCUMULO-378]: https://issues.apache.org/jira/browse/ACCUMULO-378
-[ACCUMULO-898]: https://issues.apache.org/jira/browse/ACCUMULO-898
+[GSSAPI]: https://en.wikipedia.org/wiki/Generic_Security_Services_Application_Program_Interface
+[SASL]: https://en.wikipedia.org/wiki/Simple_Authentication_and_Security_Layer
+[api]: https://github.com/apache/accumulo/blob/1.7.0/README.md#api
 [apilyzer]: http://code.revelc.net/apilyzer-maven-plugin/
 [group_balancer]: https://blogs.apache.org/accumulo/entry/balancing_groups_of_tablets
-[GSSAPI]: http://en.wikipedia.org/wiki/Generic_Security_Services_Application_Program_Interface
-[kerberos]: http://accumulo.staging.apache.org/1.7/accumulo_user_manual.html#_kerberos
+[kerberos]: /1.7/accumulo_user_manual.html#_kerberos
+[metrics]: /1.7/accumulo_user_manual.html#_metrics
 [readme]: https://github.com/apache/accumulo/blob/1.7.0/README.md
-[replication]: http://accumulo.staging.apache.org/1.7/accumulo_user_manual.html#_replication
-[SASL]: http://en.wikipedia.org/wiki/Simple_Authentication_and_Security_Layer
+[replication]: /1.7/accumulo_user_manual.html#_replication
 [semver]: http://semver.org
-[api]: https://github.com/apache/accumulo/blob/1.7.0/README.md#api
-[metrics]: http://accumulo.staging.apache.org/1.7/accumulo_user_manual.html#_metrics
\ No newline at end of file