accumulo-commits mailing list archives

Subject svn commit: r1687660 - /accumulo/site/trunk/content/release_notes/1.5.3.mdtext
Date Thu, 25 Jun 2015 22:43:32 GMT
Author: elserj
Date: Thu Jun 25 22:43:32 2015
New Revision: 1687660

Stub out 1.5.3 release notes


Modified: accumulo/site/trunk/content/release_notes/1.5.3.mdtext
--- accumulo/site/trunk/content/release_notes/1.5.3.mdtext (original)
+++ accumulo/site/trunk/content/release_notes/1.5.3.mdtext Thu Jun 25 22:43:32 2015
@@ -16,347 +16,30 @@ Notice:    Licensed to the Apache Softwa
            specific language governing permissions and limitations
            under the License.
-Apache Accumulo 1.7.0 is a significant release that includes many important
-milestone features which expand the functionality of Accumulo. These include
-features related to security, availability, and extensibility. Nearly 700 JIRA
-issues were resolved in this version. Approximately two-thirds were bugs and
-one-third were improvements.
+Apache Accumulo 1.5.3 is a bug-fix release for the 1.5 series. It is likely the last
+release in this line due to lack of user interest and the magnitude of improvements
+in newer release lines.
 In the context of Accumulo's [Semantic Versioning][semver] [guidelines][api],
-this is a "minor version". This means that new APIs have been created, some
-deprecations may have been added, but no deprecated APIs have been removed.
-Code written against 1.6.x should work against 1.7.0, likely binary-compatible
-but definitely source-compatible. As always, the Accumulo developers take API compatibility
-very seriously and have invested much time to ensure that we meet the promises set forth to our users.
+this is a "patch version". This means that there have been no API changes. Any
+changes which were made were done in a backwards-compatible manner. Code that
+runs against 1.5.2 is guaranteed to run against 1.5.3.
 # Major Changes #
-## Updated Minimum Requirements ##
-Apache Accumulo 1.7.0 comes with an updated set of minimum requirements.
-  * Java7 is required. Java6 support is dropped.
-  * Hadoop 2.2.0 or greater is required. Hadoop 1.x support is dropped.
-  * ZooKeeper 3.4.x or greater is required.
-## Client Authentication with Kerberos ##
-Kerberos is the de-facto means to provide strong authentication across Hadoop
-and other related components. Kerberos requires a centralized key distribution
-center to authentication users who have credentials provided by an
-administrator. When Hadoop is configured for use with Kerberos, all users must
-provide Kerberos credentials to interact with the filesystem, launch YARN
-jobs, or even view certain web pages.
-While Accumulo has long supported operating on Kerberos-enabled HDFS, it still
-required Accumulo users to use password-based authentication to authenticate
-with Accumulo. [ACCUMULO-2815][ACCUMULO-2815] added support for allowing
-Accumulo clients to use the same Kerberos credentials to authenticate to
-Accumulo that they would use to authenticate to other Hadoop components,
-instead of a separate user name and password just for Accumulo.
-This authentication leverages [Simple Authentication and Security Layer
-(SASL)][SASL] and [GSSAPI][GSSAPI] to support Kerberos authentication over the
-existing Thrift-based RPC infrastructure that Accumulo employs.
-These additions represent a significant forward step for Accumulo, bringing
-its client-authentication up to speed with the rest of the Hadoop ecosystem.
-This results in a much more cohesive authentication story for Accumulo that
-resonates with the battle-tested cell-level security and authorization model
-already familiar to Accumulo users.
-More information on configuration, administration, and application of Kerberos
-client authentication can be found in the [Kerberos chapter][kerberos] of the
-Accumulo User Manual.
-## Data-Center Replication ##
-In previous releases, Accumulo only operated within the constraints of a
-single installation. Because single instances of Accumulo often consist of
-many nodes and Accumulo's design scales (near) linearly across many nodes, it
-is typical that one Accumulo is run per physical installation or data-center.
-[ACCUMULO-378][ACCUMULO-378] introduces support in Accumulo to automatically
-copy data from one Accumulo instance to another.
-This data-center replication feature is primarily applicable to users wishing
-to implement a disaster recovery strategy. Data can be automatically copied
-from a primary instance to one or more other Accumulo instances. In contrast
-to normal Accumulo operation, in which ingest and query are strongly
-consistent, data-center replication is a lazy, eventually consistent
-operation. This is desirable for replication, as it prevents additional
-latency for ingest operations on the primary instance. Additionally, the
-implementation of this feature can sustain prolonged outages between the
-primary instance and replicas without any administrative overhead.
-The Accumulo User Manual contains a [new chapter on replication][replication]
-which details the design and implementation of the feature, explains how users
-can configure replication, and describes special cases to consider when
-choosing to integrate the feature into a user application.
-## User-Initiated Compaction Strategies ##
-Per-table compaction strategies were added in 1.6.0 to provide custom logic to
-decide which files are involved in a major compaction. In 1.7.0, the ability
-to specify a compaction strategy for a user-initiated compaction was added in
-[ACCUMULO-1798][ACCUMULO-1798]. This allows surgical compactions on a subset
-of tablet files. Previously, a user-initiated compaction would compact all
-files in a tablet.
-In the Java API, this new feature can be accessed in the following way:
-    Connection conn = ...
-    CompactionStrategyConfig csConfig = new CompactionStrategyConfig(strategyClassName).setOptions(strategyOpts);
-    CompactionConfig compactionConfig = new CompactionConfig().setCompactionStrategy(csConfig);
-    connector.tableOperations().compact(tableName, compactionConfig)
-In [ACCUMULO-3134][ACCUMULO-3134], the shell's `compact` command was modified
-to enable selecting which files to compact based on size, name, and path.
-Options were also added to the shell's compaction command to allow setting
-RFile options for the compaction output. Setting the output options could be
-useful for testing. For example, one tablet to be compacted using snappy
-The following is an example shell command that compacts all files less than
-10MB, if the tablet has at least two files that meet this criteria. If a
-tablet had a 100MB, 50MB, 7MB, and 5MB file then the 7MB and 5MB files would
-be compacted. If a tablet had a 100MB and 5MB file, then nothing would be done
-because there are not at least two files meeting the selection criteria.
-    compact -t foo --min-files 2 --sf-lt-esize 10M
-The following is an example shell command that compacts all bulk imported
-files in a table.
-    compact -t foo --sf-ename I.*
-These provided convenience options to select files execute using a specialized
-compaction strategy. Options were also added to the shell to specify an
-arbitrary compaction strategy. The option to specify an arbitrry compaction
-strategy is mutually exclusive with the file selection and file creation
-options, since those options are unique to the specialized compaction strategy
-provided. See `compact --help` in the shell for the available options.
-## API Clarification ##
-The declared API in 1.6.x was incomplete. Some important classes like
-ColumnVisibility were not declared as Accumulo API. Significant work was done
-under [ACCUMULO-3657][ACCUMULO-3657] to correct the API statement and clean up
-the API to be representative of all classes which users are intended to
-interact with. The expanded and simplified API statement is in the
-In some places in the API, non-API types were used. Ideally, public API
-members would only use public API types. A tool called [APILyzer][apilyzer]
-was created to find all API members that used non-API types. Many of the
-violations found by this tool were deprecated to clearly communicate that a
-non-API type was used. One example is a public API method that returned a
-class called `KeyExtent`. `KeyExtent` was never intended to be in the public
-API because it contains code related to Accumulo internals. `KeyExtent` and
-the API methods returning it have since been deprecated. These were replaced
-with a new class for identifying tablets that does not expose internals.
-Deprecating a type like this from the API makes the API more stable while also
-making it easier for contributors to change Accumulo internals without
-impacting the API.
-The changes in [ACCUMULO-3657][ACCUMULO-3657] also included an Accumulo API
-regular expression for use with checkstyle. Starting with 1.7.0, projects
-building on Accumulo can use this checkstyle rule to ensure they are only
-using Accumulo's public API. The regular expression can be found in the
 # Performance Improvements #
-## Configurable Threadpool Size for Assignments ##
-During start-up, the Master quickly assigns tablets to Tablet Servers. However,
-Tablet Servers load those assigned tablets one at a time. In 1.7, the servers
-will be more aggressive, and will load tablets in parallel, so long as they do
-not have mutations that need to be recovered.
-[ACCUMULO-1085] allows the size of the threadpool used in the Tablet Servers 
-for assignment processing to be configurable.
-## Group-Commit Threshold as a Factor of Data Size ##
-When ingesting data into Accumulo, the majority of time is spent in the
-write-ahead log. As such, this is a common place that optimizations are added.
-One optimization is known as "group-commit". When multiple clients are
-writing data to the same Accumulo tablet, it is not efficient for each of them
-to synchronize the WAL, flush their updates to disk for durability, and then
-release the lock. The idea of group-commit is that multiple writers can queue
-the write for their mutations to the WAL and then wait for a sync that will
-satisfy the durability constraints of their batch of updates. This has a
-drastic improvement on performance, since many threads writing batches
-concurrently can "share" the same `fsync`.
-In previous versions, Accumulo controlled the frequency in which this
-group-commit sync was performed as a factor of the number of clients writing
-to Accumulo. This was both confusing to correctly configure and also
-encouraged sub-par performance with few write threads.
-[ACCUMULO-1950][ACCUMULO-1950] introduced a new configuration property
-`` which defines the amount of data that is
-queued before a group-commit is performed in such a way that is agnostic of
-the number of writers. This new configuration property is much easier to
-reason about than the previous (now deprecated) `tserver.mutation.queue.max`.
-Users who have set `tserver.mutation.queue.max` in the past are encouraged
-to start using the new `` property.
 # Other improvements #
-## Balancing Groups of Tablets ##
-By default, Accumulo evenly spreads each table's tablets across a cluster. In
-some situations, it is advantageous for query or ingest to evenly spreads
-groups of tablets within a table. For [ACCUMULO-3439][ACCUMULO-3439], a new
-balancer was added to evenly spread groups of tablets to optimize performance.
-This [blog post][group_balancer] provides more details about when and why
-users may desire to leverage this feature..
-## User-specified Durability ##
-Accumulo constantly tries to balance durability with performance. Guaranteeing
-durability of every write to Accumulo is very difficult in a
-massively-concurrent environment that requires high throughput. One common
-area of focus is the write-ahead log, since it must eventually call `fsync` on
-the local filesystem to guarantee that data written is durable in the face of
-unexpected power failures. In some cases where durability can be sacrificed,
-either due to the nature of the data itself or redundant power supplies,
-ingest performance improvements can be attained.
-Prior to 1.7, a user could only configure the level of durability for
-individual tables. With the implementation of [ACCUMULO-1957][ACCUMULO-1957],
-the durability can be specified by the user when creating a `BatchWriter`,
-giving users control over durability at the level of the individual writes.
-Every `Mutation` written using that `BatchWriter` will be written with the
-provided durability. This can result in substantially faster ingest rates when
-the durability can be relaxed.
-## waitForBalance API ##
-When creating a new Accumulo table, the next step is typically adding splits
-to that table before starting ingest. This can be extremely important since a
-table without any splits will only be hosted on a single tablet server and
-create a ingest bottleneck until the table begins to naturally split. Adding
-many splits before ingesting will ensure that a table is distributed across
-many servers and result in high throughput when ingest first starts.
-Adding splits to a table has long been a synchronous operation, but the
-assignment of those splits was asynchronous. A large number of splits could be
-processed, but it was not guaranteed that they would be evenly distributed
-resulting in the same problem as having an insufficient number of splits.
-[ACCUMULO-2998][ACCUMULO-2998] adds a new method to `InstanceOperations` which
-allows users to wait for all tablets to be balanced. This method lets users
-wait until tablets are appropriately distributed so that ingest can be run at
-full-bore immediately.
-## Hadoop Metrics2 Support ##
-Accumulo has long had its own metrics system implemented using Java MBeans.
-This enabled metrics to be reported by Accumulo services, but consumption by
-other systems often required use of an additional tool like jmxtrans to read
-the metrics from the MBeans and send them to some other system.
-[ACCUMULO-1817][ACCUMULO-1817] replaces this custom metrics system Accumulo
-with Hadoop Metrics2. Metrics2 has a number of benefits, the most common of
-which is invalidating the need for an additional process to send metrics to
-common metrics storage and visualization tools. With Metrics2 support,
-Accumulo can send its metrics to common tools like Ganglia and Graphite.
-For more information on enabling Hadoop Metrics2, see the [Metrics
-Chapter][metrics] in the Accumulo User Manual.
-## Distributed Tracing with HTrace ##
-HTrace has recently started gaining traction as a standalone project,
-especially with its adoption in HDFS. Accumulo has long had distributed
-tracing support via its own "Cloudtrace" library, but this wasn't intended for
-use outside of Accumulo.
-[ACCUMULO-898][ACCUMULO-898] replaces Accumulo's Cloudtrace code with HTrace.
-This has the benefit of adding timings (spans) from HDFS into Accumulo spans
-Users who inspect traces via the Accumulo Monitor (or another system) will begin
-to see timings from HDFS during operations like Major and Minor compactions when
-running with at least Apache Hadoop 2.6.0.
-## VERSIONS file present in binary distribution ##
-In the pre-built binary distribution or distributions built by users from the
-official source release, users will now see a `VERSIONS` file present in the
-`lib/` directory alongside the Accumulo server-side jars. Because the created
-tarball strips off versions from the jar file names, it can require extra work
-to actually find what the version of each dependent jar (typically inspecting
-the jar's manifest).
-[ACCUMULO-2863][ACCUMULO-2863] adds a `VERSIONS` file to the `lib/` directory
-which contains the Maven groupId, artifactId, and verison (GAV) information for
-each jar file included in the distribution.
-## Per-Table Volume Chooser ##
-The `VolumeChooser` interface is a server-side extension point that allows user
-tables to provide custom logic in choosing where its files are written when
-multiple HDFS instances are available. By default, a randomized volume chooser
-implementation is used to evenly balance files across all HDFS instances.
-Previously, this VolumeChooser logic was instance-wide which meant that it would
-affect all tables. This is potentially undesirable as it might unintentionally
-impact other users in a multi-tenant system. [ACCUMULO-3177][ACCUMULO-3177]
-introduces a new per-table property which supports configuration of a
-`VolumeChooser`. This ensures that the implementation to choose how HDFS
-utilization happens when multiple are available is limited to the expected
-subset of all tables.
-## Table and namespace custom properties ##
-In order to avoid errors caused by mis-typed configuration properties, Accumulo was strict about which configuration properties
-could be set. However, this prevented users from setting arbitrary properties that could be used by custom balancers, compaction
-strategies, volume choosers, and iterators. Under [ACCUMULO-2841][ACCUMULO-2841], the ability to set arbitrary table and
-namespace properties was added. The properties need to be prefixed with `table.custom.`.  The changes made in
-[ACCUMULO-3177][ACCUMULO-3177] and [ACCUMULO-3439][ACCUMULO-3439] leverage this new feature.
 # Notable Bug Fixes #
-## SourceSwitchingIterator Deadlock ##
-An instance of SourceSwitchingIterator, the Accumulo iterator which
-transparently manages whether data for a tablet read from memory (the
-in-memory map) or disk (HDFS after a minor compaction), was found deadlocked
-in a production system.
-This deadlock prevented the scan and the minor compaction from ever
-successfully completing without restarting the tablet server.
-[ACCUMULO-3745][ACCUMULO-3745] fixes the inconsistent synchronization inside
-of the SourceSwitchingIterator to prevent this deadlock from happening in the
-The only mitigation of this bug was to restart the tablet server that is
-## Table flush blocked indefinitely ##
-While running the Accumulo RandomWalk distributed test, it was observed that
-all activity in Accumulo had stopped and there was an offline Accumulo
-metadata table tablet. The system first tried to flush a user tablet, but the
-metadata table was not online (likely due to the agitation process which stops
-and starts Accumulo processes during the test). After this call, a call to
-load the metadata tablet was queued but could not complete until the previous
-flush call. Thus, a deadlock occurred.
-This deadlock happened because the synchronous flush call could not complete
-before the load tablet call completed, but the load tablet call couldn't run
-because of connection caching we perform in Accumulo's RPC layer to reduce the
-quantity of sockets we need to create to send data.
-[ACCUMULO-3597][ACCUMULO-3597] prevents this deadlock by forcing the use of a
-non-cached connection for the RPC message requesting a metadata tablet to be
-While this feature does result in additional network resources to be used, the
-concern is minimal because the number of metadata tablets is typically very
-small with respect to the total number of tablets in the system.
-The only mitigation of this bug was to restart the tablet server that is hung.
 # Testing #
@@ -384,64 +67,15 @@ Accumulo configuration from `120s` to `2
-    <td>Gentoo</tdt>
+    <td>N/A</tdt>
+    <td>N/A</td>
+    <td>N/A</td>
+    <td>N/A</td>
-    <td>1</td>
-    <td>No</td>
-    <td>Unit and Integration Tests</td>
-  </tr>
-  <tr>
-    <td>Gentoo</tdt>
-    <td>2.6.0</td>
-    <td>1 (2 TServers)</td>
-    <td>3.4.5</td>
-    <td>No</td>
-    <td>24hr CI w/ agitation and verification, 24hr RW w/o agitation.</td>
-  </tr>
-  <tr>
-    <td>Centos 6.6</td>
-    <td>2.6.0</td>
-    <td>3</td>
-    <td>3.4.6</td>
-    <td>No</td>
-    <td>24hr RW w/ agitation, 24hr CI w/o agitation, 72hr CI w/ and w/o agitation</td>
-  </tr>
-  <tr>
-    <td>Amazon Linux</td>
-    <td>2.6.0</td>
-    <td>20 m1large</td>
-    <td>3.4.6</td>
-    <td>No</td>
-    <td>24hr CI w/o agitation</td>
-[kerberos]: /1.7/accumulo_user_manual.html#_kerberos
-[metrics]: /1.7/accumulo_user_manual.html#_metrics
-[replication]: /1.7/accumulo_user_manual.html#_replication
