X-Mailer: ASF-Git Admin Mailer Subject: [15/50] [abbrv] accumulo-website git commit: Jekyll build from gh-pages:358b7b4 archived-at: Tue, 22 Nov 2016 19:41:04 -0000 http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/c0655661/release/accumulo-1.7.0/index.html ---------------------------------------------------------------------- diff --git a/release/accumulo-1.7.0/index.html b/release/accumulo-1.7.0/index.html new file mode 100644 index 0000000..99e04c5 --- /dev/null +++ b/release/accumulo-1.7.0/index.html @@ -0,0 +1,606 @@ + + + + + + + + + + + + +Apache Accumulo 1.7.0 + + + + + + + + + + + + +

+ + +

+ +

Apache Accumulo 1.7.0

+ +

18 May 2015

+ +

Apache Accumulo 1.7.0 is a significant release that includes many important +milestone features which expand the functionality of Accumulo. These include +features related to security, availability, and extensibility. Nearly 700 JIRA +issues were resolved in this version. Approximately two-thirds were bugs and +one-third were improvements.

+ +

In the context of Accumulo’s Semantic Versioning guidelines, +this is a “minor version”. This means that new APIs have been created, some +deprecations may have been added, but no deprecated APIs have been removed. +Code written against 1.6.x should work against 1.7.0, likely binary-compatible +but definitely source-compatible. As always, the Accumulo developers take API compatibility +very seriously and have invested much time to ensure that we meet the promises set forth to our users.

+ +

Major Changes

+ +

Updated Minimum Requirements

+ +

Apache Accumulo 1.7.0 comes with an updated set of minimum requirements.

+ +

Java7 is required. Java6 support is dropped.
Hadoop 2.2.0 or greater is required. Hadoop 1.x support is dropped.
ZooKeeper 3.4.x or greater is required.

+ +

Client Authentication with Kerberos

+ +

Kerberos is the de-facto means to provide strong authentication across Hadoop +and other related components. Kerberos requires a centralized key distribution +center to authentication users who have credentials provided by an +administrator. When Hadoop is configured for use with Kerberos, all users must +provide Kerberos credentials to interact with the filesystem, launch YARN +jobs, or even view certain web pages.

+ +

While Accumulo has long supported operating on Kerberos-enabled HDFS, it still +required Accumulo users to use password-based authentication to authenticate +with Accumulo. ACCUMULO-2815 added support for allowing +Accumulo clients to use the same Kerberos credentials to authenticate to +Accumulo that they would use to authenticate to other Hadoop components, +instead of a separate user name and password just for Accumulo.

+ +

This authentication leverages Simple Authentication and Security Layer +(SASL) and GSSAPI to support Kerberos authentication over the +existing Apache Thrift-based RPC infrastructure that Accumulo employs.

+ +

These additions represent a significant forward step for Accumulo, bringing +its client-authentication up to speed with the rest of the Hadoop ecosystem. +This results in a much more cohesive authentication story for Accumulo that +resonates with the battle-tested cell-level security and authorization model +already familiar to Accumulo users.

+ +

More information on configuration, administration, and application of Kerberos +client authentication can be found in the Kerberos chapter of the +Accumulo User Manual.

+ +

Data-Center Replication

+ +

In previous releases, Accumulo only operated within the constraints of a +single installation. Because single instances of Accumulo often consist of +many nodes and Accumulo’s design scales (near) linearly across many nodes, it +is typical that one Accumulo is run per physical installation or data-center. +ACCUMULO-378 introduces support in Accumulo to automatically +copy data from one Accumulo instance to another.

+ +

This data-center replication feature is primarily applicable to users wishing +to implement a disaster recovery strategy. Data can be automatically copied +from a primary instance to one or more other Accumulo instances. In contrast +to normal Accumulo operation, in which ingest and query are strongly +consistent, data-center replication is a lazy, eventually consistent +operation. This is desirable for replication, as it prevents additional +latency for ingest operations on the primary instance. Additionally, the +implementation of this feature can sustain prolonged outages between the +primary instance and replicas without any administrative overhead.

+ +

The Accumulo User Manual contains a new chapter on replication +which details the design and implementation of the feature, explains how users +can configure replication, and describes special cases to consider when +choosing to integrate the feature into a user application.

+ +

User-Initiated Compaction Strategies

+ +

Per-table compaction strategies were added in 1.6.0 to provide custom logic to +decide which files are involved in a major compaction. In 1.7.0, the ability +to specify a compaction strategy for a user-initiated compaction was added in +ACCUMULO-1798. This allows surgical compactions on a subset +of tablet files. Previously, a user-initiated compaction would compact all +files in a tablet.

+ +

In the Java API, this new feature can be accessed in the following way:

+ +

Connection conn = ...
+CompactionStrategyConfig csConfig = new CompactionStrategyConfig(strategyClassName).setOptions(strategyOpts);
+CompactionConfig compactionConfig = new CompactionConfig().setCompactionStrategy(csConfig);
+connector.tableOperations().compact(tableName, compactionConfig)
+

+ +

In ACCUMULO-3134, the shell’s compact command was modified +to enable selecting which files to compact based on size, name, and path. +Options were also added to the shell’s compaction command to allow setting +RFile options for the compaction output. Setting the output options could be +useful for testing. For example, one tablet to be compacted using snappy +compression.

+ +

The following is an example shell command that compacts all files less than +10MB, if the tablet has at least two files that meet this criteria. If a +tablet had a 100MB, 50MB, 7MB, and 5MB file then the 7MB and 5MB files would +be compacted. If a tablet had a 100MB and 5MB file, then nothing would be done +because there are not at least two files meeting the selection criteria.

+ +

compact -t foo --min-files 2 --sf-lt-esize 10M
+

+ +

The following is an example shell command that compacts all bulk imported +files in a table.

+ +

compact -t foo --sf-ename I.*
+

+ +

These provided convenience options to select files execute using a specialized +compaction strategy. Options were also added to the shell to specify an +arbitrary compaction strategy. The option to specify an arbitrry compaction +strategy is mutually exclusive with the file selection and file creation +options, since those options are unique to the specialized compaction strategy +provided. See compact --help in the shell for the available options.

+ +

API Clarification

+ +

The declared API in 1.6.x was incomplete. Some important classes like +ColumnVisibility were not declared as Accumulo API. Significant work was done +under ACCUMULO-3657 to correct the API statement and clean up +the API to be representative of all classes which users are intended to +interact with. The expanded and simplified API statement is in the +README.

+ +

In some places in the API, non-API types were used. Ideally, public API +members would only use public API types. A tool called APILyzer +was created to find all API members that used non-API types. Many of the +violations found by this tool were deprecated to clearly communicate that a +non-API type was used. One example is a public API method that returned a +class called KeyExtent. KeyExtent was never intended to be in the public +API because it contains code related to Accumulo internals. KeyExtent and +the API methods returning it have since been deprecated. These were replaced +with a new class for identifying tablets that does not expose internals. +Deprecating a type like this from the API makes the API more stable while also +making it easier for contributors to change Accumulo internals without +impacting the API.

+ +

The changes in ACCUMULO-3657 also included an Accumulo API +regular expression for use with checkstyle. Starting with 1.7.0, projects +building on Accumulo can use this checkstyle rule to ensure they are only +using Accumulo’s public API. The regular expression can be found in the +README.

+ +

Performance Improvements

+ +

Configurable Threadpool Size for Assignments

+ +

During start-up, the Master quickly assigns tablets to Tablet Servers. However, +Tablet Servers load those assigned tablets one at a time. In 1.7, the servers +will be more aggressive, and will load tablets in parallel, so long as they do +not have mutations that need to be recovered.

+ +

ACCUMULO-1085 allows the size of the threadpool used in the Tablet Servers +for assignment processing to be configurable.

+ +

Group-Commit Threshold as a Factor of Data Size

+ +

When ingesting data into Accumulo, the majority of time is spent in the +write-ahead log. As such, this is a common place that optimizations are added. +One optimization is known as “group-commit”. When multiple clients are +writing data to the same Accumulo tablet, it is not efficient for each of them +to synchronize the WAL, flush their updates to disk for durability, and then +release the lock. The idea of group-commit is that multiple writers can queue +the write for their mutations to the WAL and then wait for a sync that will +satisfy the durability constraints of their batch of updates. This has a +drastic improvement on performance, since many threads writing batches +concurrently can “share” the same fsync.

+ +

In previous versions, Accumulo controlled the frequency in which this +group-commit sync was performed as a factor of the number of clients writing +to Accumulo. This was both confusing to correctly configure and also +encouraged sub-par performance with few write threads. +ACCUMULO-1950 introduced a new configuration property +tserver.total.mutation.queue.max which defines the amount of data that is +queued before a group-commit is performed in such a way that is agnostic of +the number of writers. This new configuration property is much easier to +reason about than the previous (now deprecated) tserver.mutation.queue.max. +Users who have set tserver.mutation.queue.max in the past are encouraged +to start using the new tserver.total.mutation.queue.max property.

+ +

Other improvements

+ +

Balancing Groups of Tablets

+ +

By default, Accumulo evenly spreads each table’s tablets across a cluster. In +some situations, it is advantageous for query or ingest to evenly spreads +groups of tablets within a table. For ACCUMULO-3439, a new +balancer was added to evenly spread groups of tablets to optimize performance. +This blog post provides more details about when and why +users may desire to leverage this feature..

+ +

User-specified Durability

+ +

Accumulo constantly tries to balance durability with performance. Guaranteeing +durability of every write to Accumulo is very difficult in a +massively-concurrent environment that requires high throughput. One common +area of focus is the write-ahead log, since it must eventually call fsync on +the local filesystem to guarantee that data written is durable in the face of +unexpected power failures. In some cases where durability can be sacrificed, +either due to the nature of the data itself or redundant power supplies, +ingest performance improvements can be attained.

+ +

Prior to 1.7, a user could only configure the level of durability for +individual tables. With the implementation of ACCUMULO-1957, +the durability can be specified by the user when creating a BatchWriter, +giving users control over durability at the level of the individual writes. +Every Mutation written using that BatchWriter will be written with the +provided durability. This can result in substantially faster ingest rates when +the durability can be relaxed.

+ +

waitForBalance API

+ +

When creating a new Accumulo table, the next step is typically adding splits +to that table before starting ingest. This can be extremely important since a +table without any splits will only be hosted on a single tablet server and +create a ingest bottleneck until the table begins to naturally split. Adding +many splits before ingesting will ensure that a table is distributed across +many servers and result in high throughput when ingest first starts.

+ +

Adding splits to a table has long been a synchronous operation, but the +assignment of those splits was asynchronous. A large number of splits could be +processed, but it was not guaranteed that they would be evenly distributed +resulting in the same problem as having an insufficient number of splits. +ACCUMULO-2998 adds a new method to InstanceOperations which +allows users to wait for all tablets to be balanced. This method lets users +wait until tablets are appropriately distributed so that ingest can be run at +full-bore immediately.

+ +

Hadoop Metrics2 Support

+ +

Accumulo has long had its own metrics system implemented using Java MBeans. +This enabled metrics to be reported by Accumulo services, but consumption by +other systems often required use of an additional tool like jmxtrans to read +the metrics from the MBeans and send them to some other system.

+ +

ACCUMULO-1817 replaces this custom metrics system Accumulo +with Hadoop Metrics2. Metrics2 has a number of benefits, the most common of +which is invalidating the need for an additional process to send metrics to +common metrics storage and visualization tools. With Metrics2 support, +Accumulo can send its metrics to common tools like Ganglia and Graphite.

+ +

For more information on enabling Hadoop Metrics2, see the Metrics +Chapter in the Accumulo User Manual.

+ +

Distributed Tracing with HTrace

+ +

HTrace has recently started gaining traction as a standalone project, +especially with its adoption in HDFS. Accumulo has long had distributed +tracing support via its own “Cloudtrace” library, but this wasn’t intended for +use outside of Accumulo.

+ +

ACCUMULO-898 replaces Accumulo’s Cloudtrace code with HTrace. +This has the benefit of adding timings (spans) from HDFS into Accumulo spans +automatically.

+ +

Users who inspect traces via the Accumulo Monitor (or another system) will begin +to see timings from HDFS during operations like Major and Minor compactions when +running with at least Apache Hadoop 2.6.0.

+ +

VERSIONS file present in binary distribution

+ +

In the pre-built binary distribution or distributions built by users from the +official source release, users will now see a VERSIONS file present in the +lib/ directory alongside the Accumulo server-side jars. Because the created +tarball strips off versions from the jar file names, it can require extra work +to actually find what the version of each dependent jar (typically inspecting +the jar’s manifest).

+ +

ACCUMULO-2863 adds a VERSIONS file to the lib/ directory +which contains the Maven groupId, artifactId, and verison (GAV) information for +each jar file included in the distribution.

+ +

Per-Table Volume Chooser

+ +

The VolumeChooser interface is a server-side extension point that allows user +tables to provide custom logic in choosing where its files are written when +multiple HDFS instances are available. By default, a randomized volume chooser +implementation is used to evenly balance files across all HDFS instances.

+ +

Previously, this VolumeChooser logic was instance-wide which meant that it would +affect all tables. This is potentially undesirable as it might unintentionally +impact other users in a multi-tenant system. ACCUMULO-3177 +introduces a new per-table property which supports configuration of a +VolumeChooser. This ensures that the implementation to choose how HDFS +utilization happens when multiple are available is limited to the expected +subset of all tables.

+ +

Table and namespace custom properties

+ +

In order to avoid errors caused by mis-typed configuration properties, Accumulo was strict about which configuration properties +could be set. However, this prevented users from setting arbitrary properties that could be used by custom balancers, compaction +strategies, volume choosers, and iterators. Under ACCUMULO-2841, the ability to set arbitrary table and +namespace properties was added. The properties need to be prefixed with table.custom.. The changes made in +ACCUMULO-3177 and ACCUMULO-3439 leverage this new feature.

+ +

Notable Bug Fixes

+ +

SourceSwitchingIterator Deadlock

+ +

An instance of SourceSwitchingIterator, the Accumulo iterator which +transparently manages whether data for a tablet read from memory (the +in-memory map) or disk (HDFS after a minor compaction), was found deadlocked +in a production system.

+ +

This deadlock prevented the scan and the minor compaction from ever +successfully completing without restarting the tablet server. +ACCUMULO-3745 fixes the inconsistent synchronization inside +of the SourceSwitchingIterator to prevent this deadlock from happening in the +future.

+ +

The only mitigation of this bug was to restart the tablet server that is +deadlocked.

+ +

Table flush blocked indefinitely

+ +

While running the Accumulo RandomWalk distributed test, it was observed that +all activity in Accumulo had stopped and there was an offline Accumulo +metadata table tablet. The system first tried to flush a user tablet, but the +metadata table was not online (likely due to the agitation process which stops +and starts Accumulo processes during the test). After this call, a call to +load the metadata tablet was queued but could not complete until the previous +flush call. Thus, a deadlock occurred.

+ +

This deadlock happened because the synchronous flush call could not complete +before the load tablet call completed, but the load tablet call couldn’t run +because of connection caching we perform in Accumulo’s RPC layer to reduce the +quantity of sockets we need to create to send data. +ACCUMULO-3597 prevents this deadlock by forcing the use of a +non-cached connection for the RPC message requesting a metadata tablet to be +loaded.

+ +

While this feature does result in additional network resources to be used, the +concern is minimal because the number of metadata tablets is typically very +small with respect to the total number of tablets in the system.

+ +

The only mitigation of this bug was to restart the tablet server that is hung.

+ +

Testing

+ +

Each unit and functional test only runs on a single node, while the RandomWalk +and Continuous Ingest tests run on any number of nodes. Agitation refers to +randomly restarting Accumulo processes and Hadoop DataNode processes, and, in +HDFS High-Availability instances, forcing NameNode fail-over.

+ +

During testing, multiple Accumulo developers noticed some stability issues +with HDFS using Apache Hadoop 2.6.0 when restarting Accumulo processes and +HDFS datanodes. The developers investigated these issues as a part of the +normal release testing procedures, but were unable to find a definitive cause +of these failures. Users are encouraged to follow +ACCUMULO-2388 if they wish to follow any future developments. +One possible workaround is to increase the general.rpc.timeout in the +Accumulo configuration from 120s to 240s.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

OS	Hadoop	Nodes	ZooKeeper	HDFS HA	Tests
Gentoo	N/A	1	N/A	No	Unit and Integration Tests
Gentoo	2.6.0	1 (2 TServers)	3.4.5	No	24hr CI w/ agitation and verification, 24hr RW w/o agitation.
Centos 6.6	2.6.0	3	3.4.6	No	24hr RW w/ agitation, 24hr CI w/o agitation, 72hr CI w/ and w/o agitation
Amazon Linux	2.6.0	20 m1large	3.4.6	No	24hr CI w/o agitation

+ + + +

View all releases in the archive

+ +

+ + + + + +

+ + http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/c0655661/release/accumulo-1.7.1/index.html ---------------------------------------------------------------------- diff --git a/release/accumulo-1.7.1/index.html b/release/accumulo-1.7.1/index.html new file mode 100644 index 0000000..6399e0e --- /dev/null +++ b/release/accumulo-1.7.1/index.html @@ -0,0 +1,356 @@ + + + + + + + + + + + + +Apache Accumulo 1.7.1 + + + + + + + + + + + + +

+ + +

+ +

Apache Accumulo 1.7.1

+ +

26 Feb 2016

+ +

Apache Accumulo 1.7.1 is a maintenance release on the 1.7 version branch. This +release contains changes from more than 150 issues, comprised of bug-fixes, +performance improvements, build quality improvements, and more. See +JIRA for a complete list.

+ +

Users of any previous 1.7.x release are strongly encouraged to update as soon +as possible to benefit from the improvements with very little concern in change +of underlying functionality. Users of 1.6 or earlier that are seeking to +upgrade to 1.7 should consider 1.7.1 as a starting point.

+ +

Highlights

+ +

Silent data-loss via bulk imported files

+ +

A user recently reported that a simple bulk-import application would +occasionally lose some records. Through investigation, it was found that when +bulk imports into a table failed the initial assignment, the logic that +automatically retries the imports was incorrectly choosing the tablets to +import the files into. ACCUMULO-3967 contains more information +on the cause and identification of the bug. The data-loss condition would only +affect entire files. If records from a file exist in Accumulo, it is still +guaranteed that all records within that imported file were successful.

+ +

As such, users who have bulk import applications using previous versions of +Accumulo should verify that all of their data was correctly ingested into +Accumulo and immediately update to Accumulo 1.7.1 (This is the same bug that +was fixed in 1.6.4, so you won’t be affected if you’re running 1.6.4 or newer).

+ +

Queued Compactions Not Running

+ +

Found and fixed a bug (ACCUMULO-4016) in which some queued +compactions would never run if the number of files changed while the tablet was +queued.

+ +

Kerberos Ticket Renewals

+ +

A bug was fixed which caused Accumulo clients and services to fail to check and +(if necessary) renew their Kerberos credentials. This would eventually lead to +these components failing to properly authenticate until they were restarted. +(ACCUMULO-4069)

+ +

Updated commons-collection

+ +

The bundled commons-collection library was updated from version 3.2.1 to 3.2.2 +because of a reported vulnerability in that library. +(ACCUMULO-4056)

+ +

Faster Processing of Conditional Mutations

+ +

Improved ConditionalMutation processing time by a factor of 3. +(ACCUMULO-4066)

+ +

Slow GC While Bulk Importing

+ +

Found and worked around an issue where lots of bulk imports creating many new +files would significantly impair the Accumulo GC service, and possibly prevent +it from running to completion entirely. (ACCUMULO-4021)

+ +

Unnoticed Per-table Configuration Updates

+ +

Fixed a bug which caused tablet servers to not notice changes to the per-table +constraints, under some circumstances. (ACCUMULO-3859)

+ +

TabletServers kill themselves on CentOS7

+ +

Reduced the aggressiveness with which Accumulo Tablet Servers preemptively +killed themselves when a local filesystem switched to read-only (indicating a +possible failure). To reduce false positives, such as those which can occur +with systemd’s extra cgroup mounts in CentOS7, an additional check was added to +ensure that tablet servers would only kill themselves if an ext- or +xfs-formatted disk switched to read-only. (ACCUMULO-4080)

+ +

Improvements in Locating Client Configuration File

+ +

Fixed some unexpected error messages related to setting +ACCUMULO_CLIENT_CONF_PATH, and improved the detection of the client.conf file if +ACCUMULO_CLIENT_CONF_PATH was set to a directory containing client.conf. +(ACCUMULO-4026,ACCUMULO-4027)

+ +

Transient ZooKeeper disconnect causes FATE threads to exit

+ +

ZooKeeper clients are expected to handle the situation where they become +disconnected from the ZooKeeper server and must wait to be reconnected +before continuing ZooKeeper operations.

+ +

The dedicated threads running inside the Accumulo Master process for FATE +actions had the potential unexpectedly exit in this disconnected state. +This caused a scenario where all future FATE-based operations would +be blocked until the Accumulo Master process was restarted. (ACCUMULO-4060)

+ +

Incorrect management of certain Apache Thrift RPCs

+ +

Accumulo relies on Apache Thrift to implement remote procedure calls between +Accumulo services. Accumulo’s use of Thrift uncovered an unfortunate situation +where a special RPC (a “oneway” call) would leave unwanted data on the underlying +Thrift connection. After this extra data was left on connection, all subsequent RPCs +re-using that connection would fail with “out of sequence response” error messages. +Accumulo would be left in a bad state until the mishandled connections were released +or Accumulo services were restarted. (ACCUMULO-4065)

+ +

Other Notable Changes

+ +

ACCUMULO-3509 Fixed some lock contention in TabletServer, preventing resource cleanup
ACCUMULO-3734 Fixed quote-escaping bug in VisibilityConstraint
ACCUMULO-4025 Fixed cleanup of bulk load fate transactions
ACCUMULO-4098,ACCUMULO-4113 Fixed widespread misuse of ByteBuffer

+ +

Testing

+ +

Each unit and functional test only runs on a single node, while the RandomWalk +and Continuous Ingest tests run on any number of nodes. Agitation refers to +randomly restarting Accumulo processes and Hadoop Datanode processes, and, in +HDFS High-Availability instances, forcing NameNode failover.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

OS/Environment	Hadoop	Nodes	ZooKeeper	HDFS HA	Tests
CentOS 7.1 w/Oracle JDK8 on EC2 (1 m3.xlarge, 8 d2.xlarge)	2.6.3	9	3.4.6	No	Random walk (All.xml) 24-hour run, saw ACCUMULO-3794 and ACCUMULO-4151.
CentOS 7.1 w/Oracle JDK8 on EC2 (1 m3.xlarge, 8 d2.xlarge)	2.6.3	9	3.4.6	No	21 hr run of CI w/ agitation, 23.1B entries verified.
CentOS 7.1 w/Oracle JDK8 on EC2 (1 m3.xlarge, 8 d2.xlarge)	2.6.3	9	3.4.6	No	24 hr run of CI w/o agitation, 23.0B entries verified; saw performance issues outlined in comment on ACCUMULO-4146.
CentOS 6.7 (OpenJDK 7), Fedora 23 (OpenJDK 8), and CentOS 7.2 (OpenJDK 7)	2.6.1	1	3.4.6	No	All unit tests and ITs pass with -Dhadoop.version=2.6.1; Kerberos ITs had a problem with earlier versions of Hadoop

+ + + +

View all releases in the archive

+ +

+ + + + + +

+ + http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/c0655661/release/accumulo-1.7.2/index.html ---------------------------------------------------------------------- diff --git a/release/accumulo-1.7.2/index.html b/release/accumulo-1.7.2/index.html new file mode 100644 index 0000000..1788d4a --- /dev/null +++ b/release/accumulo-1.7.2/index.html @@ -0,0 +1,292 @@ + + + + + + + + + + + + +Apache Accumulo 1.7.2 + + + + + + + + + + + + +

+ + +

+ +

Apache Accumulo 1.7.2

+ +

22 Jun 2016

+ +

Apache Accumulo 1.7.2 is a maintenance release on the 1.7 version branch. This +release contains changes from more than 150 issues, comprised of bug-fixes, +performance improvements, build quality improvements, and more. See +JIRA for a complete list.

+ +

Below are resources for this release:

+ +

Highlights

+ +

Write-Ahead Logs can be prematurely deleted

+ +

There were cases where the Accumulo Garbage Collector may inadvertently delete a WAL for a tablet server that it has erroneously determined to be down, causing data loss. This has been corrected. See ACCUMULO-4157 for additional detail.

+ +

Upgrade to Commons-VFS 2.1

+ +

Upgrading to Apache Commons VFS 2.1 fixes several issues with classloading out of HDFS. For further detail see ACCUMULO-4146. Additional fixes to a potential HDFS class loading deadlock situation were made in ACCUMULO-4341.

+ +

Native Map failed to increment mutation count properly

+ +

There was a bug (ACCUMULO-4148) where multiple put calls with identical keys and no timestamp would exhibit different behaviour depending on whether native maps were enabled or not. This behaviour would result in hidden mutations with native maps, and has been corrected.

+ +

Open WAL files could prevent DataNode decomission

+ +

An improvement was introduced to allow a max age before WAL files would be automatically rolled. Without a max age, they could stay open for writing indefinitely, blocking the Hadoop DataNode decomissioning process. For more information, see ACCUMULO-4004.

+ +

Remove unnecessary copy of cached RFile index blocks

+ +

Accumulo maintains an cache for file blocks in-memory as a performance optimization. This can be done safely because Accumulo RFiles are immutable, thus their blocks are also immutable. There are two types of these blocks: index and data blocks. Index blocks refer to the b-tree style index inside of each Accumulo RFile, while data blocks contain the sorted Key-Value pairs. In previous versions, when Accumulo extracted an Index block from the in-memory cache, it would copy the data. ACCUMULO-4164 removes this unnecessary copy as the contents are immutable and can be passed by reference. Ensuring that the Index blocks are not copied when accessed from the cache is a big performance gain at the file-access level.

+ +

Analyze Key-length to avoid choosing large Keys for RFile Index blocks

+ +

Accumulo’s RFile index blocks are made up of a Key which exists in the file and points to that specific location in the corresponding RFile data block. Thus, the size of the RFile index blocks is largely dominated by the size of the Keys which are used by the index. ACCUMULO-4314 is an improvement which uses statistics on the length of the Keys in the Rfile to avoid choosing Keys for the index whose length is greater than three standard deviations for the RFile. By choosing smaller Keys for the index, Accumulo can access the RFile index faster and keep more Index blocks cached in memory. Initial tests showed that with this change, the RFile index size was nearly cut in half.

+ +

Minor performance improvements.

+ +

Tablet servers would previously always hsync at the start of a minor compaction, causing delays in the write pipeline. These additional syncs were determined to provide no additional durability guarantees and have been removed. See ACCUMULO-4112 for additional detail.

+ +

A performance issue was identified and corrected (ACCUMULO-1755) where the BatchWriter would block calls to addMutation while looking up destination tablet server metadata. The writer has been fixed to allow both operations in parallel.

+ +

Other Notable Changes

+ +

ACCUMULO-3923 bootstrap_hdfs.sh script would copy incorrect jars to hdfs.
ACCUMULO-4146 Avoid copy of RFile Index Blocks when already in cache.
ACCUMULO-4155 No longer publish javadoc for non-public API to website. (Still available in javadoc jars in maven)
ACCUMULO-4173 Provide balancer to balance table within subset of hosts.
ACCUMULO-4334 Ingest rates reported through JMX did not match rates reported by Monitor.
ACCUMULO-4335 Error conditions that result in a Halt should ensure non-zero process exit code.

+ +

Testing

+ +

Each unit and functional test only runs on a single node, while the RandomWalk +and Continuous Ingest tests run on any number of nodes. Agitation refers to +randomly restarting Accumulo processes and Hadoop Datanode processes, and, in +HDFS High-Availability instances, forcing NameNode failover.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

OS/Environment	Hadoop	Nodes	ZooKeeper	HDFS HA	Tests
CentOS 7; EC2 m3.xlarge, d2.xlarge workers	2.6.3	9	3.4.8	No	24 HR Continuous Ingest with and without Agitation.
CentOS 6: EC2 m3.2xlarge	2.6.1	1	3.4.5	No	Unit tests and Integration Tests

+ + + +

View all releases in the archive

+ +

+ + + + + +

+ +