From: ctubbsii@apache.org
To: commits@accumulo.apache.org
Reply-To: dev@accumulo.apache.org
Date: Tue, 22 Nov 2016 19:38:53 -0000
Message-Id: <645d4319ac34426ea2fd90262dd69f02@git.apache.org>
In-Reply-To: <5fa1326ab0164468afbcf1d3a56d2de3@git.apache.org>
References: <5fa1326ab0164468afbcf1d3a56d2de3@git.apache.org>
Subject: [35/50] [abbrv] accumulo-website git commit: ACCUMULO-4518 Use Jekyll posts for releases

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/9a50bd13/release_notes/1.6.3.md
----------------------------------------------------------------------
diff --git a/release_notes/1.6.3.md b/release_notes/1.6.3.md
deleted file mode 100644
index 4d477e2..0000000
--- a/release_notes/1.6.3.md
+++ /dev/null
@@ -1,112 +0,0 @@
----
-title: Apache Accumulo 1.6.3 Release Notes
----
-
-Apache Accumulo 1.6.3 is a maintenance release on the 1.6 version branch.
-This release contains changes from over 63 issues, comprised of bug-fixes,
-performance improvements and better test cases. See [JIRA][JIRA_163] for a
-complete list.
-
-Users of 1.6.0, 1.6.1, and 1.6.2 are strongly encouraged to update as soon as
-possible to benefit from the improvements with very little concern in change
-of underlying functionality. Users of 1.4 or 1.5 that are seeking to upgrade
-to 1.6 should consider 1.6.3 as a starting point. For information about
-improvements since Accumulo 1.5, see the [1.6.0][3], [1.6.1][4], and
-[1.6.2][5] release notes.
-
-## Fixed BatchWriter hold time error
-
-In previous releases, a `BatchWriter` could fail with a
-`MutationsRejectedException` with server errors. If inspection of the tserver
-logs showed `HoldTimeoutException` was the cause, the workaround was to
-increase the value of `general.rpc.timeout`. Changing this setting is no
-longer necessary as this bug was fixed by [ACCUMULO-2388][ACCUMULO-2388].
-
-## Severe bug fixes
-
- * [ACCUMULO-3597][ACCUMULO-3597] Fixed a deadlock where a table flush and
-   metadata tablet load were waiting on each other. This was a rare bug. If it
-   occurred it could impact the availability of Accumulo as most Accumulo
-   operations depend on metadata tablets.
- * [ACCUMULO-3709][ACCUMULO-3709] Fixed a potential data loss bug where
-   AccumuloOutputFormat close did not rethrow exception.
- * [ACCUMULO-3745][ACCUMULO-3745] Fixed a deadlock in SourceSwitchingIterator
-   that occurred when using custom iterators that called `deepCopy`. This bug
-   would cause scans to hang indefinitely until the offending tserver was killed.
- * [ACCUMULO-3859][ACCUMULO-3859] Fixed a race condition that could prevent table
-   constraints from ever loading for a Tablet. It is likely to only affect users
-   when the constraint is first added to a table.
-
-## Notable bug fixes
-
- * [ACCUMULO-3589][ACCUMULO-3589] `du` in Shell does not check table existence.
- * [ACCUMULO-3692][ACCUMULO-3692] Offline'ing a table disabled subsequent balancing.
- * [ACCUMULO-3696][ACCUMULO-3696] Tracing could queue too many traces.
- * [ACCUMULO-3718][ACCUMULO-3718] Fixed a bug that prevented a Mutation from
-   being created in Scala.
- * [ACCUMULO-3747][ACCUMULO-3747] Thrashing tablet servers would be removed from the Monitor's Dead Server list.
- * [ACCUMULO-3750][ACCUMULO-3750] Fixed an issue where the Master would perpetually
-   fail when there was a bad `instance.secret` setting.
- * [ACCUMULO-3784][ACCUMULO-3784] Fixed a bug in `getauths` Shell command where it
-   treated visibilities that differed only in case as the same.
- * [ACCUMULO-3796][ACCUMULO-3796] Added documentation about turning off zone
-   reclaim.
- * [ACCUMULO-3880][ACCUMULO-3880] Fixed an issue where malformed configuration caused
-   TabletServers to shut down.
- * [ACCUMULO-3890][ACCUMULO-3890] Fixed a performance issue with CredentialProvider. Information
-   stored in the CredentialProvider was not cached which resulted in repeatedly reading the
-   file from HDFS which can degrade HDFS performance.
-
-## Known Issues
-
-During testing, [HDFS-8406][1] was encountered: write-ahead log recovery never
-completed because the HDFS lease on the WAL could not be recovered. To work around
-this issue, take the following steps:
-
- 1. Locate the block for the walog whose lease cannot be recovered.
- 2. Copy the block into HDFS as temp file TMP_WALOG.
- 3. Delete the walog whose lease cannot be recovered.
- 4. Move TMP_WALOG to the filename of the walog deleted in the previous step.
-
-Using the `fetchColumns()` method on a scanner in conjunction with custom iterators that
-add column families in their `seek()` method can lead to unexpected behavior. See
-[ACCUMULO-3905][ACCUMULO-3905] for more details. In that issue javadoc updates were made,
-but the updates did not make it into 1.6.3.
-
-## Testing
-
-Each unit and functional test only runs on a single node, while the RandomWalk
-and Continuous Ingest tests run on any number of nodes. *Agitation* refers to
-randomly restarting Accumulo processes and Hadoop Datanode processes, and, in
-HDFS High-Availability instances, forcing NameNode failover.
-
-{: #release_notes_testing .table }
-| OS | Hadoop | Nodes | ZooKeeper | HDFS HA | Tests |
-|----------------------|--------|-------|-----------|---------|-----------------------------------------------------------------|
-| Amazon Linux 2014.09 | 2.6.0 | 20 | 3.4.5 | No | 24hr ContinuousIngest w/ verification w/ and w/o agitation |
-| Amazon Linux 2014.09 | 2.6.0 | 20 | 3.4.5 | No | 24hr Randomwalk w/o agitation |
-| Centos 6.5 | 2.7.1 | 6 | 3.4.5 | No | Continuous Ingest and Verify (6B entries) |
-| Centos 6.6 | 2.2.0 | 6 | 3.4.5 | No | All integration tests passed. Some needed to be run a 2nd time. |
-
-[1]: https://issues.apache.org/jira/browse/HDFS-8406
-[3]: {{ site.baseurl }}/release_notes/1.6.0
-[4]: {{ site.baseurl }}/release_notes/1.6.1
-[5]: {{ site.baseurl }}/release_notes/1.6.2
-
-[ACCUMULO-2388]: https://issues.apache.org/jira/browse/ACCUMULO-2388
-[ACCUMULO-3589]: https://issues.apache.org/jira/browse/ACCUMULO-3589
-[ACCUMULO-3597]: https://issues.apache.org/jira/browse/ACCUMULO-3597
-[ACCUMULO-3692]: https://issues.apache.org/jira/browse/ACCUMULO-3692
-[ACCUMULO-3696]: https://issues.apache.org/jira/browse/ACCUMULO-3696
-[ACCUMULO-3709]: https://issues.apache.org/jira/browse/ACCUMULO-3709
-[ACCUMULO-3718]: https://issues.apache.org/jira/browse/ACCUMULO-3718
-[ACCUMULO-3745]: https://issues.apache.org/jira/browse/ACCUMULO-3745
-[ACCUMULO-3747]: https://issues.apache.org/jira/browse/ACCUMULO-3747
-[ACCUMULO-3750]: https://issues.apache.org/jira/browse/ACCUMULO-3750
-[ACCUMULO-3784]: https://issues.apache.org/jira/browse/ACCUMULO-3784
-[ACCUMULO-3796]: https://issues.apache.org/jira/browse/ACCUMULO-3796
-[ACCUMULO-3859]: https://issues.apache.org/jira/browse/ACCUMULO-3859
-[ACCUMULO-3880]: https://issues.apache.org/jira/browse/ACCUMULO-3880
-[ACCUMULO-3890]: https://issues.apache.org/jira/browse/ACCUMULO-3890
-[ACCUMULO-3905]: https://issues.apache.org/jira/browse/ACCUMULO-3905
-[JIRA_163]: https://issues.apache.org/jira/browse/ACCUMULO/fixforversion/12329154

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/9a50bd13/release_notes/1.6.4.md
----------------------------------------------------------------------
diff --git a/release_notes/1.6.4.md b/release_notes/1.6.4.md
deleted file mode 100644
index 62757cc..0000000
--- a/release_notes/1.6.4.md
+++ /dev/null
@@ -1,68 +0,0 @@
----
-title: Apache Accumulo 1.6.4 Release Notes
----
-
-Apache Accumulo 1.6.4 is a maintenance release on the 1.6 version branch.
-This release contains changes from 21 issues, comprised of bug-fixes,
-performance improvements and better test cases. See [JIRA][JIRA_164] for a
-complete list.
-
-Users of any previous 1.6.x release are strongly encouraged to update as soon as
-possible to benefit from the improvements with very little concern in change
-of underlying functionality. Users of 1.4 or 1.5 that are seeking to upgrade
-to 1.6 should consider 1.6.4 as a starting point.
-
-## Silent data-loss via bulk imported files
-
-A user recently reported that a simple bulk-import application would occasionally
-lose some records. Through investigation, it was found that when bulk imports into
-a table failed the initial assignment, the logic that automatically retries the
-imports was incorrectly choosing the tablets to import the files into. [ACCUMULO-3967][ACCUMULO-3967]
-contains more information on the cause and identification of the bug. The data-loss
-condition would only affect entire files. If records from a file exist in Accumulo,
-it is still guaranteed that all records within that imported file were successful.
-
-As such, users who have bulk import applications using previous versions of Accumulo
-should verify that all of their data was correctly ingested into Accumulo and
-immediately update to Accumulo 1.6.4.
-
-## Other bug fixes
-
- * [ACCUMULO-3979][ACCUMULO-3979] Fixed an issue where the BulkImporter failed
-   with an error message "QUERY_METADATA already started".
- * [ACCUMULO-3965][ACCUMULO-3965] The `listscans` shell command did not contain
-   the `scanId` attribute for currently running scans.
- * [ACCUMULO-3946][ACCUMULO-3946] Verified that all user-facing operations contained
-   appropriate audit messages.
- * [ACCUMULO-3977][ACCUMULO-3977] Isolated scans with Iterators in use incorrectly
-   fail around invocation of `deepCopy`.
- * [ACCUMULO-3905][ACCUMULO-3905] RowDeletingIterator functions incorrectly when
-   columns are provided by the client. This restores intended functionality without
-   the need for a [workaround][3905-workaround].
- * [ACCUMULO-3959][ACCUMULO-3959] [ACCUMULO-3934][ACCUMULO-3934] Multiple documentation
-   improvements to `BatchScanner`.
-
-## Testing
-
-Each unit and functional test only runs on a single node, while the RandomWalk
-and Continuous Ingest tests run on any number of nodes. *Agitation* refers to
-randomly restarting Accumulo processes and Hadoop Datanode processes, and, in
-HDFS High-Availability instances, forcing NameNode failover.
-
-{: #release_notes_testing .table }
-| OS | Hadoop | Nodes | ZooKeeper | HDFS HA | Tests |
-|----------------------|--------|-------|-----------|---------|---------------------------------------------------------------------|
-| Amazon Linux 2014.09 | 2.6.0 | 20 | 3.4.5 | No | ContinuousIngest w/ verification w/ and w/o agitation (37B entries) |
-
-[ACCUMULO-3979]: https://issues.apache.org/jira/browse/ACCUMULO-3979
-[ACCUMULO-3965]: https://issues.apache.org/jira/browse/ACCUMULO-3965
-[ACCUMULO-3946]: https://issues.apache.org/jira/browse/ACCUMULO-3946
-[ACCUMULO-3977]: https://issues.apache.org/jira/browse/ACCUMULO-3977
-[ACCUMULO-3905]: https://issues.apache.org/jira/browse/ACCUMULO-3905
-[3905-workaround]: https://issues.apache.org/jira/browse/ACCUMULO-1801?focusedCommentId=13970204&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13970204
-[ACCUMULO-3959]: https://issues.apache.org/jira/browse/ACCUMULO-3959
-[ACCUMULO-3934]: https://issues.apache.org/jira/browse/ACCUMULO-3934
-[ACCUMULO-3967]: https://issues.apache.org/jira/browse/ACCUMULO-3967
-
-
-[JIRA_164]: https://issues.apache.org/jira/browse/ACCUMULO/fixforversion/12332840

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/9a50bd13/release_notes/1.6.5.md
----------------------------------------------------------------------
diff --git a/release_notes/1.6.5.md b/release_notes/1.6.5.md
deleted file mode 100644
index 1b6a121..0000000
--- a/release_notes/1.6.5.md
+++ /dev/null
@@ -1,109 +0,0 @@
----
-title: Apache Accumulo 1.6.5 Release Notes
----
-
-Apache Accumulo 1.6.5 is a maintenance release on the 1.6 version branch. This
-release contains changes from 55 issues, comprised of bug-fixes, performance
-improvements, build quality improvements, and more. See [JIRA][JIRA_165] for a
-complete list.
-
-Users of any previous 1.6.x release are strongly encouraged to update as soon as
-possible to benefit from the improvements with very little concern in change of
-underlying functionality. Users of 1.4 or 1.5 that are seeking to upgrade to 1.6
-should consider 1.6.5 as a starting point.
-
-## Outstanding Known Issues
-
-Be aware that a small documentation bug exists with the compact command in the
-shell ([ACCUMULO-4138][ACCUMULO-4138]). The documentation for the begin row and
-end row should be described as exclusive and inclusive, respectively, rather
-than the incorrect description of both being inclusive.
-
-## Highlights
-
-### Queued Compactions Not Running
-
-Found and fixed a bug ([ACCUMULO-4016][ACCUMULO-4016]) in which some queued
-compactions would never run if the number of files changed while the tablet was
-queued.
-
-### Faster Processing of Conditional Mutations
-
-Improved ConditionalMutation processing time by a factor of 3.
-([ACCUMULO-4066][ACCUMULO-4066])
-
-### Slow GC While Bulk Importing
-
-Found and worked around an issue where lots of bulk imports creating many new
-files would significantly impair the Accumulo GC service, and possibly prevent
-it from running to completion entirely. ([ACCUMULO-4021][ACCUMULO-4021])
-
-### Improvements in Locating Client Configuration File
-
-Fixed some unexpected error messages related to setting
-ACCUMULO_CLIENT_CONF_PATH, and improved the detection of the client.conf file if
-ACCUMULO_CLIENT_CONF_PATH was set to a directory containing client.conf.
-([ACCUMULO-4026][ACCUMULO-4026],[ACCUMULO-4027][ACCUMULO-4027])
-
-### Transient ZooKeeper disconnect causes FATE threads to exit
-
-ZooKeeper clients are expected to handle the situation where they become
-disconnected from the ZooKeeper server and must wait to be reconnected
-before continuing ZooKeeper operations.
-
-The dedicated threads running inside the Accumulo Master process for FATE
-actions had the potential to unexpectedly exit in this disconnected state.
-This caused a scenario where all future FATE-based operations would
-be blocked until the Accumulo Master process was restarted. ([ACCUMULO-4060][ACCUMULO-4060])
-
-### Incorrect management of certain Apache Thrift RPCs
-
-Accumulo relies on Apache Thrift to implement remote procedure calls between
-Accumulo services. Accumulo's use of Thrift uncovered an unfortunate situation
-where a special RPC (a "oneway" call) would leave unwanted data on the underlying
-Thrift connection. After this extra data was left on the connection, all subsequent RPCs
-re-using that connection would fail with "out of sequence response" error messages.
-Accumulo would be left in a bad state until the mishandled connections were released
-or Accumulo services were restarted. ([ACCUMULO-4065][ACCUMULO-4065])
-
-## Other Notable Changes
-
- * [ACCUMULO-3509][ACCUMULO-3509] Fixed some lock contention in TabletServer, preventing resource cleanup
- * [ACCUMULO-3734][ACCUMULO-3734] Fixed quote-escaping bug in VisibilityConstraint
- * [ACCUMULO-4025][ACCUMULO-4025] Fixed cleanup of bulk load fate transactions
- * [ACCUMULO-4070][ACCUMULO-4070] Fixed Kerberos ticket renewal for all Accumulo services
- * [ACCUMULO-4098][ACCUMULO-4098],[ACCUMULO-4113][ACCUMULO-4113] Fixed widespread misuse of ByteBuffer
-
-## Testing
-
-Each unit and functional test only runs on a single node, while the RandomWalk
-and Continuous Ingest tests run on any number of nodes. *Agitation* refers to
-randomly restarting Accumulo processes and Hadoop Datanode processes, and, in
-HDFS High-Availability instances, forcing NameNode failover.
-
-{: #release_notes_testing .table }
-| OS | Hadoop | Nodes | ZooKeeper | HDFS HA | Tests |
-|--------------------------|-----------------|-------|-----------|---------|-----------------------------------------------------------------------------------------------------------------|
-| CentOS 7.1 | 2.6.3 | 9 | 3.4.6 | No | Random walk (All.xml) 18-hour run (2 failures, both conflicting operations on same table in Concurrent test) |
-| CentOS 7.1 | 2.6.3 | 6 | 3.4.6 | No | Continuous ingest with agitation (2B entries) |
-| CentOS 6.7 | 2.2.0 and 1.2.1 | 1 | 3.3.6 | No | All unit and integration tests |
-| CentOS 7.1 (Oracle JDK8) | 2.6.3 | 9 | 3.4.6 | No | Continuous ingest with agitation (24hrs, 32B entries verified) on EC2 (1 m3.xlarge leader; 8 d2.xlarge workers) |
-
-
-[JIRA_165]: https://issues.apache.org/jira/browse/ACCUMULO/fixforversion/12333674
-
-[ACCUMULO-3509]: https://issues.apache.org/jira/browse/ACCUMULO-3509
-[ACCUMULO-3734]: https://issues.apache.org/jira/browse/ACCUMULO-3734
-[ACCUMULO-4016]: https://issues.apache.org/jira/browse/ACCUMULO-4016
-[ACCUMULO-4021]: https://issues.apache.org/jira/browse/ACCUMULO-4021
-[ACCUMULO-4025]: https://issues.apache.org/jira/browse/ACCUMULO-4025
-[ACCUMULO-4026]: https://issues.apache.org/jira/browse/ACCUMULO-4026
-[ACCUMULO-4027]: https://issues.apache.org/jira/browse/ACCUMULO-4027
-[ACCUMULO-4060]: https://issues.apache.org/jira/browse/ACCUMULO-4060
-[ACCUMULO-4065]: https://issues.apache.org/jira/browse/ACCUMULO-4065
-[ACCUMULO-4066]: https://issues.apache.org/jira/browse/ACCUMULO-4066
-[ACCUMULO-4070]: https://issues.apache.org/jira/browse/ACCUMULO-4070
-[ACCUMULO-4098]: https://issues.apache.org/jira/browse/ACCUMULO-4098
-[ACCUMULO-4113]: https://issues.apache.org/jira/browse/ACCUMULO-4113
-[ACCUMULO-4138]: https://issues.apache.org/jira/browse/ACCUMULO-4138
-

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/9a50bd13/release_notes/1.6.6.md
----------------------------------------------------------------------
diff --git a/release_notes/1.6.6.md b/release_notes/1.6.6.md
deleted file mode 100644
index 1ae6fcb..0000000
--- a/release_notes/1.6.6.md
+++ /dev/null
@@ -1,129 +0,0 @@
----
-title: Apache Accumulo 1.6.6 Release Notes
----
-
-Apache Accumulo 1.6.6 is a maintenance release on the 1.6 version branch. This
-release contains changes from more than 40 issues, comprised of bug-fixes,
-performance improvements, build quality improvements, and more. See
-[JIRA][JIRA_166] for a complete list.
-
-Users of any previous 1.6.x release are strongly encouraged to update as soon
-as possible to benefit from the improvements with very little concern in change
-of underlying functionality.
-
-As of this release, active development has ceased for the 1.6 release line, so
-users should consider upgrading to a newer, actively maintained version when
-they can. While the developers may release another 1.6 version to address a
-severe issue, there's a strong possibility that this will be the last 1.6
-release. That would also mean that this will be the last Accumulo version to
-support Java 6 and Hadoop 1.
-
-## Highlights
-
-### Write-Ahead Logs can be prematurely deleted
-
-There were cases where the Accumulo Garbage Collector may inadvertently delete
-a WAL for a tablet server that it has erroneously determined to be down,
-causing data loss. This has been corrected. See [ACCUMULO-4157][ACCUMULO-4157]
-for additional detail.
-
-### Upgrade to Commons-VFS 2.1
-
-Upgrading to Apache Commons VFS 2.1 fixes several issues with classloading out
-of HDFS. For further detail see [ACCUMULO-4146][ACCUMULO-4146]. Additional
-fixes to a potential HDFS class loading deadlock situation were made in
-[ACCUMULO-4341][ACCUMULO-4341].
-
-### Native Map failed to increment mutation count properly
-
-There was a bug ([ACCUMULO-4148][ACCUMULO-4148]) where multiple put calls with
-identical keys and no timestamp would exhibit different behaviour depending on
-whether native maps were enabled or not. This behaviour would result in hidden
-mutations with native maps, and has been corrected.
-
-### Open WAL files could prevent DataNode decommission
-
-An improvement was introduced to allow a max age before WAL files would be
-automatically rolled. Without a max age, they could stay open for writing
-indefinitely, blocking the Hadoop DataNode decommissioning process. For more
-information, see [ACCUMULO-4004][ACCUMULO-4004].
-
-### Remove unnecessary copy of cached RFile index blocks
-
-Accumulo maintains a cache for file blocks in-memory as a performance
-optimization. This can be done safely because Accumulo RFiles are immutable,
-thus their blocks are also immutable. There are two types of these blocks:
-index and data blocks. Index blocks refer to the b-tree style index inside of
-each Accumulo RFile, while data blocks contain the sorted Key-Value pairs. In
-previous versions, when Accumulo extracted an Index block from the in-memory
-cache, it would copy the data. [ACCUMULO-4164][ACCUMULO-4164] removes this
-unnecessary copy as the contents are immutable and can be passed by reference.
-Ensuring that the Index blocks are not copied when accessed from the cache is a
-big performance gain at the file-access level.
-
-### Analyze Key-length to avoid choosing large Keys for RFile Index blocks
-
-Accumulo's RFile index blocks are made up of a Key which exists in the file and
-points to that specific location in the corresponding RFile data block. Thus,
-the size of the RFile index blocks is largely dominated by the size of the Keys
-which are used by the index. [ACCUMULO-4314][ACCUMULO-4314] is an improvement
-which uses statistics on the length of the Keys in the RFile to avoid choosing
-Keys for the index whose length is greater than three standard deviations for
-the RFile. By choosing smaller Keys for the index, Accumulo can access the
-RFile index faster and keep more Index blocks cached in memory. Initial tests
-showed that with this change, the RFile index size was nearly cut in half.
-
-### Gson version bump
-
-Due to an [upstream bug with Gson 2.2.2][GSONBUG], we've bumped our bundled
-dependency ([ACCUMULO-4345][ACCUMULO-4345]) to version 2.2.4. Please take note
-of this when you upgrade, if you were using the version shipped with Accumulo,
-and were relying on the buggy behavior in the previous version in your own
-code.
-
-### Minor performance improvements
-
-A performance issue was identified and corrected
-([ACCUMULO-1755][ACCUMULO-1755]) where the BatchWriter would block calls to
-addMutation while looking up destination tablet server metadata. The writer has
-been fixed to allow both operations in parallel.
-
-
-## Other Notable Changes
-
- * [ACCUMULO-4155][ACCUMULO-4155] No longer publish javadoc for non-public API
-   to website. (Still available in javadoc jars in maven)
- * [ACCUMULO-4334][ACCUMULO-4334] Ingest rates reported through JMX did not
-   match rates reported by Monitor.
- * [ACCUMULO-4335][ACCUMULO-4335] Error conditions that result in a Halt should
-   ensure non-zero process exit code.
-
-## Testing
-
-Each unit and functional test only runs on a single node, while the RandomWalk
-and Continuous Ingest tests run on any number of nodes. *Agitation* refers to
-randomly restarting Accumulo processes and Hadoop Datanode processes, and, in
-HDFS High-Availability instances, forcing NameNode failover.
-
-{: #release_notes_testing .table }
-| OS/Environment | Hadoop | Nodes | ZooKeeper | HDFS HA | Tests |
-|----------------|--------|-------|-----------|---------|----------------------------------|
-| CentOS 7 | 1.2.1 | 1 | 3.3.6 | No | Unit tests and Integration Tests |
-| CentOS 7 | 2.2.0 | 1 | 3.3.6 | No | Unit tests and Integration Tests |
-
-[JIRA_166]: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12312121&version=12334846
-
-[GSONBUG]: https://github.com/google/gson/issues/362
-
-[ACCUMULO-1755]: https://issues.apache.org/jira/browse/ACCUMULO-1755
-[ACCUMULO-4004]: https://issues.apache.org/jira/browse/ACCUMULO-4004
-[ACCUMULO-4146]: https://issues.apache.org/jira/browse/ACCUMULO-4146
-[ACCUMULO-4148]: https://issues.apache.org/jira/browse/ACCUMULO-4148
-[ACCUMULO-4155]: https://issues.apache.org/jira/browse/ACCUMULO-4155
-[ACCUMULO-4157]: https://issues.apache.org/jira/browse/ACCUMULO-4157
-[ACCUMULO-4164]: https://issues.apache.org/jira/browse/ACCUMULO-4164
-[ACCUMULO-4314]: https://issues.apache.org/jira/browse/ACCUMULO-4314
-[ACCUMULO-4334]: https://issues.apache.org/jira/browse/ACCUMULO-4334
-[ACCUMULO-4335]: https://issues.apache.org/jira/browse/ACCUMULO-4335
-[ACCUMULO-4341]: https://issues.apache.org/jira/browse/ACCUMULO-4341
-[ACCUMULO-4345]: https://issues.apache.org/jira/browse/ACCUMULO-4345

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/9a50bd13/release_notes/1.7.0.md
----------------------------------------------------------------------
diff --git a/release_notes/1.7.0.md b/release_notes/1.7.0.md
deleted file mode 100644
index 5d452ef..0000000
--- a/release_notes/1.7.0.md
+++ /dev/null
@@ -1,398 +0,0 @@
----
-title: Apache Accumulo 1.7.0 Release Notes
----
-
-Apache Accumulo 1.7.0 is a significant release that includes many important
-milestone features which expand the functionality of Accumulo. These include
-features related to security, availability, and extensibility. Nearly 700 JIRA
-issues were resolved in this version. Approximately two-thirds were bugs and
-one-third were improvements.
-
-In the context of Accumulo's [Semantic Versioning][semver] [guidelines][api],
-this is a "minor version". This means that new APIs have been created, some
-deprecations may have been added, but no deprecated APIs have been removed.
-Code written against 1.6.x should work against 1.7.0, likely binary-compatible
-but definitely source-compatible. As always, the Accumulo developers take API compatibility
-very seriously and have invested much time to ensure that we meet the promises set forth to our users.
-
-# Major Changes #
-
-## Updated Minimum Requirements ##
-
-Apache Accumulo 1.7.0 comes with an updated set of minimum requirements.
-
- * Java7 is required. Java6 support is dropped.
- * Hadoop 2.2.0 or greater is required. Hadoop 1.x support is dropped.
- * ZooKeeper 3.4.x or greater is required.
-
-## Client Authentication with Kerberos ##
-
-Kerberos is the de-facto means to provide strong authentication across Hadoop
-and other related components. Kerberos requires a centralized key distribution
-center to authenticate users who have credentials provided by an
-administrator. When Hadoop is configured for use with Kerberos, all users must
-provide Kerberos credentials to interact with the filesystem, launch YARN
-jobs, or even view certain web pages.
-
-While Accumulo has long supported operating on Kerberos-enabled HDFS, it still
-required Accumulo users to use password-based authentication to authenticate
-with Accumulo. [ACCUMULO-2815][ACCUMULO-2815] added support for allowing
-Accumulo clients to use the same Kerberos credentials to authenticate to
-Accumulo that they would use to authenticate to other Hadoop components,
-instead of a separate user name and password just for Accumulo.
-
-This authentication leverages [Simple Authentication and Security Layer
-(SASL)][SASL] and [GSSAPI][GSSAPI] to support Kerberos authentication over the
-existing Apache Thrift-based RPC infrastructure that Accumulo employs.
-
-These additions represent a significant forward step for Accumulo, bringing
-its client-authentication up to speed with the rest of the Hadoop ecosystem.
-This results in a much more cohesive authentication story for Accumulo that
-resonates with the battle-tested cell-level security and authorization model
-already familiar to Accumulo users.
-
-More information on configuration, administration, and application of Kerberos
-client authentication can be found in the [Kerberos chapter][kerberos] of the
-Accumulo User Manual.
-
-## Data-Center Replication ##
-
-In previous releases, Accumulo only operated within the constraints of a
-single installation. Because single instances of Accumulo often consist of
-many nodes and Accumulo's design scales (near) linearly across many nodes, it
-is typical that one Accumulo is run per physical installation or data-center.
-[ACCUMULO-378][ACCUMULO-378] introduces support in Accumulo to automatically
-copy data from one Accumulo instance to another.
-
-This data-center replication feature is primarily applicable to users wishing
-to implement a disaster recovery strategy. Data can be automatically copied
-from a primary instance to one or more other Accumulo instances. In contrast
-to normal Accumulo operation, in which ingest and query are strongly
-consistent, data-center replication is a lazy, eventually consistent
-operation. This is desirable for replication, as it prevents additional
-latency for ingest operations on the primary instance. Additionally, the
-implementation of this feature can sustain prolonged outages between the
-primary instance and replicas without any administrative overhead.
-
-The Accumulo User Manual contains a [new chapter on replication][replication]
-which details the design and implementation of the feature, explains how users
-can configure replication, and describes special cases to consider when
-choosing to integrate the feature into a user application.
-
-## User-Initiated Compaction Strategies ##
-
-Per-table compaction strategies were added in 1.6.0 to provide custom logic to
-decide which files are involved in a major compaction. In 1.7.0, the ability
-to specify a compaction strategy for a user-initiated compaction was added in
-[ACCUMULO-1798][ACCUMULO-1798]. This allows surgical compactions on a subset
-of tablet files. Previously, a user-initiated compaction would compact all
-files in a tablet.
-
-In the Java API, this new feature can be accessed in the following way:
-
-    Connector connector = ...
-    // Name the strategy class and pass its options
-    CompactionStrategyConfig csConfig = new CompactionStrategyConfig(strategyClassName).setOptions(strategyOpts);
-    CompactionConfig compactionConfig = new CompactionConfig().setCompactionStrategy(csConfig);
-    // Run a user-initiated compaction of the table with that strategy
-    connector.tableOperations().compact(tableName, compactionConfig);
-
-In [ACCUMULO-3134][ACCUMULO-3134], the shell's `compact` command was modified
-to enable selecting which files to compact based on size, name, and path.
-Options were also added to the shell's compaction command to allow setting
-RFile options for the compaction output. Setting the output options could be
-useful for testing. For example, one tablet could be compacted using snappy
-compression.
-
-The following is an example shell command that compacts all files less than
-10MB, if the tablet has at least two files that meet these criteria. If a
-tablet had a 100MB, 50MB, 7MB, and 5MB file then the 7MB and 5MB files would
-be compacted. If a tablet had a 100MB and 5MB file, then nothing would be done
-because there are not at least two files meeting the selection criteria.
-
-    compact -t foo --min-files 2 --sf-lt-esize 10M
-
-The following is an example shell command that compacts all bulk imported
-files in a table.
-
-    compact -t foo --sf-ename I.*
-
-These convenience options for selecting files execute using a specialized
-compaction strategy. Options were also added to the shell to specify an
-arbitrary compaction strategy. The option to specify an arbitrary compaction
-strategy is mutually exclusive with the file selection and file creation
-options, since those options are unique to the specialized compaction strategy
-provided. See `compact --help` in the shell for the available options.
-
-## API Clarification ##
-
-The declared API in 1.6.x was incomplete. Some important classes like
-ColumnVisibility were not declared as Accumulo API. Significant work was done
-under [ACCUMULO-3657][ACCUMULO-3657] to correct the API statement and clean up
-the API to be representative of all classes which users are intended to
-interact with. The expanded and simplified API statement is in the
-[README][api].
-
-In some places in the API, non-API types were used. Ideally, public API
-members would only use public API types. A tool called [APILyzer][apilyzer]
-was created to find all API members that used non-API types. Many of the
-violations found by this tool were deprecated to clearly communicate that a
-non-API type was used. One example is a public API method that returned a
-class called `KeyExtent`. `KeyExtent` was never intended to be in the public
-API because it contains code related to Accumulo internals. `KeyExtent` and
-the API methods returning it have since been deprecated. These were replaced
-with a new class for identifying tablets that does not expose internals.
-Deprecating a type like this from the API makes the API more stable while also
-making it easier for contributors to change Accumulo internals without
-impacting the API.
- -The changes in [ACCUMULO-3657][ACCUMULO-3657] also included an Accumulo API -regular expression for use with checkstyle. Starting with 1.7.0, projects -building on Accumulo can use this checkstyle rule to ensure they are only -using Accumulo's public API. The regular expression can be found in the -[README][api]. - -# Performance Improvements # - -## Configurable Threadpool Size for Assignments ## - -During start-up, the Master quickly assigns tablets to Tablet Servers. However, -Tablet Servers load those assigned tablets one at a time. In 1.7, the servers -will be more aggressive, and will load tablets in parallel, so long as they do -not have mutations that need to be recovered. - -[ACCUMULO-1085] allows the size of the threadpool used in the Tablet Servers -for assignment processing to be configurable. - -## Group-Commit Threshold as a Factor of Data Size ## - -When ingesting data into Accumulo, the majority of time is spent in the -write-ahead log. As such, this is a common place that optimizations are added. -One optimization is known as "group-commit". When multiple clients are -writing data to the same Accumulo tablet, it is not efficient for each of them -to synchronize the WAL, flush their updates to disk for durability, and then -release the lock. The idea of group-commit is that multiple writers can queue -the write for their mutations to the WAL and then wait for a sync that will -satisfy the durability constraints of their batch of updates. This has a -drastic improvement on performance, since many threads writing batches -concurrently can "share" the same `fsync`. - -In previous versions, Accumulo controlled the frequency in which this -group-commit sync was performed as a factor of the number of clients writing -to Accumulo. This was both confusing to correctly configure and also -encouraged sub-par performance with few write threads. 
-[ACCUMULO-1950][ACCUMULO-1950] introduced a new configuration property
-`tserver.total.mutation.queue.max` which defines the amount of data that is
-queued before a group-commit is performed in such a way that is agnostic of
-the number of writers. This new configuration property is much easier to
-reason about than the previous (now deprecated) `tserver.mutation.queue.max`.
-Users who have set `tserver.mutation.queue.max` in the past are encouraged
-to start using the new `tserver.total.mutation.queue.max` property.
-
-# Other improvements #
-
-## Balancing Groups of Tablets ##
-
-By default, Accumulo evenly spreads each table's tablets across a cluster. In
-some situations, it is advantageous for query or ingest to evenly spread
-groups of tablets within a table. For [ACCUMULO-3439][ACCUMULO-3439], a new
-balancer was added to evenly spread groups of tablets to optimize performance.
-This [blog post][group_balancer] provides more details about when and why
-users may desire to leverage this feature.
-
-## User-specified Durability ##
-
-Accumulo constantly tries to balance durability with performance. Guaranteeing
-durability of every write to Accumulo is very difficult in a
-massively-concurrent environment that requires high throughput. One common
-area of focus is the write-ahead log, since it must eventually call `fsync` on
-the local filesystem to guarantee that data written is durable in the face of
-unexpected power failures. In some cases where durability can be sacrificed,
-either due to the nature of the data itself or redundant power supplies,
-ingest performance improvements can be attained.
-
-Prior to 1.7, a user could only configure the level of durability for
-individual tables. With the implementation of [ACCUMULO-1957][ACCUMULO-1957],
-the durability can be specified by the user when creating a `BatchWriter`,
-giving users control over durability at the level of the individual writes.
-Every `Mutation` written using that `BatchWriter` will be written with the
-provided durability. This can result in substantially faster ingest rates when
-the durability can be relaxed.
-
-## waitForBalance API ##
-
-When creating a new Accumulo table, the next step is typically adding splits
-to that table before starting ingest. This can be extremely important since a
-table without any splits will only be hosted on a single tablet server and
-create an ingest bottleneck until the table begins to naturally split. Adding
-many splits before ingesting will ensure that a table is distributed across
-many servers and result in high throughput when ingest first starts.
-
-Adding splits to a table has long been a synchronous operation, but the
-assignment of those splits was asynchronous. A large number of splits could be
-processed, but it was not guaranteed that they would be evenly distributed,
-resulting in the same problem as having an insufficient number of splits.
-[ACCUMULO-2998][ACCUMULO-2998] adds a new method to `InstanceOperations` which
-allows users to wait for all tablets to be balanced. This method lets users
-wait until tablets are appropriately distributed so that ingest can be run at
-full-bore immediately.
-
-## Hadoop Metrics2 Support ##
-
-Accumulo has long had its own metrics system implemented using Java MBeans.
-This enabled metrics to be reported by Accumulo services, but consumption by
-other systems often required use of an additional tool like jmxtrans to read
-the metrics from the MBeans and send them to some other system.
-
-[ACCUMULO-1817][ACCUMULO-1817] replaces Accumulo's custom metrics system
-with Hadoop Metrics2. Metrics2 has a number of benefits, the most common of
-which is eliminating the need for an additional process to send metrics to
-common metrics storage and visualization tools. With Metrics2 support,
-Accumulo can send its metrics to common tools like Ganglia and Graphite.
-
-For more information on enabling Hadoop Metrics2, see the [Metrics
-Chapter][metrics] in the Accumulo User Manual.
-
-## Distributed Tracing with HTrace ##
-
-HTrace has recently started gaining traction as a standalone project,
-especially with its adoption in HDFS. Accumulo has long had distributed
-tracing support via its own "Cloudtrace" library, but this wasn't intended for
-use outside of Accumulo.
-
-[ACCUMULO-898][ACCUMULO-898] replaces Accumulo's Cloudtrace code with HTrace.
-This has the benefit of adding timings (spans) from HDFS into Accumulo spans
-automatically.
-
-Users who inspect traces via the Accumulo Monitor (or another system) will begin
-to see timings from HDFS during operations like Major and Minor compactions when
-running with at least Apache Hadoop 2.6.0.
-
-## VERSIONS file present in binary distribution ##
-
-In the pre-built binary distribution or distributions built by users from the
-official source release, users will now see a `VERSIONS` file present in the
-`lib/` directory alongside the Accumulo server-side jars. Because the created
-tarball strips off versions from the jar file names, it can require extra work
-to actually find the version of each dependent jar (typically by inspecting
-the jar's manifest).
-
-[ACCUMULO-2863][ACCUMULO-2863] adds a `VERSIONS` file to the `lib/` directory
-which contains the Maven groupId, artifactId, and version (GAV) information for
-each jar file included in the distribution.
-
-## Per-Table Volume Chooser ##
-
-The `VolumeChooser` interface is a server-side extension point that allows user
-tables to provide custom logic in choosing where their files are written when
-multiple HDFS instances are available. By default, a randomized volume chooser
-implementation is used to evenly balance files across all HDFS instances.
-
-Previously, this VolumeChooser logic was instance-wide which meant that it would
-affect all tables.
This is potentially undesirable as it might unintentionally
-impact other users in a multi-tenant system. [ACCUMULO-3177][ACCUMULO-3177]
-introduces a new per-table property which supports configuration of a
-`VolumeChooser`. This ensures that a custom volume-choosing implementation,
-used when multiple HDFS instances are available, applies only to the intended
-subset of tables.
-
-## Table and namespace custom properties ##
-
-In order to avoid errors caused by mis-typed configuration properties, Accumulo was strict about which configuration properties
-could be set. However, this prevented users from setting arbitrary properties that could be used by custom balancers, compaction
-strategies, volume choosers, and iterators. Under [ACCUMULO-2841][ACCUMULO-2841], the ability to set arbitrary table and
-namespace properties was added. The properties need to be prefixed with `table.custom.`. The changes made in
-[ACCUMULO-3177][ACCUMULO-3177] and [ACCUMULO-3439][ACCUMULO-3439] leverage this new feature.
-
-# Notable Bug Fixes #
-
-## SourceSwitchingIterator Deadlock ##
-
-An instance of SourceSwitchingIterator, the Accumulo iterator which
-transparently manages whether data for a tablet is read from memory (the
-in-memory map) or disk (HDFS after a minor compaction), was found deadlocked
-in a production system.
-
-This deadlock prevented the scan and the minor compaction from ever
-successfully completing without restarting the tablet server.
-[ACCUMULO-3745][ACCUMULO-3745] fixes the inconsistent synchronization inside
-of the SourceSwitchingIterator to prevent this deadlock from happening in the
-future.
-
-The only mitigation of this bug was to restart the tablet server that was
-deadlocked.
-
-## Table flush blocked indefinitely ##
-
-While running the Accumulo RandomWalk distributed test, it was observed that
-all activity in Accumulo had stopped and there was an offline Accumulo
-metadata table tablet.
The system first tried to flush a user tablet, but the -metadata table was not online (likely due to the agitation process which stops -and starts Accumulo processes during the test). After this call, a call to -load the metadata tablet was queued but could not complete until the previous -flush call. Thus, a deadlock occurred. - -This deadlock happened because the synchronous flush call could not complete -before the load tablet call completed, but the load tablet call couldn't run -because of connection caching we perform in Accumulo's RPC layer to reduce the -quantity of sockets we need to create to send data. -[ACCUMULO-3597][ACCUMULO-3597] prevents this deadlock by forcing the use of a -non-cached connection for the RPC message requesting a metadata tablet to be -loaded. - -While this feature does result in additional network resources to be used, the -concern is minimal because the number of metadata tablets is typically very -small with respect to the total number of tablets in the system. - -The only mitigation of this bug was to restart the tablet server that is hung. - -# Testing # - -Each unit and functional test only runs on a single node, while the RandomWalk -and Continuous Ingest tests run on any number of nodes. *Agitation* refers to -randomly restarting Accumulo processes and Hadoop DataNode processes, and, in -HDFS High-Availability instances, forcing NameNode fail-over. - -During testing, multiple Accumulo developers noticed some stability issues -with HDFS using Apache Hadoop 2.6.0 when restarting Accumulo processes and -HDFS datanodes. The developers investigated these issues as a part of the -normal release testing procedures, but were unable to find a definitive cause -of these failures. Users are encouraged to follow -[ACCUMULO-2388][ACCUMULO-2388] if they wish to follow any future developments. -One possible workaround is to increase the `general.rpc.timeout` in the -Accumulo configuration from `120s` to `240s`. 
- -{: #release_notes_testing .table } -| OS | Hadoop | Nodes | ZooKeeper | HDFS HA | Tests | -|--------------|--------|----------------|-----------|---------|---------------------------------------------------------------------------| -| Gentoo | N/A | 1 | N/A | No | Unit and Integration Tests | -| Gentoo | 2.6.0 | 1 (2 TServers) | 3.4.5 | No | 24hr CI w/ agitation and verification, 24hr RW w/o agitation. | -| Centos 6.6 | 2.6.0 | 3 | 3.4.6 | No | 24hr RW w/ agitation, 24hr CI w/o agitation, 72hr CI w/ and w/o agitation | -| Amazon Linux | 2.6.0 | 20 m1large | 3.4.6 | No | 24hr CI w/o agitation | - -[ACCUMULO-378]: https://issues.apache.org/jira/browse/ACCUMULO-378 -[ACCUMULO-898]: https://issues.apache.org/jira/browse/ACCUMULO-898 -[ACCUMULO-1085]: https://issues.apache.org/jira/browse/ACCUMULO-1085 -[ACCUMULO-1798]: https://issues.apache.org/jira/browse/ACCUMULO-1798 -[ACCUMULO-1817]: https://issues.apache.org/jira/browse/ACCUMULO-1817 -[ACCUMULO-1950]: https://issues.apache.org/jira/browse/ACCUMULO-1950 -[ACCUMULO-1957]: https://issues.apache.org/jira/browse/ACCUMULO-1957 -[ACCUMULO-2388]: https://issues.apache.org/jira/browse/ACCUMULO-2388 -[ACCUMULO-2815]: https://issues.apache.org/jira/browse/ACCUMULO-2815 -[ACCUMULO-2841]: https://issues.apache.org/jira/browse/ACCUMULO-2841 -[ACCUMULO-2863]: https://issues.apache.org/jira/browse/ACCUMULO-2863 -[ACCUMULO-2998]: https://issues.apache.org/jira/browse/ACCUMULO-2998 -[ACCUMULO-3134]: https://issues.apache.org/jira/browse/ACCUMULO-3134 -[ACCUMULO-3177]: https://issues.apache.org/jira/browse/ACCUMULO-3177 -[ACCUMULO-3439]: https://issues.apache.org/jira/browse/ACCUMULO-3439 -[ACCUMULO-3597]: https://issues.apache.org/jira/browse/ACCUMULO-3597 -[ACCUMULO-3657]: https://issues.apache.org/jira/browse/ACCUMULO-3657 -[ACCUMULO-3745]: https://issues.apache.org/jira/browse/ACCUMULO-3745 -[GSSAPI]: https://en.wikipedia.org/wiki/Generic_Security_Services_Application_Program_Interface -[SASL]: 
https://en.wikipedia.org/wiki/Simple_Authentication_and_Security_Layer -[api]: https://github.com/apache/accumulo/blob/1.7.0/README.md#api -[apilyzer]: http://code.revelc.net/apilyzer-maven-plugin -[group_balancer]: https://blogs.apache.org/accumulo/entry/balancing_groups_of_tablets -[kerberos]: {{ site.baseurl }}/1.7/accumulo_user_manual#_kerberos -[metrics]: {{ site.baseurl }}/1.7/accumulo_user_manual#_metrics -[readme]: https://github.com/apache/accumulo/blob/1.7.0/README.md -[replication]: {{ site.baseurl }}/1.7/accumulo_user_manual#_replication -[semver]: http://semver.org http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/9a50bd13/release_notes/1.7.1.md ---------------------------------------------------------------------- diff --git a/release_notes/1.7.1.md b/release_notes/1.7.1.md deleted file mode 100644 index f6d3901..0000000 --- a/release_notes/1.7.1.md +++ /dev/null @@ -1,149 +0,0 @@ ---- -title: Apache Accumulo 1.7.1 Release Notes ---- - -Apache Accumulo 1.7.1 is a maintenance release on the 1.7 version branch. This -release contains changes from more than 150 issues, comprised of bug-fixes, -performance improvements, build quality improvements, and more. See -[JIRA][JIRA_171] for a complete list. - -Users of any previous 1.7.x release are strongly encouraged to update as soon -as possible to benefit from the improvements with very little concern in change -of underlying functionality. Users of 1.6 or earlier that are seeking to -upgrade to 1.7 should consider 1.7.1 as a starting point. - -## Highlights - -### Silent data-loss via bulk imported files - -A user recently reported that a simple bulk-import application would -occasionally lose some records. Through investigation, it was found that when -bulk imports into a table failed the initial assignment, the logic that -automatically retries the imports was incorrectly choosing the tablets to -import the files into. 
[ACCUMULO-3967][ACCUMULO-3967] contains more information -on the cause and identification of the bug. The data-loss condition would only -affect entire files. If records from a file exist in Accumulo, it is still -guaranteed that all records within that imported file were successful. - -As such, users who have bulk import applications using previous versions of -Accumulo should verify that all of their data was correctly ingested into -Accumulo and immediately update to Accumulo 1.7.1 (This is the same bug that -was fixed in 1.6.4, so you won't be affected if you're running 1.6.4 or newer). - -### Queued Compactions Not Running - -Found and fixed a bug ([ACCUMULO-4016][ACCUMULO-4016]) in which some queued -compactions would never run if the number of files changed while the tablet was -queued. - -### Kerberos Ticket Renewals - -A bug was fixed which caused Accumulo clients and services to fail to check and -(if necessary) renew their Kerberos credentials. This would eventually lead to -these components failing to properly authenticate until they were restarted. -([ACCUMULO-4069][ACCUMULO-4069]) - -### Updated commons-collection - -The bundled commons-collection library was updated from version 3.2.1 to 3.2.2 -because of a reported vulnerability in that library. -([ACCUMULO-4056][ACCUMULO-4056]) - -### Faster Processing of Conditional Mutations - -Improved ConditionalMutation processing time by a factor of 3. -([ACCUMULO-4066][ACCUMULO-4066]) - -### Slow GC While Bulk Importing - -Found and worked around an issue where lots of bulk imports creating many new -files would significantly impair the Accumulo GC service, and possibly prevent -it from running to completion entirely. ([ACCUMULO-4021][ACCUMULO-4021]) - -### Unnoticed Per-table Configuration Updates - -Fixed a bug which caused tablet servers to not notice changes to the per-table -constraints, under some circumstances. 
([ACCUMULO-3859][ACCUMULO-3859])
-
-### TabletServers kill themselves on CentOS7
-
-Reduced the aggressiveness with which Accumulo Tablet Servers preemptively
-killed themselves when a local filesystem switched to read-only (indicating a
-possible failure). To reduce false positives, such as those which can occur
-with systemd's extra cgroup mounts in CentOS7, an additional check was added to
-ensure that tablet servers would only kill themselves if an ext- or
-xfs-formatted disk switched to read-only. ([ACCUMULO-4080][ACCUMULO-4080])
-
-### Improvements in Locating Client Configuration File
-
-Fixed some unexpected error messages related to setting
-ACCUMULO_CLIENT_CONF_PATH, and improved the detection of the client.conf file if
-ACCUMULO_CLIENT_CONF_PATH was set to a directory containing client.conf.
-([ACCUMULO-4026][ACCUMULO-4026],[ACCUMULO-4027][ACCUMULO-4027])
-
-### Transient ZooKeeper disconnect causes FATE threads to exit
-
-ZooKeeper clients are expected to handle the situation where they become
-disconnected from the ZooKeeper server and must wait to be reconnected
-before continuing ZooKeeper operations.
-
-The dedicated threads running inside the Accumulo Master process for FATE
-actions could unexpectedly exit in this disconnected state.
-This caused a scenario where all future FATE-based operations would
-be blocked until the Accumulo Master process was restarted. ([ACCUMULO-4060][ACCUMULO-4060])
-
-### Incorrect management of certain Apache Thrift RPCs
-
-Accumulo relies on Apache Thrift to implement remote procedure calls between
-Accumulo services. Accumulo's use of Thrift uncovered an unfortunate situation
-where a special RPC (a "oneway" call) would leave unwanted data on the underlying
-Thrift connection. After this extra data was left on the connection, all subsequent RPCs
-re-using that connection would fail with "out of sequence response" error messages.
-Accumulo would be left in a bad state until the mishandled connections were released -or Accumulo services were restarted. ([ACCUMULO-4065][ACCUMULO-4065]) - -## Other Notable Changes - - * [ACCUMULO-3509][ACCUMULO-3509] Fixed some lock contention in TabletServer, preventing resource cleanup - * [ACCUMULO-3734][ACCUMULO-3734] Fixed quote-escaping bug in VisibilityConstraint - * [ACCUMULO-4025][ACCUMULO-4025] Fixed cleanup of bulk load fate transactions - * [ACCUMULO-4098][ACCUMULO-4098],[ACCUMULO-4113][ACCUMULO-4113] Fixed widespread misuse of ByteBuffer - -## Testing - -Each unit and functional test only runs on a single node, while the RandomWalk -and Continuous Ingest tests run on any number of nodes. *Agitation* refers to -randomly restarting Accumulo processes and Hadoop Datanode processes, and, in -HDFS High-Availability instances, forcing NameNode failover. - -{: #release_notes_testing .table } -| OS/Environment | Hadoop | Nodes | ZooKeeper | HDFS HA | Tests | -|---------------------------------------------------------------------------|--------|-------|-----------|---------|--------------------------------------------------------------------------------------------------------------------------------------| -| CentOS 7.1 w/Oracle JDK8 on EC2 (1 m3.xlarge, 8 d2.xlarge) | 2.6.3 | 9 | 3.4.6 | No | Random walk (All.xml) 24-hour run, saw [ACCUMULO-3794][ACCUMULO-3794] and [ACCUMULO-4151][ACCUMULO-4151]. | -| CentOS 7.1 w/Oracle JDK8 on EC2 (1 m3.xlarge, 8 d2.xlarge) | 2.6.3 | 9 | 3.4.6 | No | 21 hr run of CI w/ agitation, 23.1B entries verified. | -| CentOS 7.1 w/Oracle JDK8 on EC2 (1 m3.xlarge, 8 d2.xlarge) | 2.6.3 | 9 | 3.4.6 | No | 24 hr run of CI w/o agitation, 23.0B entries verified; saw performance issues outlined in comment on [ACCUMULO-4146][ACCUMULO-4146]. 
| -| CentOS 6.7 (OpenJDK 7), Fedora 23 (OpenJDK 8), and CentOS 7.2 (OpenJDK 7) | 2.6.1 | 1 | 3.4.6 | No | All unit tests and ITs pass with -Dhadoop.version=2.6.1; Kerberos ITs had a problem with earlier versions of Hadoop | - -[JIRA_171]: https://issues.apache.org/jira/browse/ACCUMULO/fixforversion/12329940 - -[ACCUMULO-3509]: https://issues.apache.org/jira/browse/ACCUMULO-3509 -[ACCUMULO-3734]: https://issues.apache.org/jira/browse/ACCUMULO-3734 -[ACCUMULO-3794]: https://issues.apache.org/jira/browse/ACCUMULO-3794 -[ACCUMULO-3859]: https://issues.apache.org/jira/browse/ACCUMULO-3859 -[ACCUMULO-3967]: https://issues.apache.org/jira/browse/ACCUMULO-3967 -[ACCUMULO-4016]: https://issues.apache.org/jira/browse/ACCUMULO-4016 -[ACCUMULO-4021]: https://issues.apache.org/jira/browse/ACCUMULO-4021 -[ACCUMULO-4025]: https://issues.apache.org/jira/browse/ACCUMULO-4025 -[ACCUMULO-4026]: https://issues.apache.org/jira/browse/ACCUMULO-4026 -[ACCUMULO-4027]: https://issues.apache.org/jira/browse/ACCUMULO-4027 -[ACCUMULO-4056]: https://issues.apache.org/jira/browse/ACCUMULO-4056 -[ACCUMULO-4060]: https://issues.apache.org/jira/browse/ACCUMULO-4060 -[ACCUMULO-4065]: https://issues.apache.org/jira/browse/ACCUMULO-4065 -[ACCUMULO-4066]: https://issues.apache.org/jira/browse/ACCUMULO-4066 -[ACCUMULO-4069]: https://issues.apache.org/jira/browse/ACCUMULO-4069 -[ACCUMULO-4080]: https://issues.apache.org/jira/browse/ACCUMULO-4080 -[ACCUMULO-4098]: https://issues.apache.org/jira/browse/ACCUMULO-4098 -[ACCUMULO-4113]: https://issues.apache.org/jira/browse/ACCUMULO-4113 -[ACCUMULO-4146]: https://issues.apache.org/jira/browse/ACCUMULO-4146 -[ACCUMULO-4151]: https://issues.apache.org/jira/browse/ACCUMULO-4151 - http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/9a50bd13/release_notes/1.7.2.md ---------------------------------------------------------------------- diff --git a/release_notes/1.7.2.md b/release_notes/1.7.2.md deleted file mode 100644 index 283053b..0000000 --- 
a/release_notes/1.7.2.md
+++ /dev/null
@@ -1,87 +0,0 @@
----
-title: Apache Accumulo 1.7.2 Release Notes
----
-
-Apache Accumulo 1.7.2 is a maintenance release on the 1.7 version branch. This
-release contains changes from more than 150 issues, comprised of bug-fixes,
-performance improvements, build quality improvements, and more. See
-[JIRA][JIRA_172] for a complete list.
-
-Users of any previous 1.7.x release are strongly encouraged to update as soon
-as possible to benefit from the improvements with very little concern in change
-of underlying functionality. Users of 1.6 or earlier that are seeking to
-upgrade to 1.7 should consider 1.7.2 as a starting point.
-
-## Highlights
-
-### Write-Ahead Logs can be prematurely deleted
-
-There were cases where the Accumulo Garbage Collector could inadvertently delete a WAL for a tablet server that it had erroneously determined to be down, causing data loss. This has been corrected. See [ACCUMULO-4157][ACCUMULO-4157] for additional detail.
-
-### Upgrade to Commons-VFS 2.1
-
-Upgrading to Apache Commons VFS 2.1 fixes several issues with classloading out of HDFS. For further detail see [ACCUMULO-4146][ACCUMULO-4146]. Additional fixes to a potential HDFS class loading deadlock situation were made in [ACCUMULO-4341][ACCUMULO-4341].
-
-### Native Map failed to increment mutation count properly
-
-There was a bug ([ACCUMULO-4148][ACCUMULO-4148]) where multiple put calls with identical keys and no timestamp would exhibit different behaviour depending on whether native maps were enabled or not. This behaviour would result in hidden mutations with native maps, and has been corrected.
-
-### Open WAL files could prevent DataNode decommission
-
-An improvement was introduced to allow a max age before WAL files would be automatically rolled. Without a max age, they could stay open for writing indefinitely, blocking the Hadoop DataNode decommissioning process. For more information, see [ACCUMULO-4004][ACCUMULO-4004].
-
-### Remove unnecessary copy of cached RFile index blocks
-
-Accumulo maintains a cache for file blocks in-memory as a performance optimization. This can be done safely because Accumulo RFiles are immutable, thus their blocks are also immutable. There are two types of these blocks: index and data blocks. Index blocks refer to the b-tree style index inside of each Accumulo RFile, while data blocks contain the sorted Key-Value pairs. In previous versions, when Accumulo extracted an Index block from the in-memory cache, it would copy the data. [ACCUMULO-4164][ACCUMULO-4164] removes this unnecessary copy as the contents are immutable and can be passed by reference. Ensuring that the Index blocks are not copied when accessed from the cache is a big performance gain at the file-access level.
-
-### Analyze Key-length to avoid choosing large Keys for RFile Index blocks
-
-Accumulo's RFile index blocks are made up of a Key which exists in the file and points to that specific location in the corresponding RFile data block. Thus, the size of the RFile index blocks is largely dominated by the size of the Keys which are used by the index. [ACCUMULO-4314][ACCUMULO-4314] is an improvement which uses statistics on the length of the Keys in the RFile to avoid choosing Keys for the index whose length is greater than three standard deviations for the RFile. By choosing smaller Keys for the index, Accumulo can access the RFile index faster and keep more Index blocks cached in memory. Initial tests showed that with this change, the RFile index size was nearly cut in half.
-
-### Minor performance improvements
-
-Tablet servers would previously always hsync at the start of a minor compaction, causing delays in the write pipeline. These additional syncs were determined to provide no additional durability guarantees and have been removed. See [ACCUMULO-4112][ACCUMULO-4112] for additional detail.
- -A performance issue was identified and corrected ([ACCUMULO-1755][ACCUMULO-1755]) where the BatchWriter would block calls to addMutation while looking up destination tablet server metadata. The writer has been fixed to allow both operations in parallel. - - -## Other Notable Changes - - * [ACCUMULO-3923][ACCUMULO-3923] bootstrap_hdfs.sh script would copy incorrect jars to hdfs. - * [ACCUMULO-4146][ACCUMULO-4146] Avoid copy of RFile Index Blocks when already in cache. - * [ACCUMULO-4155][ACCUMULO-4155] No longer publish javadoc for non-public API to website. (Still available in javadoc jars in maven) - * [ACCUMULO-4173][ACCUMULO-4173] Provide balancer to balance table within subset of hosts. - * [ACCUMULO-4334][ACCUMULO-4334] Ingest rates reported through JMX did not match rates reported by Monitor. - * [ACCUMULO-4335][ACCUMULO-4335] Error conditions that result in a Halt should ensure non-zero process exit code. - -## Testing - -Each unit and functional test only runs on a single node, while the RandomWalk -and Continuous Ingest tests run on any number of nodes. *Agitation* refers to -randomly restarting Accumulo processes and Hadoop Datanode processes, and, in -HDFS High-Availability instances, forcing NameNode failover. - -{: #release_notes_testing .table } -| OS/Environment | Hadoop | Nodes | ZooKeeper | HDFS HA | Tests | -|--------------------------------------------|--------|-------|-----------|---------|------------------------------------------------------| -| CentOS 7; EC2 m3.xlarge, d2.xlarge workers | 2.6.3 | 9 | 3.4.8 | No | 24 HR Continuous Ingest with and without Agitation. 
| -| CentOS 6: EC2 m3.2xlarge | 2.6.1 | 1 | 3.4.5 | No | Unit tests and Integration Tests | - -[JIRA_172]: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12312121&version=12333776 - -[ACCUMULO-4157]: https://issues.apache.org/jira/browse/ACCUMULO-4157 -[ACCUMULO-4146]: https://issues.apache.org/jira/browse/ACCUMULO-4146 -[ACCUMULO-4341]: https://issues.apache.org/jira/browse/ACCUMULO-4341 -[ACCUMULO-4148]: https://issues.apache.org/jira/browse/ACCUMULO-4148 -[ACCUMULO-4004]: https://issues.apache.org/jira/browse/ACCUMULO-4004 -[ACCUMULO-4112]: https://issues.apache.org/jira/browse/ACCUMULO-4112 -[ACCUMULO-1755]: https://issues.apache.org/jira/browse/ACCUMULO-1755 -[ACCUMULO-4146]: https://issues.apache.org/jira/browse/ACCUMULO-4146 -[ACCUMULO-4335]: https://issues.apache.org/jira/browse/ACCUMULO-4335 -[ACCUMULO-4334]: https://issues.apache.org/jira/browse/ACCUMULO-4334 -[ACCUMULO-4314]: https://issues.apache.org/jira/browse/ACCUMULO-4314 -[ACCUMULO-3923]: https://issues.apache.org/jira/browse/ACCUMULO-3923 -[ACCUMULO-4155]: https://issues.apache.org/jira/browse/ACCUMULO-4155 -[ACCUMULO-4173]: https://issues.apache.org/jira/browse/ACCUMULO-4173 -[ACCUMULO-4151]: https://issues.apache.org/jira/browse/ACCUMULO-4151 -[ACCUMULO-4164]: https://issues.apache.org/jira/browse/ACCUMULO-4164 http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/9a50bd13/release_notes/1.8.0.md ---------------------------------------------------------------------- diff --git a/release_notes/1.8.0.md b/release_notes/1.8.0.md deleted file mode 100644 index 868928b..0000000 --- a/release_notes/1.8.0.md +++ /dev/null @@ -1,189 +0,0 @@ ---- -title: Apache Accumulo 1.8.0 Release Notes ---- - -Apache Accumulo 1.8.0 is a significant release that includes many important -milestone features which expand the functionality of Accumulo. These include -features related to security, availability, and extensibility. Over -350 JIRA issues were resolved in this version. 
This includes over
-200 bug fixes, 71 improvements, and 4 new features. See
-[JIRA][JIRA_180] for the complete list.
-
-In the context of Accumulo's [Semantic Versioning][semver] [guidelines][api],
-this is a "minor version". This means that new APIs have been created, some
-deprecations may have been added, but no deprecated APIs have been removed.
-Code written against 1.7.x should work against 1.8.0 -- binary compatibility
-has been preserved with one exception of an already-deprecated Mock Accumulo
-utility class. As always, the Accumulo developers take API compatibility
-very seriously and have invested much time to ensure that we meet the promises set forth to our users.
-
-## Major Changes
-
-### Speed up WAL roll overs
-
-Performance of writing mutations is improved by refactoring the
-bookkeeping required for Write-Ahead Log (WAL) files and by creating a
-standby WAL for faster switching when the log is full. This was a
-substantial refactor in the way WALs worked, but it smooths overall
-ingest performance and provides an increase in write speed
-as shown by the simple test below. The top graph is before
-[ACCUMULO-3423][ACCUMULO-3423] and the bottom graph is after the
-refactor.
-
-![Graph of WAL speed up after ACCUMULO-3423][IMG-3423]
-
-### User level API for RFile
-
-Previously the only public API available to write RFiles was via the AccumuloFileOutputFormat. There was no way to read RFiles in the public
-API. [ACCUMULO-4165][ACCUMULO-4165] exposes a brand new public [API][RFILE_API] for reading and writing RFiles and cleans up some of the internal APIs.
-
-### Suspend Tablet assignment for rolling restarts
-
-When a tablet server dies, Accumulo attempted to reassign the tablets as quickly as possible to maintain availability.
-A new configuration property `table.suspend.duration` (with a default of zero seconds) now controls how long to wait before reassigning
-a tablet from a dead tserver.
The property is configurable via the -Accumulo shell, so you can set it, do a rolling restart, and then -set it back to 0. A new state, TableState.SUSPENDED, was introduced to support this feature. By default, metadata tablet -reassignment is not suspended, but that can also be changed with the `master.metadata.suspendable` property, which is false by -default. Root tablet assignment cannot be suspended. See [ACCUMULO-4353] for more info. - -### Run multiple Tablet Servers on one node - -[ACCUMULO-4328] introduces the capability of running multiple tservers on a single node. This is intended for nodes with large -amounts of memory and/or disk. This feature is disabled by default. There are several related tickets: [ACCUMULO-4072], [ACCUMULO-4331] -and [ACCUMULO-4406]. Note that when this is enabled, the names of the log files change. Previous log file names were defined in the -generic_logger.xml as `${org.apache.accumulo.core.application}_${org.apache.accumulo.core.ip.localhost.hostname}.log`. -The files will now include the instance id after the application with -`${org.apache.accumulo.core.application}_${instance}_${org.apache.accumulo.core.ip.localhost.hostname}.log`. - -For example: tserver_host.domain.com.log will become tserver_1_host.domain.com.log when multiple TabletServers -are run per host. The same change also applies to the debug logs provided in the example configurations. The log -names do not change if this feature is not used. - -### Rate limiting Major Compactions - -Major Compactions can significantly increase the amount of load on -TabletServers. [ACCUMULO-4187] restricts the rate at which data is -read and written when performing major compactions. This has a direct -effect on the IO load caused by major compactions with a similar -effect on the CPU utilization. This behavior is controlled by a new -property `tserver.compaction.major.throughput` with a default of 0B, -which disables the rate limiting.
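To make the idea of throughput-based rate limiting concrete, here is a minimal Python sketch. It is not Accumulo's implementation; the class name `ThroughputLimiter` and its parameters are hypothetical and chosen only for illustration. It mirrors one convention from the text above: a rate of 0 disables limiting.

```python
import time


class ThroughputLimiter:
    """Hypothetical limiter: caps average throughput at bytes_per_sec.

    A rate of 0 disables limiting, mirroring the 0B default described above.
    """

    def __init__(self, bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.bytes_per_sec = bytes_per_sec
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._start = clock()
        self._total = 0          # bytes accounted for so far

    def acquire(self, nbytes):
        """Account for nbytes of I/O; sleep if we are ahead of the allowed rate.

        Returns the number of seconds slept.
        """
        if self.bytes_per_sec <= 0:
            return 0.0
        self._total += nbytes
        # Earliest time at which this many bytes is allowed to have been moved.
        earliest = self._start + self._total / self.bytes_per_sec
        delay = earliest - self._clock()
        if delay > 0:
            self._sleep(delay)
            return delay
        return 0.0
```

A compaction loop in this model would call `acquire(len(chunk))` around each read and write, so a burst is followed by a pause that brings the average back under the configured rate.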
- -### Table Sampling - -Queryable sample data was added by [ACCUMULO-3913]. This allows users to configure a pluggable -function to generate sample data. At scan time, the sample data can optionally be scanned. -Iterators also have access to sample data. Iterators can access both all data and sample data, which -allows an iterator to use sample data for query optimizations. The new user level RFile API -supports writing RFiles with sample data for bulk import. - -A simple configurable sampler function is included with Accumulo. This sampler uses hashing and -can be configured to use a subset of Key fields. For example, if it was desired to have entire rows -in the sample, then this sampler would be configured to hash+mod the row. Then when a row is -selected for the sample, all of its columns and all of its updates will be in the sample data. -Another scenario is one in which a document id is in the column qualifier. In this scenario, one -would either want all data related to a document in the sample data or none. To achieve this, the -sampler could be configured to hash+mod on the column qualifier. See the sample [Readme -example][sample] and javadocs on the new APIs for more information. - -For sampling to work, all tablets scanned must have pre-generated sample data that was generated in -the same way. If this is not the case then scans will fail. For existing tables, samples can be -generated by configuring sampling on the table and compacting the table. - -### Upgrade to Apache Thrift 0.9.3 - -Accumulo relies on Apache Thrift to implement remote procedure calls -between Accumulo services. Ticket [ACCUMULO-4077][ACCUMULO-4077] -updates our dependency to 0.9.3. See the -[Apache Thrift 0.9.3 Release Notes][THRIFT-0.9.3-RN] for details on -the changes to Thrift. **NOTE:** The Thrift 0.9.3 Java library is not -compatible with other versions of Thrift. Applications running against Accumulo -1.8 must use Thrift 0.9.3.
Different versions of Thrift on the classpath -will not work. - -### Iterator Test Harness - -Users often write a new iterator without fully understanding its limits and lifetime. Previously, Accumulo did -not provide any means by which a user could test iterators to catch common issues that only become apparent -in multi-node production deployments. [ACCUMULO-626] provides a framework and a collection of initial tests -which can be used to simulate common issues with Iterators that only appear in production deployments. This test -harness can be used directly by users as a supplemental tool to unit tests and integration tests with MiniAccumuloCluster. - -Please see the [Accumulo User Manual chapter on Iterator Testing][ITER_TEST] for more information. - -### Default port for Monitor changed to 9995 - -Previously, the default port for the monitor was 50095. You will need to update your links to point to port 9995. The default -port for the GC process was also changed from 50091 to 9998, although this is an RPC port used internally and automatically discovered. -These default ports were changed because the previous defaults fell in the Linux ephemeral port range. This means that the operating -system, when a port in this range was unused, could allocate this port for dynamic network communication. This has the side-effect of -intermittent bind failures when trying to start these services (as the operating -system might have allocated them elsewhere). By moving these -defaults out of the ephemeral range, we can guarantee that the Monitor and GC -will reliably start. These values are still configurable by setting -`monitor.port.client` and `gc.port.client` in the accumulo-site.xml.
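The port change can be checked with a few lines of Python. This is only a sketch: the range 32768-60999 is assumed here as a common Linux default and is not taken from the release notes; the actual range on a given host is in `/proc/sys/net/ipv4/ip_local_port_range` and may differ. The function name `in_ephemeral_range` is hypothetical.

```python
# Assumed default Linux ephemeral port range; verify on your own hosts via
# /proc/sys/net/ipv4/ip_local_port_range, since it is configurable.
EPHEMERAL_LOW, EPHEMERAL_HIGH = 32768, 60999


def in_ephemeral_range(port, low=EPHEMERAL_LOW, high=EPHEMERAL_HIGH):
    """Return True if the OS may hand this port out for outgoing connections."""
    return low <= port <= high


# Old and new defaults as stated in the section above.
defaults = {
    "monitor (before 1.8.0)": 50095,
    "gc (before 1.8.0)": 50091,
    "monitor (1.8.0)": 9995,
    "gc (1.8.0)": 9998,
}

for name, port in defaults.items():
    status = "inside" if in_ephemeral_range(port) else "outside"
    print(f"{name}: port {port} is {status} the assumed ephemeral range")
```

Under that assumed range, both old defaults fall inside it and both new defaults fall outside it, which is the motivation given for the change.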
- - -## Other Notable Changes - - * [ACCUMULO-1055] Configurable maximum file size for merging minor compactions - * [ACCUMULO-1124] Optimization of RFile index - * [ACCUMULO-2883] API to fetch current tablet assignments - * [ACCUMULO-3871] Support for running integration tests in MapReduce - * [ACCUMULO-3920] Deprecate the MockAccumulo class and remove usage in our tests - * [ACCUMULO-4339] Make hadoop-minicluster an optional dependency of accumulo-minicluster - * [ACCUMULO-4318] BatchWriter, ConditionalWriter, and ScannerBase now extend AutoCloseable - * [ACCUMULO-4326] Value constructor now accepts Strings (and CharSequences) - * [ACCUMULO-4354] Bump dependency versions, including gson, jetty, and slf4j - * [ACCUMULO-3735] Bulk Import status page on the monitor - * [ACCUMULO-4066] Reduced time to process conditional mutations. - * [ACCUMULO-4164] Reduced seek time for cached data. - -## Testing - -Each unit and functional test only runs on a single node, while the RandomWalk -and Continuous Ingest tests run on any number of nodes. *Agitation* refers to -randomly restarting Accumulo processes and Hadoop Datanode processes, and, in -HDFS High-Availability instances, forcing NameNode failover. - -{: #release_notes_testing .table } -| OS/Environment | Hadoop | Nodes | ZooKeeper | HDFS HA | Tests | -|----------------------------------------------------------------------------|----------------------|-------|------------------|---------|----------------------------------------------| -| CentOS7/openJDK7/EC2; 3 m3.xlarge leaders, 8 d2.xlarge workers | 2.6.4 | 11 | 3.4.8 | No | 24 HR Continuous Ingest without Agitation. | -| CentOS7/openJDK7/EC2; 3 m3.xlarge leaders, 8 d2.xlarge workers | 2.6.4 | 11 | 3.4.8 | No | 16 HR Continuous Ingest with Agitation. | -| CentOS7/openJDK7/OpenStack VMs (16G RAM 2cores 2disk3; 1 leader, 5 workers | HDP 2.5 (Hadoop 2.7) | 7 | HDP 2.5 (ZK 3.4) | No | 24 HR Continuous Ingest without Agitation.
| -| CentOS7/openJDK7/OpenStack VMs (16G RAM 2cores 2disk3; 1 leader, 5 workers | HDP 2.5 (Hadoop 2.7) | 7 | HDP 2.5 (ZK 3.4) | No | 24 HR Continuous Ingest with Agitation. | - -[ACCUMULO-1055]: https://issues.apache.org/jira/browse/ACCUMULO-1055 -[ACCUMULO-1124]: https://issues.apache.org/jira/browse/ACCUMULO-1124 -[ACCUMULO-2883]: https://issues.apache.org/jira/browse/ACCUMULO-2883 -[ACCUMULO-3409]: https://issues.apache.org/jira/browse/ACCUMULO-3409 -[ACCUMULO-3423]: https://issues.apache.org/jira/browse/ACCUMULO-3423 -[ACCUMULO-3735]: https://issues.apache.org/jira/browse/ACCUMULO-3735 -[ACCUMULO-3871]: https://issues.apache.org/jira/browse/ACCUMULO-3871 -[ACCUMULO-3913]: https://issues.apache.org/jira/browse/ACCUMULO-3913 -[ACCUMULO-3920]: https://issues.apache.org/jira/browse/ACCUMULO-3920 -[ACCUMULO-4072]: https://issues.apache.org/jira/browse/ACCUMULO-4072 -[ACCUMULO-4077]: https://issues.apache.org/jira/browse/ACCUMULO-4077 -[ACCUMULO-4066]: https://issues.apache.org/jira/browse/ACCUMULO-4066 -[ACCUMULO-4164]: https://issues.apache.org/jira/browse/ACCUMULO-4164 -[ACCUMULO-4165]: https://issues.apache.org/jira/browse/ACCUMULO-4165 -[ACCUMULO-4187]: https://issues.apache.org/jira/browse/ACCUMULO-4187 -[ACCUMULO-4318]: https://issues.apache.org/jira/browse/ACCUMULO-4318 -[ACCUMULO-4326]: https://issues.apache.org/jira/browse/ACCUMULO-4326 -[ACCUMULO-4328]: https://issues.apache.org/jira/browse/ACCUMULO-4328 -[ACCUMULO-4331]: https://issues.apache.org/jira/browse/ACCUMULO-4331 -[ACCUMULO-4339]: https://issues.apache.org/jira/browse/ACCUMULO-4339 -[ACCUMULO-4353]: https://issues.apache.org/jira/browse/ACCUMULO-4353 -[ACCUMULO-4354]: https://issues.apache.org/jira/browse/ACCUMULO-4354 -[ACCUMULO-4406]: https://issues.apache.org/jira/browse/ACCUMULO-4406 -[ACCUMULO-626]: https://issues.apache.org/jira/browse/ACCUMULO-626 -[IMG-3423]: https://issues.apache.org/jira/secure/attachment/12705402/WAL-slowdown-graphs.jpg "Graph of WAL speed up after ACCUMULO-3423" 
-[JIRA_180]: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12312121&version=12329879 -[THRIFT-0.9.3-RN]: https://github.com/apache/thrift/blob/0.9.3/CHANGES -[api]: https://github.com/apache/accumulo/blob/1.8/README.md#api -[semver]: http://semver.org -[sample]: ../1.8/examples/sample -[ITER_TEST]: ../1.8/accumulo_user_manual.html#_iterator_testing -[RFILE_API]: ../1.8/apidocs/org/apache/accumulo/core/client/rfile/RFile.html http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/9a50bd13/release_notes/index.md ---------------------------------------------------------------------- diff --git a/release_notes/index.md b/release_notes/index.md deleted file mode 100644 index 2b3f2aa..0000000 --- a/release_notes/index.md +++ /dev/null @@ -1,57 +0,0 @@ ---- -title: Release Notes ---- - -Apache Accumulo generates release notes for the benefit of users which summarize the important and notable changes that -are contained in that release. Users can leverage these release notes to make decisions about the benefits (or -concerns) in updating to a new version. - -Each release notes document aims to provide content broken down into the following categories of interest.
- -* New Features -* Performance Improvements -* Bug Fixes -* Reference to special upgrade instructions (if any) -* Security Concerns -* API changes outside of standard guarantees -* Community-driven release testing - -## Archives of all release notes - -### 1.7 releases - -* [Apache Accumulo 1.7.2][REL_172] -* [Apache Accumulo 1.7.1][REL_171] -* [Apache Accumulo 1.7.0][REL_170] - -### 1.6 releases - -* [Apache Accumulo 1.6.6][REL_166] -* [Apache Accumulo 1.6.5][REL_165] -* [Apache Accumulo 1.6.4][REL_164] -* [Apache Accumulo 1.6.3][REL_163] -* [Apache Accumulo 1.6.2][REL_162] -* [Apache Accumulo 1.6.1][REL_161] -* [Apache Accumulo 1.6.0][REL_160] - -### 1.5 releases - -* [Apache Accumulo 1.5.4][REL_154] -* [Apache Accumulo 1.5.3][REL_153] -* [Apache Accumulo 1.5.2][REL_152] -* [Apache Accumulo 1.5.1][REL_151] - -[REL_151]: 1.5.1 -[REL_152]: 1.5.2 -[REL_153]: 1.5.3 -[REL_154]: 1.5.4 -[REL_160]: 1.6.0 -[REL_161]: 1.6.1 -[REL_162]: 1.6.2 -[REL_163]: 1.6.3 -[REL_164]: 1.6.4 -[REL_165]: 1.6.5 -[REL_166]: 1.6.6 -[REL_170]: 1.7.0 -[REL_171]: 1.7.1 -[REL_172]: 1.7.2 http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/9a50bd13/user_manual_1.3-incubating/Accumulo_Design.md ---------------------------------------------------------------------- diff --git a/user_manual_1.3-incubating/Accumulo_Design.md b/user_manual_1.3-incubating/Accumulo_Design.md deleted file mode 100644 index 0e74f8f..0000000 --- a/user_manual_1.3-incubating/Accumulo_Design.md +++ /dev/null @@ -1,104 +0,0 @@ ---- -title: "User Manual: Accumulo Design" ---- - -** Next:** [Accumulo Shell][2] ** Up:** [Apache Accumulo User Manual Version 1.3][4] ** Previous:** [Introduction][6] ** [Contents][8]** - -**Subsections** - -* [Data Model][9] -* [Architecture][10] -* [Components][11] -* [Data Management][12] -* [Tablet Service][13] -* [Compactions][14] -* [Fault-Tolerance][15] - -* * * - -## Accumulo Design - -## Data Model - -Accumulo provides a richer data model than simple key-value stores, but is not a 
fully relational database. Data is represented as key-value pairs, where the key and value are comprised of the following elements: - -![converted table][16] - -All elements of the Key and the Value are represented as byte arrays except for Timestamp, which is a Long. Accumulo sorts keys by element and lexicographically in ascending order. Timestamps are sorted in descending order so that later versions of the same Key appear first in a sequential scan. Tables consist of a set of sorted key-value pairs. - -## Architecture - -Accumulo is a distributed data storage and retrieval system and as such consists of several architectural components, some of which run on many individual servers. Much of the work Accumulo does involves maintaining certain properties of the data, such as organization, availability, and integrity, across many commodity-class machines. - -## Components - -An instance of Accumulo includes many TabletServers, write-ahead Logger servers, one Garbage Collector process, one Master server and many Clients. - -### Tablet Server - -The TabletServer manages some subset of all the tablets (partitions of tables). This includes receiving writes from clients, persisting writes to a write‐ahead log, sorting new key‐value pairs in memory, periodically flushing sorted key‐value pairs to new files in HDFS, and responding to reads from clients, forming a merge‐sorted view of all keys and values from all the files it has created and the sorted in‐memory store. - -TabletServers also perform recovery of a tablet that was previously on a server that failed, reapplying any writes found in the write-ahead log to the tablet. - -### Loggers - -The Loggers accept updates to Tablet servers and write them to local on-disk storage. Each tablet server will write its updates to multiple loggers to preserve data in case of hardware failure. - -### Garbage Collector - -Accumulo processes will share files stored in HDFS.
Periodically, the Garbage Collector will identify files that are no longer needed by any process, and delete them. - -### Master - -The Accumulo Master is responsible for detecting and responding to TabletServer failure. It tries to balance the load across TabletServers by assigning tablets carefully and instructing TabletServers to migrate tablets when necessary. The Master ensures all tablets are assigned to one TabletServer each, and handles table creation, alteration, and deletion requests from clients. The Master also coordinates startup, graceful shutdown and recovery of changes in write-ahead logs when Tablet servers fail. - -### Client - -Accumulo includes a client library that is linked to every application. The client library contains logic for finding servers managing a particular tablet, and communicating with TabletServers to write and retrieve key-value pairs. - -## Data Management - -Accumulo stores data in tables, which are partitioned into tablets. Tablets are partitioned on row boundaries so that all of the columns and values for a particular row are found together within the same tablet. The Master assigns Tablets to one TabletServer at a time. This enables row-level transactions to take place without using distributed locking or some other complicated synchronization mechanism. As clients insert and query data, and as machines are added and removed from the cluster, the Master migrates tablets to ensure they remain available and that the ingest and query load is balanced across the cluster. - -![Image data_distribution][17] - -## Tablet Service - -When a write arrives at a TabletServer it is written to a Write‐Ahead Log and then inserted into a sorted data structure in memory called a MemTable. When the MemTable reaches a certain size the TabletServer writes out the sorted key-value pairs to a file in HDFS called an Indexed Sequential Access Method (ISAM) file. This process is called a minor compaction.
A new MemTable is then created and the fact of the compaction is recorded in the Write‐Ahead Log. - -When a request to read data arrives at a TabletServer, the TabletServer does a binary search across the MemTable as well as the in-memory indexes associated with each ISAM file to find the relevant values. If clients are performing a scan, several key‐value pairs are returned to the client in order from the MemTable and the set of ISAM files by performing a merge‐sort as they are read. - -## Compactions - -In order to manage the number of files per tablet, periodically the TabletServer performs Major Compactions of files within a tablet, in which some set of ISAM files are combined into one file. The previous files will eventually be removed by the Garbage Collector. This also provides an opportunity to permanently remove deleted key‐value pairs by omitting key‐value pairs suppressed by a delete entry when the new file is created. - -## Fault-Tolerance - -If a TabletServer fails, the Master detects it and automatically reassigns the tablets from the failed server to other servers. Any key-value pairs that were in memory at the time the TabletServer failed are automatically reapplied from the Write-Ahead Log to prevent any loss of data. - -The Master will coordinate the copying of write-ahead logs to HDFS so the logs are available to all tablet servers. To make recovery efficient, the updates within a log are grouped by tablet. The sorting process can be performed by Hadoop's MapReduce or the Logger server. TabletServers can quickly apply the mutations from the sorted logs that are destined for the tablets they have now been assigned. - -TabletServer failures are noted on the Master's monitor page, accessible via -http://master-address:50095/monitor.
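The recovery steps described above (group the log's updates by tablet, then replay them on the newly assigned server) can be sketched in Python. This is a simplified model rather than Accumulo's code: a plain dict stands in for the sorted in-memory store, and the function names and the `(tablet_id, key, value)` entry layout are hypothetical.

```python
from collections import defaultdict


def sort_log_by_tablet(log_entries):
    """Group (tablet_id, key, value) log entries by tablet, keeping write order."""
    grouped = defaultdict(list)
    for tablet_id, key, value in log_entries:
        grouped[tablet_id].append((key, value))
    return dict(grouped)


def recover_tablet(tablet_id, sorted_log):
    """Rebuild one tablet's in-memory state by replaying only its own updates."""
    memtable = {}
    for key, value in sorted_log.get(tablet_id, []):
        memtable[key] = value  # a later write to the same key replaces the earlier one
    return memtable


# A write-ahead log with interleaved updates for two tablets.
wal = [
    ("t1", "row_a", "1"),
    ("t2", "row_x", "9"),
    ("t1", "row_a", "2"),
    ("t1", "row_b", "3"),
]

sorted_log = sort_log_by_tablet(wal)
print(recover_tablet("t1", sorted_log))
```

Grouping first means a server that takes over tablet `t1` reads only the `t1` updates instead of scanning every entry written by the failed server.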
- -![Image failure_handling][18] - -* * * - -** Next:** [Accumulo Shell][2] ** Up:** [Apache Accumulo User Manual Version 1.3][4] ** Previous:** [Introduction][6] ** [Contents][8]** - -[2]: Accumulo_Shell.html -[4]: accumulo_user_manual.html -[6]: Introduction.html -[8]: Contents.html -[9]: Accumulo_Design.html#Data_Model -[10]: Accumulo_Design.html#Architecture -[11]: Accumulo_Design.html#Components -[12]: Accumulo_Design.html#Data_Management -[13]: Accumulo_Design.html#Tablet_Service -[14]: Accumulo_Design.html#Compactions -[15]: Accumulo_Design.html#Fault-Tolerance -[16]: img1.png -[17]: ./data_distribution.png -[18]: ./failure_handling.png -