accumulo-commits mailing list archives

From build...@apache.org
Subject svn commit: r951904 - in /websites/staging/accumulo/trunk/content: ./ release_notes/1.7.0.html
Date Tue, 19 May 2015 16:45:02 GMT
Author: buildbot
Date: Tue May 19 16:45:01 2015
New Revision: 951904

Log:
Staging update by buildbot for accumulo

Modified:
    websites/staging/accumulo/trunk/content/   (props changed)
    websites/staging/accumulo/trunk/content/release_notes/1.7.0.html

Propchange: websites/staging/accumulo/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Tue May 19 16:45:01 2015
@@ -1 +1 @@
-1680114
+1680341

Modified: websites/staging/accumulo/trunk/content/release_notes/1.7.0.html
==============================================================================
--- websites/staging/accumulo/trunk/content/release_notes/1.7.0.html (original)
+++ websites/staging/accumulo/trunk/content/release_notes/1.7.0.html Tue May 19 16:45:01 2015
@@ -211,9 +211,16 @@ Latest 1.5 release: <strong>1.5.2</stron
 
     <h1 class="title">Apache Accumulo 1.7.0 Release Notes</h1>
 
-    <p>Apache Accumulo 1.7.0 is a release that needs to be described</p>
-<h1 id="draft-draft-draft-draft-draft-draft">DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT</h1>
-<h1 id="notable-improvements">Notable Improvements</h1>
+    <h1 id="draft-draft-draft-draft-draft-draft">DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT</h1>
+<p>Apache Accumulo 1.7.0 is a major release which includes a number of important milestone features
+that expand on the functionality of Accumulo. These features range from security to availability
+to extendability.</p>
+<p>In the context of Accumulo's Semantic Versioning guidelines, this is a "minor version" which means
+that new APIs have been created, but no deprecated APIs have been removed. Code written against
+1.6.x should work against 1.7.0, possibly with a re-compilation. As always, the Accumulo
+developers take API compatibility very seriously and have invested much time in ensuring that
+we meet the promises set forward to our users.</p>
+<h2 id="major-changes">Major Changes</h2>
 <h3 id="client-authentication-with-kerberos">Client Authentication with Kerberos</h3>
 <p>Kerberos is far and away the de-facto means to provide strong authentication across Hadoop
 and other related components. Kerberos requires a centralized key distribution center
@@ -302,37 +309,14 @@ Accumulo internals w/o impacting the API
 <p>Created an Accumulo API regular expression for use with checkstyle. Starting with 1.7.0, projects building on Accumulo can use
 this checkstyle rule to ensure they are only using Accumulo's public API. The regular expression can be found in the
 <a href="https://github.com/apache/accumulo/blob/8cba8128fbc3238bdd9398cf5c36b7cb6dc3b61d/README.md">README</a>.</p>
-<h2 id="notable-bug-fixes">Notable Bug Fixes</h2>
-<h3 id="sourceswitchingiterator-deadlock">SourceSwitchingIterator Deadlock</h3>
-<p>An instance of SourceSwitchingIterator, the Accumulo iterator which transparently
-manages whether data for a Tablet is in memory (the in-memory map) or disk (HDFS 
-after a minor compaction), was found deadlocked in a production system.</p>
-<p>This deadlock prevented the scan and the minor compaction from ever successfully
-completing without restarting the TabletServer. <a href="https://issues.apache.org/jira/browse/ACCUMULO-3745">ACCUMULO-3745</a>
-fixes the inconsistent synchronization inside of the SourceSwitchingIterator
-to prevent this deadlock from happening in the future.</p>
-<h3 id="table-flush-blocked-indefinitely">Table flush blocked indefinitely</h3>
-<p>While running the Accumulo Randomwalk distributed test, it was observed
-that all activity in Accumulo had stopped and there was an offline
-Accumulo metadata table tablet. The system first tried to flush a user
-tablet but the metadata table was not online (likely due to the agitation
-process which stops and starts Accumulo processes during the test). After
-this call, a call to load the metadata tablet was queued but could not 
-complete until the previous flush call. Thus, a deadlock occurred.</p>
-<p>This deadlock happened because the synchronous flush call could not complete
-before the load tablet call completed, but the load tablet call couldn't
-run because of connection caching we perform in Accumulo's RPC layer
-to reduce the quantity of sockets we need to create to send data. 
-<a href="https://issues.apache.org/jira/browse/ACCUMULO-3597">ACCUMULO-3597</a> prevents this dealock by forcing a
-non-cached connection for the message requesting loads of metadata tablets,
-we can ensure that this deadlock won't occur.</p>
-<h2 id="performance-improvements">Performance Improvements</h2>
-<h3 id="performance-improvement-1">Performance Improvement 1</h3>
-<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
-Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure
- dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat
- non proident, sunt in culpa qui officia deserunt mollit anim id est laborum</p>
-<h2 id="general-improvements">General improvements</h2>
+<h2 id="updated-minimum-versions">Updated Minimum Versions</h2>
+<p>Apache Accumulo 1.7.0 comes with an updated set of minimum dependencies.</p>
+<ul>
+<li>Java7 is required. Java6 support is dropped.</li>
+<li>Hadoop 1 support is dropped; at least Hadoop 2.2.0 is required.</li>
+<li>ZooKeeper 3.4.x or greater is required.</li>
+</ul>
+<h2 id="other-improvements">Other improvements</h2>
 <h3 id="balancing-groups-of-tablets">Balancing Groups of Tablets</h3>
 <p>By default Accumulo evenly spreads each tables tablets across a cluster.  In some
 situations its advantageous for query or ingest to evenly spreads groups of tablets 
@@ -371,11 +355,60 @@ via its own "Cloudtrace" library, but wa
 <p><a href="https://issues.apache.org/jira/browse/ACCUMULO-898">ACCUMULO-898</a> replaces Accumulo's Cloudtrace code with HTrace. This
 has the benefit of timings (spans) already in Accumulo automatically containing
 additional information from the HDFS operations.</p>
-<h2 id="documentation">Documentation</h2>
-<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
-Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure
- dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat
- non proident, sunt in culpa qui officia deserunt mollit anim id est laborum</p>
+<h2 id="performance-improvements">Performance Improvements</h2>
+<h3 id="configurable-threadpool-size-for-assignments">Configurable Threadpool Size for Assignments</h3>
+<p>One of the primary tasks that the Accumulo Master is responsible for is the
+assignment of Tablets to TabletServers. Before a Tablet can be brought online,
+the tablet must not have any outstanding logs as this represents a need to perform
+recovery (the tablet was not unloaded cleanly). This process can take some time for
+large write-ahead log files and is performed on a TabletServer to keep the Master
+light and agile.</p>
+<p>Assignments, whether the Tablets need to perform recovery or not, share the same
+threadpool in the Master. This means that when a large number of TabletServers are
+available, too few threads dedicated to assignment can restrict the speed at which
+assignments can be performed. <a href="https://issues.apache.org/jira/browse/ACCUMULO-1085">ACCUMULO-1085</a> allows the size of the
+threadpool used in the Master for assignments to be configurable which can be
+dynamically altered to remove the artificial limitation when sufficient servers are available.</p>
+<h3 id="group-commit-threshold-as-a-factor-of-data-size">Group-Commit Threshold as a Factor of Data Size</h3>
+<p>When ingesting data into Accumulo, the majority of time is spent in the write-ahead
+log. As such, this is a common place that optimizations are added. One optimization
+is the notion of "group-commit". When multiple clients are writing data to the same
+Accumulo Tablet, it is not efficient for each of them to synchronize the WAL, flush their
+updates to disk for durability, and then release the lock. The idea of group-commit
+is that multiple writers can queue their mutations to the WAL and
+then wait for a sync that could satisfy the durability constraints of multiple clients
+instead of just one. This has a drastic improvement on performance.</p>
+<p>In previous versions, Accumulo controlled the frequency in which this group-commit
+sync was performed as a factor of clients writing to Accumulo. This was both confusing
+to correctly configure and also encouraged sub-par performance with fewer writers.
+<a href="https://issues.apache.org/jira/browse/ACCUMULO-1950">ACCUMULO-1950</a> introduced a new configuration property <code>tserver.total.mutation.queue.max</code>
+which defines the amount of data that is queued before a group-commit is performed
+in such a way that is agnostic of the number of writers. This new configuration property
+is much easier to reason about than the previous, now deprecated, <code>tserver.mutation.queue.max</code>.</p>
+<h2 id="notable-bug-fixes">Notable Bug Fixes</h2>
+<h3 id="sourceswitchingiterator-deadlock">SourceSwitchingIterator Deadlock</h3>
+<p>An instance of SourceSwitchingIterator, the Accumulo iterator which transparently
+manages whether data for a Tablet is in memory (the in-memory map) or disk (HDFS 
+after a minor compaction), was found deadlocked in a production system.</p>
+<p>This deadlock prevented the scan and the minor compaction from ever successfully
+completing without restarting the TabletServer. <a href="https://issues.apache.org/jira/browse/ACCUMULO-3745">ACCUMULO-3745</a>
+fixes the inconsistent synchronization inside of the SourceSwitchingIterator
+to prevent this deadlock from happening in the future.</p>
+<h3 id="table-flush-blocked-indefinitely">Table flush blocked indefinitely</h3>
+<p>While running the Accumulo Randomwalk distributed test, it was observed
+that all activity in Accumulo had stopped and there was an offline
+Accumulo metadata table tablet. The system first tried to flush a user
+tablet but the metadata table was not online (likely due to the agitation
+process which stops and starts Accumulo processes during the test). After
+this call, a call to load the metadata tablet was queued but could not 
+complete until the previous flush call. Thus, a deadlock occurred.</p>
+<p>This deadlock happened because the synchronous flush call could not complete
+before the load tablet call completed, but the load tablet call couldn't
+run because of connection caching we perform in Accumulo's RPC layer
+to reduce the quantity of sockets we need to create to send data. 
+<a href="https://issues.apache.org/jira/browse/ACCUMULO-3597">ACCUMULO-3597</a> prevents this deadlock by forcing a
+non-cached connection for the message requesting loads of metadata tablets,
+ensuring that this deadlock won't occur.</p>
 <h2 id="testing">Testing</h2>
 <p>Each unit and functional test only runs on a single node, while the RandomWalk and Continuous Ingest tests run
 on any number of nodes. <em>Agitation</em> refers to randomly restarting Accumulo processes and Hadoop DataNode processes,


