accumulo-commits mailing list archives

Subject svn commit: r922883 - in /websites/staging/accumulo/trunk/content: ./ release_notes/1.5.2.html
Date Fri, 19 Sep 2014 21:10:54 GMT
Author: buildbot
Date: Fri Sep 19 21:10:53 2014
New Revision: 922883

Staging update by buildbot for accumulo

    websites/staging/accumulo/trunk/content/   (props changed)

Propchange: websites/staging/accumulo/trunk/content/
--- cms:source-revision (original)
+++ cms:source-revision Fri Sep 19 21:10:53 2014
@@ -1 +1 @@

Modified: websites/staging/accumulo/trunk/content/release_notes/1.5.2.html
--- websites/staging/accumulo/trunk/content/release_notes/1.5.2.html (original)
+++ websites/staging/accumulo/trunk/content/release_notes/1.5.2.html Fri Sep 19 21:10:53 2014
@@ -204,16 +204,48 @@ to benefit from the improvements.</p>
 to the 1.5 line as development has already shifted towards the 1.6 line. For those
 who cannot or do not want to upgrade to 1.6, 1.5.2 is still an excellent choice
 over earlier versions in the 1.5 line.</p>
-<h2 id="notable-improvements">Notable Improvements</h2>
-<p>While new features are typically not added in a bug-fix release as 1.5.2, the
-community does create a variety of improvements that are API compatible. Contained
-here are some of the more notable improvements.</p>
-<h3 id="performance-improvements">Performance improvements</h3>
+<h2 id="performance-improvements">Performance Improvements</h2>
+<p>Apache Accumulo 1.5.2 includes a number of performance-related fixes over previous versions.</p>
+<h3 id="write-ahead-log-sync-performance">Write-Ahead Log sync performance</h3>
 <p>The Write-Ahead Log (WAL) files are used to ensure durability of updates made to Accumulo.
 A "sync" is called on the file in HDFS to make sure that the changes to the WAL are persisted
 to disk, which allows Accumulo to recover in the case of failure. <a href="">ACCUMULO-2766</a> fixed
 an issue where an operation against a WAL would unnecessarily wait for multiple syncs, slowing
 down the ingest on the system.</p>
+<h3 id="minor-compactions-not-aggressive-enough">Minor-Compactions not aggressive enough</h3>
+<p>On a system with ample memory provided to Accumulo, long hold-times were observed, which
+blocked the ingest of new updates. Freeing more server-side memory by running minor
+compactions more frequently increased the overall throughput on the node. These changes
+were made in <a href="">ACCUMULO-2905</a>.</p>
+<h3 id="heapiterator-optimization">HeapIterator optimization</h3>
+<p>Iterators, a notable feature of Accumulo, are provided to users as a server-side
+construct, but are also used internally for numerous server operations. One of these system
+iterators is the HeapIterator, which implements a PriorityQueue of other Iterators. One way this iterator
+is used is to merge multiple files in HDFS to present a single, sorted stream of Key-Value pairs.
+<a href="">ACCUMULO-2827</a>
+introduces a performance optimization that can improve the speed of the
+HeapIterator in common cases.</p>
+<h3 id="write-ahead-log-sync-implementation">Write-Ahead log sync implementation</h3>
+<p>In Hadoop-2, two implementations of "sync" are provided: hflush and hsync. Both of these
+methods provide a way to request that the Datanodes write the data to the underlying
+medium and not just hold it in memory (the 'fsync' syscall). While both of these methods
+inform the Datanodes to sync the relevant block(s), hflush does not wait for acknowledgement
+from the Datanodes that the sync finished, whereas hsync does. To provide the most reliable system
+"out of the box", Accumulo defaults to hsync so that your data is as secure as possible in
+a variety of situations (notably, unexpected power outages).</p>
+<p>The downside is that performance tends to suffer because waiting for a sync to disk is a very
+expensive operation. <a href="">ACCUMULO-2842</a> introduces a new system property, tserver.wal.sync.method,
+that lets users change the HDFS sync implementation from 'hsync' to 'hflush'. Using 'hflush' instead
+of 'hsync' should result in about a 30% increase in ingest performance.</p>
+<p>For users upgrading from Hadoop-1 or Hadoop-0.20 releases, "hflush" is the equivalent of how
+sync was implemented and should give equivalent performance.</p>
+<h3 id="server-side-mutation-queue-size">Server-side mutation queue size</h3>
+<p>When users desire writes to be as durable as possible, using 'hsync', the ingest performance
+of the system can be improved by increasing the tserver.mutation.queue.max property. The cost
+of this change is that it will cause TabletServers to use additional memory per writer. Previously,
+the value of this parameter defaulted to a conservative 256K, which resulted in sub-par ingest performance.</p>
+<p>1.5.2 and <a href="">ACCUMULO-3018</a> increase this buffer to 1M, which has a noticeable impact on
+ingest performance with a minimal increase in TabletServer memory usage.</p>
 <h2 id="notable-bug-fixes">Notable Bug Fixes</h2>
 <h3 id="fixes-mapreduce-package-name-change">Fixes MapReduce package name change</h3>
 <p>1.5.1 inadvertently included a change to RangeInputSplit which created an incompatibility
@@ -240,6 +272,11 @@ never returns. Most of these are related
 <p>The Writable interface methods on the RangeInputSplit class accidentally omitted
 calls to serialize the IteratorSettings configured for the Job. <a href="">ACCUMULO-2962</a>
 fixes the serialization and adds some additional tests.</p>
+<h3 id="constraint-violation-causes-hung-scans">Constraint violation causes hung scans</h3>
+<p>A failed bulk import transaction could create an infinitely retrying
+loop due to a constraint violation. This directly prevented scans from completing
+and also hung compactions. <a href="">ACCUMULO-3096</a> fixes the issue so that the
+constraint no longer hangs the entire system.</p>
 <h2 id="documentation">Documentation</h2>
 <p>The following documentation updates were made: </p>

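Illustration for the "HeapIterator optimization" note in the diff above: the release notes describe the HeapIterator as a PriorityQueue of other Iterators that merges several sorted inputs into one sorted stream. The following is only a minimal sketch of that general heap-merge technique using java.util.PriorityQueue. It is not Accumulo's HeapIterator and does not reproduce the ACCUMULO-2827 optimization, whose details are not given in this message. The class name HeapMergeSketch, the helper class Source, and the method merge are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not Accumulo code.
public class HeapMergeSketch {

    /** Pairs one sorted input with the element currently at its head. */
    private static final class Source<T extends Comparable<T>>
            implements Comparable<Source<T>> {
        final Iterator<T> rest;
        T head;

        Source(Iterator<T> it) {
            this.rest = it;
            this.head = it.next(); // caller guarantees the input is non-empty
        }

        @Override
        public int compareTo(Source<T> other) {
            return head.compareTo(other.head);
        }
    }

    /** Merges already-sorted inputs into a single sorted list. */
    public static <T extends Comparable<T>> List<T> merge(List<List<T>> sortedInputs) {
        PriorityQueue<Source<T>> heap = new PriorityQueue<>();
        for (List<T> input : sortedInputs) {
            Iterator<T> it = input.iterator();
            if (it.hasNext()) {
                heap.add(new Source<>(it)); // empty inputs are skipped
            }
        }

        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            // The source with the smallest head element is always on top.
            Source<T> smallest = heap.poll();
            out.add(smallest.head);
            if (smallest.rest.hasNext()) {
                // Advance that source and put it back so the heap re-orders it.
                smallest.head = smallest.rest.next();
                heap.add(smallest);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three sorted "files", stood in for by in-memory lists of keys.
        List<List<String>> files = new ArrayList<>();
        files.add(Arrays.asList("a", "d", "g"));
        files.add(Arrays.asList("b", "e"));
        files.add(Arrays.asList("c", "f", "h"));
        System.out.println(merge(files)); // prints [a, b, c, d, e, f, g, h]
    }
}
```

Each output element costs one poll and at most one add on the heap, so merging k inputs with n elements in total takes O(n log k) comparisons. That per-element heap work is presumably why a faster path for common cases is worth having in an iterator on the read and compaction paths.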