accumulo-commits mailing list archives

From build...@apache.org
Subject svn commit: r955999 - in /websites/staging/accumulo/trunk/content: ./ release_notes/1.5.3.html
Date Thu, 25 Jun 2015 22:43:38 GMT
Author: buildbot
Date: Thu Jun 25 22:43:38 2015
New Revision: 955999

Log:
Staging update by buildbot for accumulo

Modified:
    websites/staging/accumulo/trunk/content/   (props changed)
    websites/staging/accumulo/trunk/content/release_notes/1.5.3.html

Propchange: websites/staging/accumulo/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Jun 25 22:43:38 2015
@@ -1 +1 @@
-1687659
+1687660

Modified: websites/staging/accumulo/trunk/content/release_notes/1.5.3.html
==============================================================================
--- websites/staging/accumulo/trunk/content/release_notes/1.5.3.html (original)
+++ websites/staging/accumulo/trunk/content/release_notes/1.5.3.html Thu Jun 25 22:43:38 2015
@@ -213,287 +213,17 @@ Latest 1.5 release: <strong>1.5.3</stron
 
     <h1 class="title">Apache Accumulo 1.7.0 Release Notes</h1>
 
-    <p>Apache Accumulo 1.7.0 is a significant release that includes many important
-milestone features which expand the functionality of Accumulo. These include
-features related to security, availability, and extensibility. Nearly 700 JIRA
-issues were resolved in this version. Approximately two-thirds were bugs and
-one-third were improvements.</p>
+    <p>Apache Accumulo 1.5.3 is a bug-fix release for the 1.5 series. It is likely the last
+release in this line due to lack of user interest and the magnitude of improvements
+in newer release lines.</p>
 <p>In the context of Accumulo's <a href="http://semver.org">Semantic Versioning</a> <a href="https://github.com/apache/accumulo/blob/1.7.0/README.md#api">guidelines</a>,
-this is a "minor version". This means that new APIs have been created, some
-deprecations may have been added, but no deprecated APIs have been removed.
-Code written against 1.6.x should work against 1.7.0, likely binary-compatible
-but definitely source-compatible. As always, the Accumulo developers take API compatibility
-very seriously and have invested much time to ensure that we meet the promises set forth to our users.</p>
+this is a "patch version". This means that there have been no API changes. Any
+changes which were made were done in a backwards-compatible manner. Code that
+runs against 1.5.2 is guaranteed to run against 1.5.3.</p>
 <h1 id="major-changes">Major Changes</h1>
-<h2 id="updated-minimum-requirements">Updated Minimum Requirements</h2>
-<p>Apache Accumulo 1.7.0 comes with an updated set of minimum requirements.</p>
-<ul>
-<li>Java7 is required. Java6 support is dropped.</li>
-<li>Hadoop 2.2.0 or greater is required. Hadoop 1.x support is dropped.</li>
-<li>ZooKeeper 3.4.x or greater is required.</li>
-</ul>
-<h2 id="client-authentication-with-kerberos">Client Authentication with Kerberos</h2>
-<p>Kerberos is the de-facto means to provide strong authentication across Hadoop
-and other related components. Kerberos requires a centralized key distribution
-center to authentication users who have credentials provided by an
-administrator. When Hadoop is configured for use with Kerberos, all users must
-provide Kerberos credentials to interact with the filesystem, launch YARN
-jobs, or even view certain web pages.</p>
-<p>While Accumulo has long supported operating on Kerberos-enabled HDFS, it still
-required Accumulo users to use password-based authentication to authenticate
-with Accumulo. <a href="https://issues.apache.org/jira/browse/ACCUMULO-2815">ACCUMULO-2815</a> added support for allowing
-Accumulo clients to use the same Kerberos credentials to authenticate to
-Accumulo that they would use to authenticate to other Hadoop components,
-instead of a separate user name and password just for Accumulo.</p>
-<p>This authentication leverages <a href="https://en.wikipedia.org/wiki/Simple_Authentication_and_Security_Layer">Simple Authentication and Security Layer
-(SASL)</a> and <a href="https://en.wikipedia.org/wiki/Generic_Security_Services_Application_Program_Interface">GSSAPI</a> to support Kerberos authentication over the
-existing Thrift-based RPC infrastructure that Accumulo employs.</p>
-<p>These additions represent a significant forward step for Accumulo, bringing
-its client-authentication up to speed with the rest of the Hadoop ecosystem.
-This results in a much more cohesive authentication story for Accumulo that
-resonates with the battle-tested cell-level security and authorization model
-already familiar to Accumulo users.</p>
-<p>More information on configuration, administration, and application of Kerberos
-client authentication can be found in the <a href="/1.7/accumulo_user_manual.html#_kerberos">Kerberos chapter</a> of the
-Accumulo User Manual.</p>
-<h2 id="data-center-replication">Data-Center Replication</h2>
-<p>In previous releases, Accumulo only operated within the constraints of a
-single installation. Because single instances of Accumulo often consist of
-many nodes and Accumulo's design scales (near) linearly across many nodes, it
-is typical that one Accumulo is run per physical installation or data-center.
-<a href="https://issues.apache.org/jira/browse/ACCUMULO-378">ACCUMULO-378</a> introduces support in Accumulo to automatically
-copy data from one Accumulo instance to another.</p>
-<p>This data-center replication feature is primarily applicable to users wishing
-to implement a disaster recovery strategy. Data can be automatically copied
-from a primary instance to one or more other Accumulo instances. In contrast
-to normal Accumulo operation, in which ingest and query are strongly
-consistent, data-center replication is a lazy, eventually consistent
-operation. This is desirable for replication, as it prevents additional
-latency for ingest operations on the primary instance. Additionally, the
-implementation of this feature can sustain prolonged outages between the
-primary instance and replicas without any administrative overhead.</p>
-<p>The Accumulo User Manual contains a <a href="/1.7/accumulo_user_manual.html#_replication">new chapter on replication</a>
-which details the design and implementation of the feature, explains how users
-can configure replication, and describes special cases to consider when
-choosing to integrate the feature into a user application.</p>
-<h2 id="user-initiated-compaction-strategies">User-Initiated Compaction Strategies</h2>
-<p>Per-table compaction strategies were added in 1.6.0 to provide custom logic to
-decide which files are involved in a major compaction. In 1.7.0, the ability
-to specify a compaction strategy for a user-initiated compaction was added in
-<a href="https://issues.apache.org/jira/browse/ACCUMULO-1798">ACCUMULO-1798</a>. This allows surgical compactions on a subset
-of tablet files. Previously, a user-initiated compaction would compact all
-files in a tablet.</p>
-<p>In the Java API, this new feature can be accessed in the following way:</p>
-<div class="codehilite"><pre><span class="n">Connection</span> <span class="n">conn</span> <span class="p">=</span> <span class="p">...</span>
-<span class="n">CompactionStrategyConfig</span> <span class="n">csConfig</span> <span class="p">=</span> <span class="n">new</span> <span class="n">CompactionStrategyConfig</span><span class="p">(</span><span class="n">strategyClassName</span><span class="p">).</span><span class="n">setOptions</span><span class="p">(</span><span class="n">strategyOpts</span><span class="p">);</span>
-<span class="n">CompactionConfig</span> <span class="n">compactionConfig</span> <span class="p">=</span> <span class="n">new</span> <span class="n">CompactionConfig</span><span class="p">().</span><span class="n">setCompactionStrategy</span><span class="p">(</span><span class="n">csConfig</span><span class="p">);</span>
-<span class="n">connector</span><span class="p">.</span><span class="n">tableOperations</span><span class="p">().</span><span class="n">compact</span><span class="p">(</span><span class="n">tableName</span><span class="p">,</span> <span class="n">compactionConfig</span><span class="p">)</span>
-</pre></div>
-
-
-<p>In <a href="https://issues.apache.org/jira/browse/ACCUMULO-3134">ACCUMULO-3134</a>, the shell's <code>compact</code> command was modified
-to enable selecting which files to compact based on size, name, and path.
-Options were also added to the shell's compaction command to allow setting
-RFile options for the compaction output. Setting the output options could be
-useful for testing. For example, one tablet to be compacted using snappy
-compression.</p>
-<p>The following is an example shell command that compacts all files less than
-10MB, if the tablet has at least two files that meet this criteria. If a
-tablet had a 100MB, 50MB, 7MB, and 5MB file then the 7MB and 5MB files would
-be compacted. If a tablet had a 100MB and 5MB file, then nothing would be done
-because there are not at least two files meeting the selection criteria.</p>
-<div class="codehilite"><pre><span class="n">compact</span> <span class="o">-</span><span class="n">t</span> <span class="n">foo</span> <span class="o">--</span><span class="n">min</span><span class="o">-</span><span class="n">files</span> 2 <span class="o">--</span><span class="n">sf</span><span class="o">-</span><span class="n">lt</span><span class="o">-</span><span class="n">esize</span> 10<span class="n">M</span>
-</pre></div>
-
-
-<p>The following is an example shell command that compacts all bulk imported
-files in a table.</p>
-<div class="codehilite"><pre><span class="n">compact</span> <span class="o">-</span><span class="n">t</span> <span class="n">foo</span> <span class="o">--</span><span class="n">sf</span><span class="o">-</span><span class="n">ename</span> <span class="n">I</span><span class="o">.*</span>
-</pre></div>
-
-
-<p>These provided convenience options to select files execute using a specialized
-compaction strategy. Options were also added to the shell to specify an
-arbitrary compaction strategy. The option to specify an arbitrry compaction
-strategy is mutually exclusive with the file selection and file creation
-options, since those options are unique to the specialized compaction strategy
-provided. See <code>compact --help</code> in the shell for the available options.</p>
-<h2 id="api-clarification">API Clarification</h2>
-<p>The declared API in 1.6.x was incomplete. Some important classes like
-ColumnVisibility were not declared as Accumulo API. Significant work was done
-under <a href="https://issues.apache.org/jira/browse/ACCUMULO-3657">ACCUMULO-3657</a> to correct the API statement and clean up
-the API to be representative of all classes which users are intended to
-interact with. The expanded and simplified API statement is in the
-<a href="https://github.com/apache/accumulo/blob/1.7.0/README.md#api">README</a>.</p>
-<p>In some places in the API, non-API types were used. Ideally, public API
-members would only use public API types. A tool called <a href="http://code.revelc.net/apilyzer-maven-plugin/">APILyzer</a>
-was created to find all API members that used non-API types. Many of the
-violations found by this tool were deprecated to clearly communicate that a
-non-API type was used. One example is a public API method that returned a
-class called <code>KeyExtent</code>. <code>KeyExtent</code> was never intended to be in the public
-API because it contains code related to Accumulo internals. <code>KeyExtent</code> and
-the API methods returning it have since been deprecated. These were replaced
-with a new class for identifying tablets that does not expose internals.
-Deprecating a type like this from the API makes the API more stable while also
-making it easier for contributors to change Accumulo internals without
-impacting the API.</p>
-<p>The changes in <a href="https://issues.apache.org/jira/browse/ACCUMULO-3657">ACCUMULO-3657</a> also included an Accumulo API
-regular expression for use with checkstyle. Starting with 1.7.0, projects
-building on Accumulo can use this checkstyle rule to ensure they are only
-using Accumulo's public API. The regular expression can be found in the
-<a href="https://github.com/apache/accumulo/blob/1.7.0/README.md#api">README</a>.</p>
 <h1 id="performance-improvements">Performance Improvements</h1>
-<h2 id="configurable-threadpool-size-for-assignments">Configurable Threadpool Size for Assignments</h2>
-<p>During start-up, the Master quickly assigns tablets to Tablet Servers. However,
-Tablet Servers load those assigned tablets one at a time. In 1.7, the servers
-will be more aggressive, and will load tablets in parallel, so long as they do
-not have mutations that need to be recovered.</p>
-<p><a href="https://issues.apache.org/jira/browse/ACCUMULO-1085">ACCUMULO-1085</a> allows the size of the threadpool used in the Tablet Servers 
-for assignment processing to be configurable.</p>
-<h2 id="group-commit-threshold-as-a-factor-of-data-size">Group-Commit Threshold as a Factor of Data Size</h2>
-<p>When ingesting data into Accumulo, the majority of time is spent in the
-write-ahead log. As such, this is a common place that optimizations are added.
-One optimization is known as "group-commit". When multiple clients are
-writing data to the same Accumulo tablet, it is not efficient for each of them
-to synchronize the WAL, flush their updates to disk for durability, and then
-release the lock. The idea of group-commit is that multiple writers can queue
-the write for their mutations to the WAL and then wait for a sync that will
-satisfy the durability constraints of their batch of updates. This has a
-drastic improvement on performance, since many threads writing batches
-concurrently can "share" the same <code>fsync</code>.</p>
-<p>In previous versions, Accumulo controlled the frequency in which this
-group-commit sync was performed as a factor of the number of clients writing
-to Accumulo. This was both confusing to correctly configure and also
-encouraged sub-par performance with few write threads.
-<a href="https://issues.apache.org/jira/browse/ACCUMULO-1950">ACCUMULO-1950</a> introduced a new configuration property
-<code>tserver.total.mutation.queue.max</code> which defines the amount of data that is
-queued before a group-commit is performed in such a way that is agnostic of
-the number of writers. This new configuration property is much easier to
-reason about than the previous (now deprecated) <code>tserver.mutation.queue.max</code>.
-Users who have set <code>tserver.mutation.queue.max</code> in the past are encouraged
-to start using the new <code>tserver.total.mutation.queue.max</code> property.</p>
 <h1 id="other-improvements">Other improvements</h1>
-<h2 id="balancing-groups-of-tablets">Balancing Groups of Tablets</h2>
-<p>By default, Accumulo evenly spreads each table's tablets across a cluster. In
-some situations, it is advantageous for query or ingest to evenly spreads
-groups of tablets within a table. For <a href="https://issues.apache.org/jira/browse/ACCUMULO-3439">ACCUMULO-3439</a>, a new
-balancer was added to evenly spread groups of tablets to optimize performance.
-This <a href="https://blogs.apache.org/accumulo/entry/balancing_groups_of_tablets">blog post</a> provides more details about when and why
-users may desire to leverage this feature..</p>
-<h2 id="user-specified-durability">User-specified Durability</h2>
-<p>Accumulo constantly tries to balance durability with performance. Guaranteeing
-durability of every write to Accumulo is very difficult in a
-massively-concurrent environment that requires high throughput. One common
-area of focus is the write-ahead log, since it must eventually call <code>fsync</code> on
-the local filesystem to guarantee that data written is durable in the face of
-unexpected power failures. In some cases where durability can be sacrificed,
-either due to the nature of the data itself or redundant power supplies,
-ingest performance improvements can be attained.</p>
-<p>Prior to 1.7, a user could only configure the level of durability for
-individual tables. With the implementation of <a href="https://issues.apache.org/jira/browse/ACCUMULO-1957">ACCUMULO-1957</a>,
-the durability can be specified by the user when creating a <code>BatchWriter</code>,
-giving users control over durability at the level of the individual writes.
-Every <code>Mutation</code> written using that <code>BatchWriter</code> will be written with the
-provided durability. This can result in substantially faster ingest rates when
-the durability can be relaxed.</p>
-<h2 id="waitforbalance-api">waitForBalance API</h2>
-<p>When creating a new Accumulo table, the next step is typically adding splits
-to that table before starting ingest. This can be extremely important since a
-table without any splits will only be hosted on a single tablet server and
-create a ingest bottleneck until the table begins to naturally split. Adding
-many splits before ingesting will ensure that a table is distributed across
-many servers and result in high throughput when ingest first starts.</p>
-<p>Adding splits to a table has long been a synchronous operation, but the
-assignment of those splits was asynchronous. A large number of splits could be
-processed, but it was not guaranteed that they would be evenly distributed
-resulting in the same problem as having an insufficient number of splits.
-<a href="https://issues.apache.org/jira/browse/ACCUMULO-2998">ACCUMULO-2998</a> adds a new method to <code>InstanceOperations</code> which
-allows users to wait for all tablets to be balanced. This method lets users
-wait until tablets are appropriately distributed so that ingest can be run at
-full-bore immediately.</p>
-<h2 id="hadoop-metrics2-support">Hadoop Metrics2 Support</h2>
-<p>Accumulo has long had its own metrics system implemented using Java MBeans.
-This enabled metrics to be reported by Accumulo services, but consumption by
-other systems often required use of an additional tool like jmxtrans to read
-the metrics from the MBeans and send them to some other system.</p>
-<p><a href="https://issues.apache.org/jira/browse/ACCUMULO-1817">ACCUMULO-1817</a> replaces this custom metrics system Accumulo
-with Hadoop Metrics2. Metrics2 has a number of benefits, the most common of
-which is invalidating the need for an additional process to send metrics to
-common metrics storage and visualization tools. With Metrics2 support,
-Accumulo can send its metrics to common tools like Ganglia and Graphite.</p>
-<p>For more information on enabling Hadoop Metrics2, see the <a href="/1.7/accumulo_user_manual.html#_metrics">Metrics
-Chapter</a> in the Accumulo User Manual.</p>
-<h2 id="distributed-tracing-with-htrace">Distributed Tracing with HTrace</h2>
-<p>HTrace has recently started gaining traction as a standalone project,
-especially with its adoption in HDFS. Accumulo has long had distributed
-tracing support via its own "Cloudtrace" library, but this wasn't intended for
-use outside of Accumulo.</p>
-<p><a href="https://issues.apache.org/jira/browse/ACCUMULO-898">ACCUMULO-898</a> replaces Accumulo's Cloudtrace code with HTrace.
-This has the benefit of adding timings (spans) from HDFS into Accumulo spans
-automatically.</p>
-<p>Users who inspect traces via the Accumulo Monitor (or another system) will begin
-to see timings from HDFS during operations like Major and Minor compactions when
-running with at least Apache Hadoop 2.6.0.</p>
-<h2 id="versions-file-present-in-binary-distribution">VERSIONS file present in binary distribution</h2>
-<p>In the pre-built binary distribution or distributions built by users from the
-official source release, users will now see a <code>VERSIONS</code> file present in the
-<code>lib/</code> directory alongside the Accumulo server-side jars. Because the created
-tarball strips off versions from the jar file names, it can require extra work
-to actually find what the version of each dependent jar (typically inspecting
-the jar's manifest).</p>
-<p><a href="https://issues.apache.org/jira/browse/ACCUMULO-2863">ACCUMULO-2863</a> adds a <code>VERSIONS</code> file to the <code>lib/</code> directory
-which contains the Maven groupId, artifactId, and verison (GAV) information for
-each jar file included in the distribution.</p>
-<h2 id="per-table-volume-chooser">Per-Table Volume Chooser</h2>
-<p>The <code>VolumeChooser</code> interface is a server-side extension point that allows user
-tables to provide custom logic in choosing where its files are written when
-multiple HDFS instances are available. By default, a randomized volume chooser
-implementation is used to evenly balance files across all HDFS instances.</p>
-<p>Previously, this VolumeChooser logic was instance-wide which meant that it would
-affect all tables. This is potentially undesirable as it might unintentionally
-impact other users in a multi-tenant system. <a href="https://issues.apache.org/jira/browse/ACCUMULO-3177">ACCUMULO-3177</a>
-introduces a new per-table property which supports configuration of a
-<code>VolumeChooser</code>. This ensures that the implementation to choose how HDFS
-utilization happens when multiple are available is limited to the expected
-subset of all tables.</p>
-<h2 id="table-and-namespace-custom-properties">Table and namespace custom properties</h2>
-<p>In order to avoid errors caused by mis-typed configuration properties, Accumulo was strict about which configuration properties 
-could be set. However, this prevented users from setting arbitrary properties that could be used by custom balancers, compaction 
-strategies, volume choosers, and iterators. Under <a href="https://issues.apache.org/jira/browse/ACCUMULO-2841">ACCUMULO-2841</a>, the ability to set arbitrary table and 
-namespace properties was added. The properties need to be prefixed with <code>table.custom.</code>.  The changes made in 
-<a href="https://issues.apache.org/jira/browse/ACCUMULO-3177">ACCUMULO-3177</a> and <a href="https://issues.apache.org/jira/browse/ACCUMULO-3439">ACCUMULO-3439</a> leverage this new feature.</p>
 <h1 id="notable-bug-fixes">Notable Bug Fixes</h1>
-<h2 id="sourceswitchingiterator-deadlock">SourceSwitchingIterator Deadlock</h2>
-<p>An instance of SourceSwitchingIterator, the Accumulo iterator which
-transparently manages whether data for a tablet read from memory (the
-in-memory map) or disk (HDFS after a minor compaction), was found deadlocked
-in a production system.</p>
-<p>This deadlock prevented the scan and the minor compaction from ever
-successfully completing without restarting the tablet server.
-<a href="https://issues.apache.org/jira/browse/ACCUMULO-3745">ACCUMULO-3745</a> fixes the inconsistent synchronization inside
-of the SourceSwitchingIterator to prevent this deadlock from happening in the
-future.</p>
-<p>The only mitigation of this bug was to restart the tablet server that is
-deadlocked.</p>
-<h2 id="table-flush-blocked-indefinitely">Table flush blocked indefinitely</h2>
-<p>While running the Accumulo RandomWalk distributed test, it was observed that
-all activity in Accumulo had stopped and there was an offline Accumulo
-metadata table tablet. The system first tried to flush a user tablet, but the
-metadata table was not online (likely due to the agitation process which stops
-and starts Accumulo processes during the test). After this call, a call to
-load the metadata tablet was queued but could not complete until the previous
-flush call. Thus, a deadlock occurred.</p>
-<p>This deadlock happened because the synchronous flush call could not complete
-before the load tablet call completed, but the load tablet call couldn't run
-because of connection caching we perform in Accumulo's RPC layer to reduce the
-quantity of sockets we need to create to send data.
-<a href="https://issues.apache.org/jira/browse/ACCUMULO-3597">ACCUMULO-3597</a> prevents this deadlock by forcing the use of a
-non-cached connection for the RPC message requesting a metadata tablet to be
-loaded.</p>
-<p>While this feature does result in additional network resources to be used, the
-concern is minimal because the number of metadata tablets is typically very
-small with respect to the total number of tablets in the system.</p>
-<p>The only mitigation of this bug was to restart the tablet server that is hung.</p>
 <h1 id="testing">Testing</h1>
 <p>Each unit and functional test only runs on a single node, while the RandomWalk
 and Continuous Ingest tests run on any number of nodes. <em>Agitation</em> refers to
@@ -504,7 +234,7 @@ with HDFS using Apache Hadoop 2.6.0 when
 HDFS datanodes. The developers investigated these issues as a part of the
 normal release testing procedures, but were unable to find a definitive cause
 of these failures. Users are encouraged to follow
-<a href="https://issues.apache.org/jira/browse/ACCUMULO-2388">ACCUMULO-2388</a> if they wish to follow any future developments.
+[ACCUMULO-2388][ACCUMULO-2388] if they wish to follow any future developments.
 One possible workaround is to increase the <code>general.rpc.timeout</code> in the
 Accumulo configuration from <code>120s</code> to <code>240s</code>.</p>
 <table id="release_notes_testing">
@@ -517,36 +247,12 @@ Accumulo configuration from <code>120s</
     <th>Tests</th>
   </tr>
   <tr>
-    <td>Gentoo</tdt>
+    <td>N/A</tdt>
+    <td>N/A</td>
+    <td>N/A</td>
+    <td>N/A</td>
     <td>N/A</td>
-    <td>1</td>
     <td>N/A</td>
-    <td>No</td>
-    <td>Unit and Integration Tests</td>
-  </tr>
-  <tr>
-    <td>Gentoo</tdt>
-    <td>2.6.0</td>
-    <td>1 (2 TServers)</td>
-    <td>3.4.5</td>
-    <td>No</td>
-    <td>24hr CI w/ agitation and verification, 24hr RW w/o agitation.</td>
-  </tr>
-  <tr>
-    <td>Centos 6.6</td>
-    <td>2.6.0</td>
-    <td>3</td>
-    <td>3.4.6</td>
-    <td>No</td>
-    <td>24hr RW w/ agitation, 24hr CI w/o agitation, 72hr CI w/ and w/o agitation</td>
-  </tr>
-  <tr>
-    <td>Amazon Linux</td>
-    <td>2.6.0</td>
-    <td>20 m1large</td>
-    <td>3.4.6</td>
-    <td>No</td>
-    <td>24hr CI w/o agitation</td>
   </tr>
 </table>
   </div>


