accumulo-commits mailing list archives

Subject svn commit: r904955 - in /websites/staging/accumulo/trunk/content: ./ release_notes/1.6.0.html
Date Fri, 04 Apr 2014 21:03:57 GMT
Author: buildbot
Date: Fri Apr  4 21:03:57 2014
New Revision: 904955

Staging update by buildbot for accumulo

    websites/staging/accumulo/trunk/content/   (props changed)

Propchange: websites/staging/accumulo/trunk/content/
--- cms:source-revision (original)
+++ cms:source-revision Fri Apr  4 21:03:57 2014
@@ -1 +1 @@

Added: websites/staging/accumulo/trunk/content/release_notes/1.6.0.html
--- websites/staging/accumulo/trunk/content/release_notes/1.6.0.html (added)
+++ websites/staging/accumulo/trunk/content/release_notes/1.6.0.html Fri Apr  4 21:03:57 2014
@@ -0,0 +1,240 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+ 2.0
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+  <link href="/css/accumulo.css" rel="stylesheet" type="text/css">
+  <title></title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <div id="banner">&nbsp;
+  </div>
+  <div id="navigation">
+  <h1 id="project">Project</h1>
+<li><a href="/">Home</a></li>
+<li><a href="/downloads">Downloads</a></li>
+<li><a href="/notable_features.html">Features</a></li>
+<li><a href="">License</a></li>
+<h1 id="community">Community</h1>
+<li><a href="/get_involved.html">Get Involved</a></li>
+<li><a href="/mailing_list.html">Mailing Lists</a></li>
+<li><a href="/people.html">People</a></li>
+<h1 id="development">Development</h1>
+<li><a href="/source.html">Source &amp; Guide</a></li>
+<li><a href="/git.html">Git WIP</a></li>
+<li><a href="/contrib.html">Contrib Projects</a></li>
+<li><a href="/releasing.html">Making Releases</a></li>
+<li><a href="">Issues</a></li>
+<li><a href="">Builds</a></li>
+<h1 id="documentation">Documentation</h1>
+<li>Manual <a href="/1.4/user_manual">1.4</a> / <a href="/1.5/accumulo_user_manual.html">1.5</a></li>
+<li>Javadoc <a href="/1.4/apidocs">1.4</a> / <a href="/1.5/apidocs">1.5</a></li>
+<li>Examples <a href="/1.4/examples">1.4</a> / <a href="/1.5/examples">1.5</a></li>
+<li><a href="/screenshots.html">Screenshots</a></li>
+<li><a href="/papers.html">Papers &amp; Other Links</a></li>
+<li><a href="/glossary.html">Glossary</a></li>
+<h1 id="asf-links">ASF links</h1>
+<li><a href="">Apache Software Foundation</a></li>
+<li><a href="">Sponsorship</a></li>
+<li><a href="">Security</a></li>
+<li><a href="">Thanks</a></li>
+  </div>
+  <div id="bannertext">
+    <img id="logo" alt="Apache Accumulo" src="/images/accumulo-logo.png"/>&trade;
+  </div>
+  <div id="content">
+    <h1 class="title"></h1>
+    <p><strong>DRAFT 1.6.0 RELEASE NOTES</strong></p>
+<p>Apache Accumulo 1.6.0</p>
+<p>This document is a work in progress.</p>
+<h2 id="notable-improvements">Notable Improvements</h2>
+<h3 id="multiple-namenode-support">Multiple namenode support</h3>
+<p>BigTable's design allows for its internal metadata to automatically spread across
multiple nodes.  Accumulo has followed this design and scales very well as a result.  There
is one impediment to scaling, though: the HDFS namenode.  There are two problems
with the namenode when it comes to scaling.  First, the namenode stores all of its filesystem
metadata in memory on a single machine.  This introduces an upper bound on the number of files
Accumulo can have.  Second, there is an upper bound on the number of file operations per second
that a single namenode can support.  For example, a namenode can only support a few thousand
delete or create file requests per second.</p>
+<p>To overcome this bottleneck, support for multiple namenodes was added under <a
href="" title="Multiple namenode support">ACCUMULO-118</a>.
 This change allows Accumulo to store its files across multiple namenodes.  To use this feature,
place a comma-separated list of namenode URIs in the new instance.volumes configuration property.
 When upgrading, modify this setting only after the upgrade has completed successfully.</p>
+<h3 id="table-namespaces">Table namespaces</h3>
+<p>Administering an Accumulo instance with lots of tables is cumbersome.  To ease this,
<a href="" title="Table namespaces">ACCUMULO-802</a>
introduced table namespaces, which allow tables to be grouped.  This allows configuration and
permission changes to be made to a namespace, which will apply to all of its tables.  Example
use cases are ... TODO</p>
+<h3 id="conditional-mutations">Conditional Mutations</h3>
+<p>Until now, Accumulo has not offered a way to make atomic row changes.  Accumulo now
supports atomic test-and-set row operations.  <a href=""
title="Conditional Mutations">ACCUMULO-1000</a> added conditional mutations and a
conditional writer.  A conditional mutation has tests on columns that must pass before any
changes are made.  These tests are executed in server processes while a row lock is held.
Below is a simple example of making atomic row changes using conditional mutations.</p>
+<ol>
+<li>Read columns X,Y,SEQ into a,b,s from row R1 using an isolated scanner.</li>
+<li>For row R1 write conditional mutation X=f(a),Y=g(b),SEQ=s+1 if SEQ==s.</li>
+<li>If the conditional mutation failed, then go to step 1.</li>
+</ol>
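+<p>The read, conditional write, and retry steps above can be sketched in plain Java. This is only an in-memory illustration, not the Accumulo API: the class ConditionalRowSketch and its methods are hypothetical names, and a synchronized block stands in for the row lock held by the server.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongUnaryOperator;

/** In-memory stand-in for a single row; hypothetical, NOT the Accumulo API. */
class ConditionalRowSketch {
    private final Map<String, Long> columns = new HashMap<>();

    ConditionalRowSketch(long x, long y) {
        columns.put("X", x);
        columns.put("Y", y);
        columns.put("SEQ", 0L);
    }

    /** Step 1: read a consistent snapshot of the row (the role of the isolated scanner). */
    synchronized Map<String, Long> snapshot() {
        return new HashMap<>(columns);
    }

    /** Step 2: write only if SEQ still equals expectedSeq; test and write share one lock. */
    synchronized boolean writeIfSeqEquals(long expectedSeq, long newX, long newY) {
        if (columns.get("SEQ") != expectedSeq) {
            return false; // condition failed, nothing is written
        }
        columns.put("X", newX);
        columns.put("Y", newY);
        columns.put("SEQ", expectedSeq + 1);
        return true;
    }

    /** Steps 1 to 3: retry until the conditional write is accepted; returns the attempt count. */
    static int update(ConditionalRowSketch row, LongUnaryOperator f, LongUnaryOperator g) {
        int attempts = 0;
        while (true) {
            attempts++;
            Map<String, Long> snap = row.snapshot();
            long a = snap.get("X");
            long b = snap.get("Y");
            long s = snap.get("SEQ");
            if (row.writeIfSeqEquals(s, f.applyAsLong(a), g.applyAsLong(b))) {
                return attempts;
            }
        }
    }
}
```

+<p>A writer that read the row before another update landed holds a stale SEQ, so its write is rejected and it must re-read, which is what makes the read-modify-write sequence safe without client-side locking.</p>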
+<p>The only built-in tests that conditional mutations support are equality and isNull.
 However, iterators can be configured on a conditional mutation to run before these tests.
 This makes it possible to implement any number of tests such as less than, greater than, contains,
and so on.</p>
+<h3 id="encryption">Encryption</h3>
+<p>Support for encrypting Accumulo's persistent and over-the-wire data was added.
 <a href="" title="Support encryption
at rest">ACCUMULO-998</a>, <a href=""
title="Support pluggable encryption in walogs">ACCUMULO-958</a>, and <a href=""
title="Support pluggable codecs for RFile">ACCUMULO-980</a> cover encrypting data
at rest in write-ahead logs and RFiles.   <a href=""
title="Support encryption over the wire">ACCUMULO-1009</a> covers encrypting data
over the wire using SSL.</p>
+<h3 id="pluggable-compaction-strategies">Pluggable compaction strategies</h3>
+<p>One of the key elements of the BigTable design is the use of the Log-Structured Merge
Tree (LSMT) concept.  This entails sorting data in memory, writing out sorted files, and then
later merging multiple sorted files into a single file.   These automatic merges happen in
the background, and Accumulo decides when to merge files by comparing the relative sizes of
files to a compaction ratio.  Previously, adjusting the compaction ratio was the only way a user could control
this process.  <a href="" title="Make
Compaction triggers extensible">ACCUMULO-1451</a> introduces pluggable compaction
strategies, which allow users to choose when and what files to compact.  <a href=""
title="Create compaction strategy that has size limit">ACCUMULO-1808</a> adds a compaction
strategy that prevents compaction of files over a configurable size.</p>
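+<p>As a rough illustration of ratio-based selection combined with a size limit, the sketch below assumes one commonly described rule: a set of files is merged when the sum of their sizes exceeds the ratio times the largest file, and otherwise the largest file is dropped and the check repeats. The rule, the class CompactionRatioSketch, and the parameter maxFileSize are assumptions made for illustration, not the pluggable strategy interface itself.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical file selection by compaction ratio; not Accumulo's strategy classes. */
class CompactionRatioSketch {

    /**
     * Returns the sizes of the files chosen for merging, or an empty list if none qualify.
     * Files larger than maxFileSize are never considered (the size-limit idea).
     */
    static List<Long> selectFiles(List<Long> fileSizes, double ratio, long maxFileSize) {
        List<Long> candidates = new ArrayList<>();
        for (long size : fileSizes) {
            if (size <= maxFileSize) {
                candidates.add(size);
            }
        }
        Collections.sort(candidates); // ascending, so the largest file is last

        while (candidates.size() > 1) {
            long largest = candidates.get(candidates.size() - 1);
            long total = 0;
            for (long size : candidates) {
                total += size;
            }
            if (total > ratio * largest) {
                return candidates; // the set is balanced enough to merge
            }
            candidates.remove(candidates.size() - 1); // drop the largest and retry
        }
        return Collections.emptyList();
    }
}
```

+<p>Under this rule, one very large file next to a few small ones is left alone while the small files are merged together, and lowering the ratio makes merges happen more eagerly.</p>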
+<h3 id="lexicoders">Lexicoders</h3>
+<p>Accumulo only sorts data lexicographically.  Getting something like a pair of (<string>,<integer>)
to sort correctly in Accumulo is tricky.  It's tricky because you only want to compare the
integers if the strings are equal.  It's possible to make this sort properly in Accumulo if
the data is encoded properly, but that's the tricky part.  To make this easier, <a href=""
title="Add lexicoders from Typo to Accumulo">ACCUMULO-1336</a> added Lexicoders to
the Accumulo API.  Lexicoders provide an easy way to serialize data so that it sorts properly
lexicographically.  Below is a simple example.</p>
+<p>PairLexicoder plex = new PairLexicoder(new StringLexicoder(), new IntegerLexicoder());
+byte[] ba1 = plex.encode(new ComparablePair<String, Integer>("b",1));
+byte[] ba2 = plex.encode(new ComparablePair<String, Integer>("aa",1));
+byte[] ba3 = plex.encode(new ComparablePair<String, Integer>("a",2));
+byte[] ba4 = plex.encode(new ComparablePair<String, Integer>("a",1)); 
+byte[] ba5 = plex.encode(new ComparablePair<String, Integer>("aa",-3));</p>
+<p>//sorting ba1,ba2,ba3,ba4, and ba5 lexicographically will result in the same order
as sorting the ComparablePairs</p>
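+<p>To show why a careful encoding is needed, here is a self-contained sketch of one possible order-preserving encoding for a (String, int) pair. It is not the Lexicoder implementation: the class PairEncodingSketch is hypothetical, and it assumes ASCII strings that contain no zero byte. The string is terminated with 0x00 so shorter strings sort first, and the integer's sign bit is flipped so negative values sort before positive ones when bytes are compared as unsigned.</p>

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical order-preserving encoding for (String, int); not Accumulo's Lexicoder code. */
class PairEncodingSketch {

    /** Encodes the pair so that unsigned byte order matches (string, then integer) order. */
    static byte[] encode(String s, int i) {
        byte[] str = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(str, 0, str.length);
        out.write(0x00);               // terminator: assumes the string has no zero byte
        int flipped = i ^ 0x80000000;  // flip the sign bit so negatives sort first
        out.write(flipped >>> 24);     // big-endian, most significant byte first
        out.write(flipped >>> 16);
        out.write(flipped >>> 8);
        out.write(flipped);
        return out.toByteArray();
    }

    /** Lexicographic comparison that treats each byte as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int k = 0; k < n; k++) {
            int cmp = (a[k] & 0xff) - (b[k] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }
}
```

+<p>Writing the integer as plain two's-complement bytes would instead place -3 after 1, which is the kind of subtle mistake that a ready-made encoder helps avoid.</p>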
+<h3 id="multi-table-accumulo-input-format">Multi-table Accumulo input format</h3>
+<p><a href="" title="Multi-table
input format">ACCUMULO-391</a> makes it possible to easily read from multiple tables
in a MapReduce job.  TODO is there more to say about this, if not maybe move to one-liners.</p>
+<h3 id="locality-groups-in-memory">Locality groups in memory</h3>
+<p>In cases where a very small amount of data is stored in a locality group, one would
expect fast scans over that locality group.  However, this was not always the case because
recently written data stored in memory was not partitioned by locality group.  Therefore, if
a table had 100GB of data in memory and 1MB of that was in locality group A, then scanning
A would have required reading all 100GB.  <a href=""
title="Partition data in memory by locality group">ACCUMULO-112</a> changes this
and partitions data by locality group as it is written.</p>
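+<p>The effect of partitioning at write time can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions, not the tablet server's data structure: the class LocalityGroupMemorySketch, the group named "default", and the string keys are all hypothetical.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory map partitioned by locality group; not Accumulo's implementation. */
class LocalityGroupMemorySketch {
    static final String DEFAULT_GROUP = "default";

    private final Map<String, String> familyToGroup;
    private final Map<String, TreeMap<String, String>> partitions = new HashMap<>();

    LocalityGroupMemorySketch(Map<String, String> familyToGroup) {
        this.familyToGroup = new HashMap<>(familyToGroup);
    }

    /** Routes each entry to the sorted map of its locality group as it is written. */
    void write(String row, String family, String value) {
        String group = familyToGroup.getOrDefault(family, DEFAULT_GROUP);
        partitions.computeIfAbsent(group, g -> new TreeMap<>()).put(row + ":" + family, value);
    }

    /** Number of in-memory entries a scan restricted to one group has to examine. */
    int entriesReadForScan(String group) {
        TreeMap<String, String> partition = partitions.get(group);
        return partition == null ? 0 : partition.size();
    }
}
```

+<p>With a single unpartitioned map, a scan of the small group would have to step over every entry of the large one. Here the cost depends only on the size of the group being scanned.</p>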
+<h3 id="jline2-support-in-shell">Jline2 support in shell</h3>
+<p><a href="" title="Replace
JLine with JLine2">ACCUMULO-1442</a> TODO whats some of the goodness this brings
to the shell?</p>
+<h3 id="service-ip-addresses">Service IP addresses</h3>
+<p>Previous versions of Accumulo always used IP addresses internally.  This could be
problematic in virtual machine environments where IP addresses change.  In <a href=""
title="Use FQDN/verbatim data from config files">ACCUMULO-1585</a> this was changed;
Accumulo now uses the exact hostnames from its config files for internal addressing.</p>
+<p>All Accumulo processes running on a cluster are locatable via ZooKeeper.  Therefore,
using well-known ports is not really required.  <a href=""
title="Make all processes able to use random ports">ACCUMULO-1664</a> makes it possible
for all Accumulo processes to use random ports.  This makes it easier to run multiple Accumulo
processes on a single node.</p>
+<h3 id="other-notable-changes">Other notable changes</h3>
+<li><a href="" title="Add FATE
administration to shell">ACCUMULO-842</a> Added FATE administration to shell</li>
+<li><a href="" title="Root tablet
in its own table">ACCUMULO-1481</a> The root tablet is now the root table.</li>
+<li><a href="" title="Add ability
for client to start Scanner readahead immediately">ACCUMULO-1566</a> When read-ahead
starts in the scanner is now configurable.</li>
+<li><a href="" title="Allow On/Offline
Command To Execute Synchronously">ACCUMULO-1667</a> Added a synchronous version of
online and offline table</li>
+<li><a href="" title="Provide
resource cleanup via static utility rather than Instance.close">ACCUMULO-2128</a>
Provide resource cleanup via static utility</li>
+<h2 id="notable-bug-fixes">Notable Bug Fixes</h2>
+<p>TODO kturner looked at bugs w/ fix version of 1.6.0 and a non-empty affects version
and selected ones he thought were relevant to users.... need other devs to do this
+TODO some bugs may be unintelligible to end users... either improve the issue description
or remove from list</p>
+<li><a href="" title="System/site
constraints and iterators should NOT affect the METADATA table">ACCUMULO-324</a>
System/site constraints and iterators should NOT affect the METADATA table</li>
+<li><a href="" title="Batch scanning
over the !METADATA table can cause issues">ACCUMULO-335</a> Batch scanning over the
!METADATA table can cause issues</li>
+<li><a href="" title="Client
does not give informative message when user can not read table">ACCUMULO-1018</a>
Client does not give informative message when user can not read table</li>
+<li><a href="" title="bin/accumulo
should follow symbolic links">ACCUMULO-1492</a> bin/accumulo should follow symbolic
links</li>
+<li><a href="" title="Single
node zookeeper failure kills connected accumulo servers">ACCUMULO-1572</a> Single
node zookeeper failure kills connected accumulo servers</li>
+<li><a href="" title="AccumuloInputFormat
cannot fetch empty column family">ACCUMULO-1661</a> AccumuloInputFormat cannot fetch
empty column family</li>
+<li><a href="" title="Deep copy
in the compaction scope iterators can throw off the stats">ACCUMULO-1696</a> Deep
copy in the compaction scope iterators can throw off the stats</li>
+<li><a href="" title="stop-here
doesn't consider system hostname">ACCUMULO-1698</a> stop-here doesn't consider system
hostname</li>
+<li><a href="" title="MultiTableBatchWriterImpl.getBatchWriter()
is not performant for multiple threads">ACCUMULO-1833</a> MultiTableBatchWriterImpl.getBatchWriter()
is not performant for multiple threads</li>
+<li><a href="" title="
starts only one GC process even if more are defined">ACCUMULO-1901</a>
starts only one GC process even if more are defined</li>
+<li><a href="" title="NPE in
tablet assignment">ACCUMULO-1921</a> NPE in tablet assignment</li>
+<li><a href="" title="Proxy does
not handle Key timestamps correctly">ACCUMULO-1994</a> Proxy does not handle Key
timestamps correctly</li>
+<li><a href="" title="VFS Classloader
has potential to collide localized resources">ACCUMULO-2174</a> VFS Classloader has
potential to collide localized resources</li>
+<li><a href="" title="Need to
better handle DNS failure propagation from Hadoop">ACCUMULO-2225</a> Need to better
handle DNS failure propagation from Hadoop</li>
+<li><a href="" title="Cannot
run offline mapreduce over non-default instance.dfs.dir value">ACCUMULO-2234</a>
Cannot run offline mapreduce over non-default instance.dfs.dir value</li>
+<li><a href="" title="Lacking
fallback when ACCUMULO_LOG_HOST isn't set">ACCUMULO-2334</a> Lacking fallback when
ACCUMULO_LOG_HOST isn't set</li>
+<li><a href="" title="metadata
table not assigned after root table is loaded">ACCUMULO-2408</a> metadata table not
assigned after root table is loaded</li>
+<li><a href="" title="FATE operation
failed across upgrade">ACCUMULO-2519</a> FATE operation failed across upgrade</li>
+<h2 id="known-issues">Known Issues</h2>
+<p>When using Accumulo 1.6 and Hadoop 2, Accumulo will call hsync() on HDFS.
+Calling hsync improves durability by ensuring data is on disk (where other older 
+Hadoop versions might lose data in the face of power failure); however, calling
+hsync frequently does noticeably slow writes. A simple workaround is to increase 
+the value of the tserver.mutation.queue.max configuration parameter via accumulo-site.xml.</p>
+<p>A value of "4M" is a better recommendation; memory consumption will increase in
+proportion to the number of concurrent writers to that TabletServer. For example, a value of 4M with
+50 concurrent writers would equate to approximately 200M of Java heap being used for
+mutation queues.</p>
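+<p>The sizing arithmetic above can be written down as a tiny helper for checking other combinations. It is only a back-of-the-envelope estimate that assumes every writer fills its queue completely; the class and method names are hypothetical.</p>

```java
/** Hypothetical worst-case heap estimate for mutation queues; not part of Accumulo. */
class MutationQueueHeapSketch {

    /** Queue limit per writer, in bytes, times the number of concurrent writers. */
    static long estimateBytes(long queueMaxBytes, int concurrentWriters) {
        return Math.multiplyExact(queueMaxBytes, (long) concurrentWriters);
    }
}
```

+<p>For instance, passing 4 * 1024 * 1024 bytes and 50 writers reproduces the 200M figure quoted above.</p>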
+<p>For more information, see <a href=""
title="Reduce the number of calls to hsync">ACCUMULO-1950</a> and <a href=";page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13915208">this
comment</a>.</p>
+<h3 id="other-known-issues">Other known issues</h3>
+<li><a href="" title="Dynamic
Classloader still can't keep proper track of jars">ACCUMULO-1507</a> Dynamic Classloader
still can't keep proper track of jars</li>
+<li><a href="" title="Monitor
XML and JSON differ">ACCUMULO-1588</a> Monitor XML and JSON differ</li>
+<li><a href="" title="NPE on
deep copied dumped memory iterator">ACCUMULO-1628</a> NPE on deep copied dumped memory
iterator</li>
+<li><a href="" title="Error during
minor compaction left tserver in bad state">ACCUMULO-1708</a> <a href=""
title="OOM exception didn't bring down tserver">ACCUMULO-2495</a> Out of memory errors
do not always kill tservers leading to unexpected behavior</li>
+<li><a href="" title="Block cache
reserves section for in-memory blocks">ACCUMULO-2008</a> Block cache reserves section
for in-memory blocks</li>
+<li><a href="" title="Namespace
constraints easily get clobbered by table constraints">ACCUMULO-2059</a> Namespace
constraints easily get clobbered by table constraints</li>
+<p>TODO look for other known issues</p>
+<h2 id="documentation-updates">Documentation updates</h2>
+<li><a href="" title="document
the recovery from a failed zookeeper">ACCUMULO-1218</a> document the recovery from
a failed zookeeper</li>
+<li><a href="" title="Update
README files in proxy module.">ACCUMULO-1375</a> Update README files in proxy module.</li>
+<li><a href="" title="Fix documentation
for deleterows">ACCUMULO-1407</a> Fix documentation for deleterows</li>
+<li><a href="" title="Document
native maps">ACCUMULO-1428</a> Document native maps</li>
+<li><a href="" title="Include
dfs.datanode.synconclose in hdfs configuration documentation">ACCUMULO-1946</a> Include
dfs.datanode.synconclose in hdfs configuration documentation</li>
+<li><a href="" title="Add section
on decomissioning or adding nodes to an Accumulo cluster">ACCUMULO-1956</a> Add section
on decomissioning or adding nodes to an Accumulo cluster</li>
+<li><a href="" title="Document
internal state stored in RFile names">ACCUMULO-2441</a> Document internal state stored
in RFile names</li>
+<li><a href="" title="Update
public API in readme to clarify what's included">ACCUMULO-2590</a> Update public
API in readme to clarify what's included</li>
+<h2 id="testing">Testing</h2>
+<p>Below is a list of all platforms that 1.6.0 was tested against by developers. Each
Apache Accumulo release
+has a set of tests that must be run before the candidate is capable of becoming an official
release. That list includes the following:</p>
+<li>Successfully run all unit tests</li>
+<li>Successfully run all functional tests (test/system/auto)</li>
+<li>Successfully complete two 24-hour RandomWalk tests (LongClean module), with and
without "agitation"</li>
+<li>Successfully complete two 24-hour Continuous Ingest tests, with and without "agitation",
with data verification</li>
+<li>Successfully complete two 72-hour Continuous Ingest tests, with and without "agitation"</li>
+<p>Each unit and functional test only runs on a single node, while the RandomWalk and
Continuous Ingest tests run 
+on any number of nodes. <em>Agitation</em> refers to randomly restarting Accumulo
processes and Hadoop Datanode processes,
+and, in HDFS High-Availability instances, forcing NameNode failover.</p>
+<table id="release_notes_testing">
+  <tr>
+    <th>OS</th>
+    <th>Hadoop</th>
+    <th>Nodes</th>
+    <th>ZooKeeper</th>
+    <th>HDFS High-Availability</th>
+    <th>Tests</th>
+  </tr>
+  </div>
+  <div id="footer">
+    <a alt="Apache Software Foundation" href="">
+      <img id="asf-logo" alt="Apache Software Foundation" src="/images/feather-small.gif"/>
+    </a>
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2011-2013 The Apache Software Foundation, Licensed under
+        the <a href="">Apache License, Version 2.0</a>.
+        <br />
+        Apache Accumulo, Accumulo, Apache, the Apache feather logo, and the Apache Accumulo
+        <br />
+        project logo are trademarks of the <a href="">Apache Software Foundation</a>.
+      </p>
+    </div> 
+  </div>
