accumulo-commits mailing list archives

Subject svn commit: r1584908 - /accumulo/site/trunk/content/release_notes/1.6.0.mdtext
Date Fri, 04 Apr 2014 21:03:51 GMT
Author: kturner
Date: Fri Apr  4 21:03:50 2014
New Revision: 1584908

ACCUMULO-2396 checkin of WIP 1.6.0 release notes

    accumulo/site/trunk/content/release_notes/1.6.0.mdtext   (with props)

Added: accumulo/site/trunk/content/release_notes/1.6.0.mdtext
--- accumulo/site/trunk/content/release_notes/1.6.0.mdtext (added)
+++ accumulo/site/trunk/content/release_notes/1.6.0.mdtext Fri Apr  4 21:03:50 2014
@@ -0,0 +1,238 @@
+Notice:    Licensed to the Apache Software Foundation (ASF) under one
+           or more contributor license agreements.  See the NOTICE file
+           distributed with this work for additional information
+           regarding copyright ownership.  The ASF licenses this file
+           to you under the Apache License, Version 2.0 (the
+           "License"); you may not use this file except in compliance
+           with the License.  You may obtain a copy of the License at
+           .
+           .
+           Unless required by applicable law or agreed to in writing,
+           software distributed under the License is distributed on an
+           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+           KIND, either express or implied.  See the License for the
+           specific language governing permissions and limitations
+           under the License.
+Apache Accumulo 1.6.0
+This document is a work in progress.
+## Notable Improvements
+### Multiple namenode support
+BigTable's design allows its internal metadata to spread automatically across multiple
nodes.  Accumulo has followed this design and scales very well as a result.  There is one
impediment to scaling though, and this is the HDFS namenode.  There are two problems with
the namenode when it comes to scaling.  First, the namenode stores all of its filesystem metadata
in memory on a single machine.  This introduces an upper bound on the number of files Accumulo
can have.  Second, there is an upper bound on the number of file operations per second that
a single namenode can support.  For example, a namenode can only support a few thousand delete
or create file requests per second.
+To overcome this bottleneck, support for multiple namenodes was added under [ACCUMULO-118][ACCUMULO-118].
 This change allows Accumulo to store its files across multiple namenodes.  To use this feature,
place a comma-separated list of namenode URIs in the new instance.volumes configuration property.
 When upgrading, modify this setting only after the upgrade has succeeded.
+### Table namespaces
+Administering an Accumulo instance with lots of tables is cumbersome.  To ease this [ACCUMULO-802][ACCUMULO-802]
introduced table namespaces which allow tables to be grouped.  This allows configuration and
permission changes to be made to a namespace, which will apply to all of its tables.  Example
use cases are ... TODO
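+As an illustration of how a namespace-level setting can apply to every table in the namespace,
here is a minimal Java sketch.  It is not Accumulo's implementation: the class `NamespaceConfigSketch`,
its methods, the `namespace.table` naming convention, and the rule that a table-level value
overrides the namespace value are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of namespace-level configuration; not Accumulo code.
class NamespaceConfigSketch {
    private final Map<String, Map<String, String>> tableProps = new HashMap<>();
    private final Map<String, Map<String, String>> namespaceProps = new HashMap<>();

    void setTableProperty(String table, String key, String value) {
        tableProps.computeIfAbsent(table, t -> new HashMap<>()).put(key, value);
    }

    void setNamespaceProperty(String namespace, String key, String value) {
        namespaceProps.computeIfAbsent(namespace, n -> new HashMap<>()).put(key, value);
    }

    // Assumption: qualified table names look like "namespace.table"; a name
    // without a dot belongs to an unnamed default namespace "".
    static String namespaceOf(String table) {
        int dot = table.indexOf('.');
        return dot < 0 ? "" : table.substring(0, dot);
    }

    // Assumption: a table-level value wins; otherwise fall back to the
    // namespace value; otherwise return null.
    String get(String table, String key) {
        Map<String, String> own = tableProps.getOrDefault(table, Collections.emptyMap());
        String value = own.get(key);
        if (value != null) {
            return value;
        }
        Map<String, String> inherited =
            namespaceProps.getOrDefault(namespaceOf(table), Collections.emptyMap());
        return inherited.get(key);
    }
}
```

+In this model, one call to `setNamespaceProperty` changes the effective value for every table
in the namespace that has not set the same key itself.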
+### Conditional Mutations
+Accumulo has not offered a way to make atomic row changes until now.  Accumulo now supports
atomic test and set row operations.  [ACCUMULO-1000][ACCUMULO-1000] added conditional mutations
and a conditional writer.  A conditional mutation has tests on columns that must pass before
any changes are made.  These tests are executed in server processes while a row lock is held.
 Below is a simple example of making atomic row changes using conditional mutations.
+ 1. Read columns X,Y,SEQ into a,b,s from row R1 using an isolated scanner.
+ 2. For row R1 write conditional mutation X=f(a),Y=g(b),SEQ=s+1 if SEQ==s.
+ 3. If conditional mutation failed, then goto step 1.
+The only built-in tests that conditional mutations support are equality and isNull.  However,
iterators can be configured on a conditional mutation to run before these tests.  This makes
it possible to implement any number of tests such as less than, greater than, contains, etc.
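+The three-step read, conditional write, and retry loop listed above can be sketched in plain
Java.  This is an in-memory stand-in for a single row and deliberately does not use the Accumulo
client API; the class `ConditionalUpdateSketch` and its methods are hypothetical.  The synchronized
methods play the role of the server-side row lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongUnaryOperator;

// In-memory stand-in for one row; this is NOT the Accumulo client API.
class ConditionalUpdateSketch {
    private final Map<String, Long> row = new HashMap<>();

    ConditionalUpdateSketch(long x, long y) {
        row.put("X", x);
        row.put("Y", y);
        row.put("SEQ", 0L);
    }

    // Step 1: read a consistent snapshot of X, Y and SEQ
    // (stands in for the isolated scanner).
    synchronized Map<String, Long> read() {
        return new HashMap<>(row);
    }

    // Step 2: apply the change only if SEQ still equals expectedSeq.
    // The test and the write happen under one lock.
    synchronized boolean writeIfSeqEquals(long expectedSeq, long newX, long newY) {
        if (row.get("SEQ") != expectedSeq) {
            return false; // another writer got there first; condition rejected
        }
        row.put("X", newX);
        row.put("Y", newY);
        row.put("SEQ", expectedSeq + 1);
        return true;
    }

    // Steps 1-3: read, attempt the conditional write, and retry on rejection.
    void update(LongUnaryOperator f, LongUnaryOperator g) {
        while (true) {
            Map<String, Long> snapshot = read();
            long s = snapshot.get("SEQ");
            long newX = f.applyAsLong(snapshot.get("X"));
            long newY = g.applyAsLong(snapshot.get("Y"));
            if (writeIfSeqEquals(s, newX, newY)) {
                return;
            }
        }
    }
}
```

+A call such as `new ConditionalUpdateSketch(1, 10).update(a -> a + 1, b -> b * 2)` computes
the new values outside the lock and commits them only if no other writer advanced SEQ in the
meantime.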
+### Encryption
+Support for encrypting Accumulo's persistent and over the wire data was added.   [ACCUMULO-998][ACCUMULO-998],
[ACCUMULO-958][ACCUMULO-958], and [ACCUMULO-980][ACCUMULO-980] cover encrypting data at rest
in write ahead logs and rfiles.   [ACCUMULO-1009][ACCUMULO-1009] covers encrypting data over
the wire using SSL.  
+### Pluggable compaction strategies
+One of the key elements of the BigTable design is the use of the Log Structured Merge Tree (LSMT)
concept.  This entails sorting data in memory, writing out sorted files, and then later merging
multiple sorted files into a single file.   These automatic merges happen in the background
and Accumulo decides when to merge files based on comparing relative sizes of files to a compaction
ratio.  Previously, adjusting the compaction ratio was the only way a user could control this process.  [ACCUMULO-1451][ACCUMULO-1451]
introduces pluggable compaction strategies which allow users to choose when and what files
to compact.  [ACCUMULO-1808][ACCUMULO-1808] adds a compaction strategy that prevents compaction
of files over a configurable size.
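+To make the ratio comparison concrete, here is a minimal Java sketch of a ratio-based file
selection combined with a size limit.  It assumes the following rule: a set of files is compacted
when the size of its largest file multiplied by the ratio is at most the total size of the set,
and otherwise the largest file is dropped from consideration and the test is repeated.  The class
and method names are hypothetical, and this is not the strategy interface added by ACCUMULO-1451.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of ratio-based selection; not Accumulo's CompactionStrategy API.
class CompactionRatioSketch {
    // Returns the sizes of the files chosen for compaction, largest first,
    // or an empty list when no set of files passes the ratio test.
    static List<Long> select(List<Long> fileSizes, double ratio, long maxFileSize) {
        List<Long> candidates = new ArrayList<>();
        long sum = 0;
        for (long size : fileSizes) {
            if (size <= maxFileSize) { // size limit: never pick files above the cap
                candidates.add(size);
                sum += size;
            }
        }
        Collections.sort(candidates, Collections.reverseOrder()); // largest first

        int start = 0;
        while (candidates.size() - start > 1) {
            long largest = candidates.get(start);
            if (largest * ratio <= sum) {
                return new ArrayList<>(candidates.subList(start, candidates.size()));
            }
            sum -= largest; // drop the largest file and try the remaining ones
            start++;
        }
        return Collections.emptyList();
    }
}
```

+With sizes 100, 10, 10, 10 and a ratio of 3, the large file fails the test (300 > 130) and is
skipped, while the three small files pass (30 <= 30) and are selected.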
+### Lexicoders
+Accumulo only sorts data lexicographically.  Getting something like a pair of (<string>,<integer>)
to sort correctly in Accumulo is tricky.  It's tricky because you only want to compare the
integers if the strings are equal.  It's possible to make this sort properly in Accumulo if
the data is encoded properly, but that's the tricky part.  To make this easier, [ACCUMULO-1336][ACCUMULO-1336]
added Lexicoders to the Accumulo API.  Lexicoders provide an easy way to serialize data so
that it sorts properly lexicographically.  Below is a simple example.
+ > PairLexicoder<String, Integer> plex = new PairLexicoder<String, Integer>(new StringLexicoder(), new IntegerLexicoder());
+ > byte[] ba1 = plex.encode(new ComparablePair<String, Integer>("b",1));
+ > byte[] ba2 = plex.encode(new ComparablePair<String, Integer>("aa",1));
+ > byte[] ba3 = plex.encode(new ComparablePair<String, Integer>("a",2));
+ > byte[] ba4 = plex.encode(new ComparablePair<String, Integer>("a",1)); 
+ > byte[] ba5 = plex.encode(new ComparablePair<String, Integer>("aa",-3));
+ >
+ > //sorting ba1,ba2,ba3,ba4, and ba5 lexicographically will result in the same order
as sorting the ComparablePairs
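+To show why encoding matters, here is a minimal Java sketch of one common way to make signed
integers sort correctly as bytes: flip the sign bit and write the value big-endian.  The class
`SortableIntSketch` is hypothetical and is not claimed to match how Accumulo's IntegerLexicoder
is implemented; it only illustrates the idea of an order-preserving encoding.

```java
// Hypothetical order-preserving integer encoding; not Accumulo's Lexicoder code.
class SortableIntSketch {
    // Flip the sign bit and write big-endian, so that unsigned byte order
    // matches numeric order (negative values sort before positive ones).
    static byte[] encode(int value) {
        int u = value ^ 0x80000000;
        return new byte[] {
            (byte) (u >>> 24), (byte) (u >>> 16), (byte) (u >>> 8), (byte) u
        };
    }

    // Plain big-endian two's complement, shown only for contrast.
    static byte[] naiveEncode(int value) {
        return new byte[] {
            (byte) (value >>> 24), (byte) (value >>> 16), (byte) (value >>> 8), (byte) value
        };
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

+With the naive encoding, -3 sorts after 1 because its leading byte is 0xff; with the sign bit
flipped, -3 sorts before 1 as expected.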
+### Multi-table Accumulo input format
+[ACCUMULO-391][ACCUMULO-391] makes it possible to easily read from multiple tables in a Map
Reduce job.  TODO is there more to say about this, if not maybe move to one-liners.
+### Locality groups in memory
+In cases where a very small amount of data is stored in a locality group one would expect
fast scans over that locality group.  However this was not always the case because recently
written data stored in memory was not partitioned by locality group.  Therefore if a table
had 100GB of data in memory and 1MB of that was in locality group A, then scanning A would
have required reading all 100GB.  [ACCUMULO-112][ACCUMULO-112] changes this and partitions
data by locality group as it is written.
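+The idea of partitioning in-memory data by locality group at write time can be sketched in
Java as follows.  This is a simplified model with hypothetical names (`LocalityGroupMemorySketch`,
`assignFamily`, `scanGroup`), using one sorted map per group; it is not Accumulo's in-memory
map implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model of an in-memory store partitioned by locality group.
class LocalityGroupMemorySketch {
    static final String DEFAULT_GROUP = "default";

    private final Map<String, String> familyToGroup = new HashMap<>();
    private final Map<String, SortedMap<String, String>> groups = new HashMap<>();

    // Declare that a column family belongs to a locality group.
    void assignFamily(String family, String group) {
        familyToGroup.put(family, group);
    }

    // Route each write to the sorted map of its locality group as it arrives.
    void write(String row, String family, String value) {
        String group = familyToGroup.getOrDefault(family, DEFAULT_GROUP);
        groups.computeIfAbsent(group, g -> new TreeMap<>()).put(row + " " + family, value);
    }

    // A scan of one group only touches that group's entries.
    SortedMap<String, String> scanGroup(String group) {
        return groups.getOrDefault(group, new TreeMap<>());
    }
}
```

+Because each group has its own sorted map, scanning a small group no longer requires reading
past the entries of every other group.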
+### Jline2 support in shell
+[ACCUMULO-1442][ACCUMULO-1442] TODO what is some of the goodness this brings to the shell?
+### Service IP addresses
+Previous versions of Accumulo always used IP addresses internally.  This could be problematic
in virtual machine environments where IP addresses change.  In [ACCUMULO-1585][ACCUMULO-1585]
this was changed; Accumulo now uses the exact hostnames from its config files for internal
+All Accumulo processes running on a cluster are locatable via ZooKeeper.  Therefore using
well known ports is not really required.  [ACCUMULO-1664][ACCUMULO-1664] makes it possible
for all Accumulo processes to use random ports.  This makes it easier to run multiple Accumulo
processes on a single node.
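+As a general illustration of how a process can obtain a random port, the Java sketch below
binds a server socket to port 0 so that the operating system picks a free port.  The class
`RandomPortSketch` is hypothetical and this is not Accumulo's server startup code; in a real
deployment the chosen port would then have to be advertised, for example through ZooKeeper.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical sketch; not Accumulo's server startup code.
class RandomPortSketch {
    // Bind to port 0 so the operating system assigns a free port, then
    // return the port that was chosen. The socket is closed on return.
    static int pickFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

+A long-running server would keep the socket open instead of closing it, since another process
could otherwise claim the same port.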
+### Other notable changes
+ * [ACCUMULO-842][ACCUMULO-842] Added FATE administration to shell
+ * [ACCUMULO-1481][ACCUMULO-1481] The root tablet is now the root table.
+ * [ACCUMULO-1566][ACCUMULO-1566] When read-ahead starts in the scanner is now configurable.
+ * [ACCUMULO-1667][ACCUMULO-1667] Added a synchronous version of online and offline table
+ * [ACCUMULO-2128][ACCUMULO-2128] Provide resource cleanup via static utility
+## Notable Bug Fixes
+TODO kturner looked at bugs w/ fix version of 1.6.0 and a non-empty affects version and selected
ones he thought were relevant to users.... need other devs to do this
+TODO some bugs may be unintelligible to end users... either improve the issue description
or remove from list
+ * [ACCUMULO-324][ACCUMULO-324] System/site constraints and iterators should NOT affect the
+ * [ACCUMULO-335][ACCUMULO-335] Batch scanning over the !METADATA table can cause issues
+ * [ACCUMULO-1018][ACCUMULO-1018] Client does not give informative message when user can
not read table
+ * [ACCUMULO-1492][ACCUMULO-1492] bin/accumulo should follow symbolic links
+ * [ACCUMULO-1572][ACCUMULO-1572] Single node zookeeper failure kills connected accumulo
+ * [ACCUMULO-1661][ACCUMULO-1661] AccumuloInputFormat cannot fetch empty column family
+ * [ACCUMULO-1696][ACCUMULO-1696] Deep copy in the compaction scope iterators can throw off
the stats
+ * [ACCUMULO-1698][ACCUMULO-1698] stop-here doesn't consider system hostname
+ * [ACCUMULO-1833][ACCUMULO-1833] MultiTableBatchWriterImpl.getBatchWriter() is not performant
for multiple threads
+ * [ACCUMULO-1901][ACCUMULO-1901] starts only one GC process even if more are
+ * [ACCUMULO-1921][ACCUMULO-1921] NPE in tablet assignment
+ * [ACCUMULO-1994][ACCUMULO-1994] Proxy does not handle Key timestamps correctly
+ * [ACCUMULO-2174][ACCUMULO-2174] VFS Classloader has potential to collide localized resources
+ * [ACCUMULO-2225][ACCUMULO-2225] Need to better handle DNS failure propagation from Hadoop
+ * [ACCUMULO-2234][ACCUMULO-2234] Cannot run offline mapreduce over non-default instance.dfs.dir
+ * [ACCUMULO-2334][ACCUMULO-2334] Lacking fallback when ACCUMULO_LOG_HOST isn't set
+ * [ACCUMULO-2408][ACCUMULO-2408] metadata table not assigned after root table is loaded
+ * [ACCUMULO-2519][ACCUMULO-2519] FATE operation failed across upgrade
+## Known Issues
+When using Accumulo 1.6 and Hadoop 2, Accumulo will call hsync() on HDFS.
+Calling hsync improves durability by ensuring data is on disk (where other older 
+Hadoop versions might lose data in the face of power failure); however, calling
+hsync frequently does noticeably slow writes. A simple workaround is to increase
+the value of the tserver.mutation.queue.max configuration parameter via accumulo-site.xml.
+A value of "4M" is recommended; note that memory consumption will increase in proportion to
+the number of concurrent writers to that TabletServer. For example, a value of 4M with
+50 concurrent writers would equate to approximately 200M of Java heap being used for
+mutation queues.
+For more information, see [ACCUMULO-1950][ACCUMULO-1950] and [this comment][ACCUMULO-1905-comment].
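+The heap estimate above is a simple multiplication, sketched here in Java.  The class
`MutationQueueHeapSketch` and its methods are hypothetical helpers, and the sketch assumes
that the "M" suffix means 1024 * 1024 bytes.

```java
// Hypothetical helper for estimating mutation queue heap usage.
class MutationQueueHeapSketch {
    // Parses sizes such as "4M" or "512K" into bytes. Only the K, M and G
    // suffixes are handled; they are assumed to be powers of 1024.
    static long parseBytes(String size) {
        String s = size.trim().toUpperCase();
        long multiplier;
        switch (s.charAt(s.length() - 1)) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            default: return Long.parseLong(s); // no suffix: plain bytes
        }
        return Long.parseLong(s.substring(0, s.length() - 1)) * multiplier;
    }

    // Worst-case heap used by mutation queues: one full queue per concurrent writer.
    static long estimateHeapBytes(String queueMax, int concurrentWriters) {
        return parseBytes(queueMax) * concurrentWriters;
    }
}
```

+For the example in the text, `estimateHeapBytes("4M", 50)` yields 209,715,200 bytes, which
is the 200M figure quoted above.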
+### Other known issues
+ * [ACCUMULO-1507][ACCUMULO-1507] Dynamic Classloader still can't keep proper track of jars
+ * [ACCUMULO-1588][ACCUMULO-1588] Monitor XML and JSON differ
+ * [ACCUMULO-1628][ACCUMULO-1628] NPE on deep copied dumped memory iterator
+ * [ACCUMULO-1708][ACCUMULO-1708] [ACCUMULO-2495][ACCUMULO-2495] Out of memory errors do
not always kill tservers leading to unexpected behavior
+ * [ACCUMULO-2008][ACCUMULO-2008] Block cache reserves section for in-memory blocks
+ * [ACCUMULO-2059][ACCUMULO-2059] Namespace constraints easily get clobbered by table constraints
+TODO look for other known issues
+## Documentation updates
+ * [ACCUMULO-1218][ACCUMULO-1218] document the recovery from a failed zookeeper
+ * [ACCUMULO-1375][ACCUMULO-1375] Update README files in proxy module.
+ * [ACCUMULO-1407][ACCUMULO-1407] Fix documentation for deleterows
+ * [ACCUMULO-1428][ACCUMULO-1428] Document native maps
+ * [ACCUMULO-1946][ACCUMULO-1946] Include dfs.datanode.synconclose in hdfs configuration
+ * [ACCUMULO-1956][ACCUMULO-1956] Add section on decomissioning or adding nodes to an Accumulo
+ * [ACCUMULO-2441][ACCUMULO-2441] Document internal state stored in RFile names
+ * [ACCUMULO-2590][ACCUMULO-2590] Update public API in readme to clarify what's included
+## Testing
+Below is a list of all platforms that 1.6.0 was tested against by developers. Each Apache
Accumulo release
+has a set of tests that must be run before the candidate is capable of becoming an official
release. That list includes the following:
+ 1. Successfully run all unit tests
+ 2. Successfully run all functional tests (test/system/auto)
+ 3. Successfully complete two 24-hour RandomWalk tests (LongClean module), with and without
+ 4. Successfully complete two 24-hour Continuous Ingest tests, with and without "agitation",
with data verification
+ 5. Successfully complete two 72-hour Continuous Ingest tests, with and without "agitation"
+Each unit and functional test only runs on a single node, while the RandomWalk and Continuous
Ingest tests run 
+on any number of nodes. *Agitation* refers to randomly restarting Accumulo processes and
Hadoop Datanode processes,
+and, in HDFS High-Availability instances, forcing NameNode failover.
+<table id="release_notes_testing">
+  <tr>
+    <th>OS</th>
+    <th>Hadoop</th>
+    <th>Nodes</th>
+    <th>ZooKeeper</th>
+    <th>HDFS High-Availability</th>
+    <th>Tests</th>
+  </tr>
+[ACCUMULO-112]: "Partition data in memory
by locality group"
+[ACCUMULO-118]: "Multiple namenode support"
+[ACCUMULO-324]: "System/site constraints
and iterators should NOT affect the METADATA table"
+[ACCUMULO-335]: "Batch scanning over the
!METADATA table can cause issues"
+[ACCUMULO-391]: "Multi-table input format"
+[ACCUMULO-802]: "Table namespaces"
+[ACCUMULO-842]: "Add FATE administration
to shell"
+[ACCUMULO-958]: "Support pluggable encryption
in walogs"
+[ACCUMULO-998]: "Support encryption at
+[ACCUMULO-980]: "Support pluggable codecs
for RFile"
+[ACCUMULO-1000]: "Conditional Mutations"
+[ACCUMULO-1009]: "Support encryption
over the wire"
+[ACCUMULO-1018]: "Client does not give
informative message when user can not read table"
+[ACCUMULO-1218]: "document the recovery
from a failed zookeeper"
+[ACCUMULO-1336]: "Add lexicoders from
Typo to Accumulo"
+[ACCUMULO-1375]: "Update README files
in proxy module."
+[ACCUMULO-1407]: "Fix documentation for
+[ACCUMULO-1428]: "Document native maps"
+[ACCUMULO-1442]: "Replace JLine with
+[ACCUMULO-1451]: "Make Compaction triggers
+[ACCUMULO-1481]: "Root tablet in its
own table"
+[ACCUMULO-1492]: "bin/accumulo should
follow symbolic links"
+[ACCUMULO-1507]: "Dynamic Classloader
still can't keep proper track of jars"
+[ACCUMULO-1585]: "Use node addresses
from config files verbatim"
+[ACCUMULO-1562]: "add a troubleshooting
section to the user guide"
+[ACCUMULO-1566]: "Add ability for client
to start Scanner readahead immediately"
+[ACCUMULO-1572]: "Single node zookeeper
failure kills connected accumulo servers"
+[ACCUMULO-1585]: "Use FQDN/verbatim data
from config files"
+[ACCUMULO-1588]: "Monitor XML and JSON
+[ACCUMULO-1628]: "NPE on deep copied
dumped memory iterator"
+[ACCUMULO-1661]: "AccumuloInputFormat
cannot fetch empty column family"
+[ACCUMULO-1664]: "Make all processes
able to use random ports"
+[ACCUMULO-1667]: "Allow On/Offline Command
To Execute Synchronously"
+[ACCUMULO-1696]: "Deep copy in the compaction
scope iterators can throw off the stats"
+[ACCUMULO-1698]: "stop-here doesn't consider
system hostname"
+[ACCUMULO-1704]: "IteratorSetting missing
(int,String,Class,Map) constructor"
+[ACCUMULO-1708]: "Error during minor
compaction left tserver in bad state"
+[ACCUMULO-1808]: "Create compaction strategy
that has size limit"
+[ACCUMULO-1833]: "MultiTableBatchWriterImpl.getBatchWriter()
is not performant for multiple threads"
+[ACCUMULO-1901]: " starts
only one GC process even if more are defined"
+[ACCUMULO-1921]: "NPE in tablet assignment"
+[ACCUMULO-1946]: "Include dfs.datanode.synconclose
in hdfs configuration documentation"
+[ACCUMULO-1950]: "Reduce the number of
calls to hsync"
+[ACCUMULO-1956]: "Add section on decomissioning
or adding nodes to an Accumulo cluster"
+[ACCUMULO-1958]: "Range constructor lacks
key checks, should be non-public"
+[ACCUMULO-1994]: "Proxy does not handle
Key timestamps correctly"
+[ACCUMULO-2008]: "Block cache reserves
section for in-memory blocks"
+[ACCUMULO-2059]: "Namespace constraints
easily get clobbered by table constraints"
+[ACCUMULO-2128]: "Provide resource cleanup
via static utility rather than Instance.close"
+[ACCUMULO-2174]: "VFS Classloader has
potential to collide localized resources"
+[ACCUMULO-2225]: "Need to better handle
DNS failure propagation from Hadoop"
+[ACCUMULO-2234]: "Cannot run offline
mapreduce over non-default instance.dfs.dir value"
+[ACCUMULO-2334]: "Lacking fallback when
+[ACCUMULO-2408]: "metadata table not
assigned after root table is loaded"
+[ACCUMULO-2441]: "Document internal state
stored in RFile names"
+[ACCUMULO-2495]: "OOM exception didn't
bring down tserver"
+[ACCUMULO-2519]: "FATE operation failed
across upgrade"
+[ACCUMULO-2590]: "Update public API in
readme to clarify what's included"

Propchange: accumulo/site/trunk/content/release_notes/1.6.0.mdtext
    svn:eol-style = native
