Subject: svn commit: r866625 - in /websites/staging/accumulo/trunk/content: ./ notable_features.html
Date: Thu, 20 Jun 2013 13:31:40 -0000
To: accumulo-commits@incubator.apache.org
From: buildbot@apache.org
Message-Id: <20130620133140.6160F23889EC@eris.apache.org>

Author: buildbot
Date: Thu Jun 20 13:31:40 2013
New Revision: 866625

Log:
Staging update by buildbot for accumulo

Modified:
    websites/staging/accumulo/trunk/content/   (props changed)
    websites/staging/accumulo/trunk/content/notable_features.html

Propchange: websites/staging/accumulo/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Jun 20 13:31:40 2013
@@ -1 +1 @@
-1494813
+1494983

Modified: websites/staging/accumulo/trunk/content/notable_features.html
==============================================================================
--- websites/staging/accumulo/trunk/content/notable_features.html (original)
+++ websites/staging/accumulo/trunk/content/notable_features.html Thu Jun 20 13:31:40 2013
@@ -178,7 +178,8 @@ Zookeeper to synchronize operations acro

If consecutive keys have identical portions (row, colf, colq, or colvis), there is a flag to indicate that a portion is the same as that of the previous key. This is applied when keys are stored on disk and when transferred over the
-network.

+network. Starting with 1.5, prefix erasure is supported. When it is cost
+effective, prefixes repeated in subsequent key fields are not repeated.
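The key compression described above can be illustrated with a minimal sketch. This is not Accumulo's on-disk or wire format. The token names (`same`, `prefix`, `full`), the `MIN_PREFIX` threshold standing in for "cost effective", and the function names are all hypothetical, and keys are assumed to be 4-tuples of strings (row, colf, colq, colvis).

```python
# Hypothetical sketch of "same as previous" flags plus prefix erasure.
# Not Accumulo's actual encoding; names and threshold are assumptions.

MIN_PREFIX = 3  # assumed cutoff below which sharing a prefix is not worth it


def shared_prefix_len(a, b):
    """Return the number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode_field(cur, prev):
    """Encode one key field relative to the same field of the previous key."""
    if prev is not None:
        if cur == prev:
            return ("same",)  # flag only, no bytes repeated
        n = shared_prefix_len(cur, prev)
        if n >= MIN_PREFIX:
            return ("prefix", n, cur[n:])  # store only the differing suffix
    return ("full", cur)


def decode_field(token, prev):
    """Invert encode_field using the previous key's decoded field."""
    if token[0] == "same":
        return prev
    if token[0] == "prefix":
        return prev[:token[1]] + token[2]
    return token[1]


def encode_keys(keys):
    """Encode a list of (row, colf, colq, colvis) tuples in order."""
    encoded, prev = [], (None,) * 4
    for key in keys:
        encoded.append(tuple(encode_field(c, p) for c, p in zip(key, prev)))
        prev = key
    return encoded


def decode_keys(encoded):
    """Rebuild the original keys from the relative encoding."""
    decoded, prev = [], (None,) * 4
    for tokens in encoded:
        key = tuple(decode_field(t, p) for t, p in zip(tokens, prev))
        decoded.append(key)
        prev = key
    return decoded
```

Decoding only needs the previous key, so a reader can stream through a run of keys without buffering more than one of them.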

Native In-Memory Map

By default data written is stored outside of Java managed memory into a C++ STL map of maps. It maps rows to columns to values. This hierarchical structure
@@ -203,10 +204,23 @@ blocks. The entire index never has to be
written. When an index block exceeds the configurable size threshold, it is written out between data blocks. The size of index blocks is configurable on a per table basis.

+

Binary search in RFile blocks (1.5)

+

RFile uses its index to locate a block of key values. Once it reaches a block
+it performs a linear scan to find a key of interest. Starting with 1.5, Accumulo
+will generate indexes of cached blocks in an adaptive manner. Accumulo indexes
+the blocks that are read most frequently. When a block is read a few times, a
+small index is generated. As a block is read more, larger indexes are generated
+making future seeks faster. This strategy allows Accumulo to dynamically respond
+to read patterns without precomputing block indexes when RFiles are written.
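The adaptive strategy above can be sketched roughly as follows. This is an illustration under stated assumptions, not Accumulo's implementation: the `CachedBlock` class, the `INDEX_SCHEDULE` thresholds, and the in-memory list of sorted, unique `(key, value)` pairs are all hypothetical.

```python
import bisect


class CachedBlock:
    """Hypothetical cached block that grows a sparse index as it is read."""

    # Assumed schedule: (minimum read count, approximate number of index entries).
    INDEX_SCHEDULE = [(3, 4), (10, 16), (50, 64)]

    def __init__(self, entries):
        self.entries = entries  # list of (key, value), sorted by unique key
        self.reads = 0
        self.index_keys = []    # sampled keys, in sorted order
        self.index_pos = []     # position in entries of each sampled key

    def _maybe_grow_index(self):
        # Pick the largest index size the current read count qualifies for.
        target = 0
        for min_reads, size in self.INDEX_SCHEDULE:
            if self.reads >= min_reads:
                target = size
        target = min(target, len(self.entries))
        if target > len(self.index_pos):
            step = max(1, len(self.entries) // target)
            self.index_pos = list(range(0, len(self.entries), step))
            self.index_keys = [self.entries[i][0] for i in self.index_pos]

    def seek(self, key):
        """Return (key, value) if key is present in the block, else None."""
        self.reads += 1
        self._maybe_grow_index()
        start = 0
        if self.index_keys:
            # Binary search the sparse index for the last sampled key <= key.
            i = bisect.bisect_right(self.index_keys, key) - 1
            if i >= 0:
                start = self.index_pos[i]
        # Linear scan from the narrowed starting position.
        for pos in range(start, len(self.entries)):
            k, v = self.entries[pos]
            if k >= key:
                return (k, v) if k == key else None
        return None
```

A block that is read once pays no indexing cost and falls back to a plain linear scan. Only blocks that keep getting read accumulate a denser index, which shortens the linear portion of later seeks.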

Testing

Mock

The Accumulo client API has a mock implementation that is useful for writing unit tests against Accumulo. Mock Accumulo is in memory and in process.

+

Mini Accumulo Cluster (1.5 & 1.4.4)

+

Mini Accumulo cluster is a set of utility code that makes it easy to spin up
+a local Accumulo instance running against the local filesystem. Mini Accumulo
+is slower than Mock Accumulo, but its behavior mirrors a real Accumulo
+instance more closely.

Functional Test

Small, system-level tests of basic Accumulo features run in a test harness, external to the build and unit-tests. These tests start a complete Accumulo
@@ -251,6 +265,12 @@ flexibility in resource allocation. The
could be different from the Accumulo nodes.

Map Reduce

Accumulo can be a source and/or sink for map reduce jobs.

+

Thrift Proxy (1.5 & 1.4.4)

+

The Accumulo client code contains a lot of complexity. For example, the
+client code locates tablets, retries in the case of failures, and supports
+concurrent reading and writing. All of this is written in Java. The thrift
+proxy wraps the Accumulo client API with thrift, making this API easily
+available to other languages like Python, Ruby, C++, etc.

Extensible Behaviors

Pluggable balancer

Users can provide a balancer plugin that decides how to distribute tablets
@@ -318,13 +338,18 @@ even if major compactions were falling b
was growing. Without this feature, ingest performance can roughly continue at a constant rate, even as scan performance decreases because tablets have too many files.

+

Loading jars using VFS (1.5)

+

User written iterators are a useful way to manipulate data in Accumulo.
+Before 1.5, users had to copy their iterators to each tablet server. Starting
+with 1.5, Accumulo can load iterators from HDFS using Apache Commons VFS.

On-demand Data Management

Compactions

Ability to force tablets to compact to one file. Even tablets with one file are compacted. This is useful for improving query performance, permanently applying iterators, or using a new locality group configuration. One example of using iterators is applying a filtering iterator to remove data from a
-table.

+table. As of 1.5, users can initiate a compaction with iterators only applied to
+that compaction event.

Split points

Arbitrary split points can be added to an online table at any point in time. This is useful for increasing ingest performance on a new table. It can also be
@@ -338,14 +363,15 @@ data and copies its configuration. A clo
mutated independently. Testing was the motivating reason behind this new feature. For example, to test a new filtering iterator, clone the table, add the filter to the clone, and force a major compaction.

+

Import/Export Table (1.5)

+

An offline table's metadata and files can easily be copied to another cluster and
+imported.

Compact Range (1.4)

-

Compact each tablet that falls within a row range down to a single file.
-

+

Compact each tablet that falls within a row range down to a single file.

Delete Range (1.4)

Added an operation to efficiently delete a range of rows from a table. Tablets that fall completely within a range are simply dropped. Tablets overlapping the
-beginning and end of the range are split, compacted, and then merged.
-

+beginning and end of the range are split, compacted, and then merged.