Return-Path:
X-Original-To: apmail-accumulo-commits-archive@www.apache.org
Delivered-To: apmail-accumulo-commits-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 1FCD8182D3 for ; Thu, 27 Aug 2015 16:06:11 +0000 (UTC)
Received: (qmail 99765 invoked by uid 500); 27 Aug 2015 16:06:11 -0000
Delivered-To: apmail-accumulo-commits-archive@accumulo.apache.org
Received: (qmail 99660 invoked by uid 500); 27 Aug 2015 16:06:11 -0000
Mailing-List: contact commits-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@accumulo.apache.org
Delivered-To: mailing list commits@accumulo.apache.org
Received: (qmail 99437 invoked by uid 99); 27 Aug 2015 16:06:10 -0000
Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 27 Aug 2015 16:06:10 +0000
Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id BCDB1E7E54; Thu, 27 Aug 2015 16:06:10 +0000 (UTC)
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: elserj@apache.org
To: commits@accumulo.apache.org
Date: Thu, 27 Aug 2015 16:06:15 -0000
Message-Id:
In-Reply-To: <920078a9dc0e4ba49fde8971b7aabcbf@git.apache.org>
References: <920078a9dc0e4ba49fde8971b7aabcbf@git.apache.org>
X-Mailer: ASF-Git Admin Mailer
Subject: [06/12] accumulo git commit: ACCUMULO-3959 Rewrite BatchWriter javadoc

ACCUMULO-3959 Rewrite BatchWriter javadoc

Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/d6427e1c
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/d6427e1c
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/d6427e1c

Branch: refs/heads/master
Commit: d6427e1ccd6cab7c9f40cc188d5258dbeb71c97c
Parents: 01dd7e3
Author: Dylan Hutchison
Authored: Mon Aug 24 19:01:23 2015 -0400
Committer: Dylan Hutchison
Committed: Mon Aug 24 19:01:23 2015 -0400

----------------------------------------------------------------------
 .../accumulo/core/client/BatchScanner.java | 25 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/d6427e1c/core/src/main/java/org/apache/accumulo/core/client/BatchScanner.java
----------------------------------------------------------------------
diff --git a/core/src/main/java/org/apache/accumulo/core/client/BatchScanner.java b/core/src/main/java/org/apache/accumulo/core/client/BatchScanner.java
index bd7eb88..af0fd85 100644
--- a/core/src/main/java/org/apache/accumulo/core/client/BatchScanner.java
+++ b/core/src/main/java/org/apache/accumulo/core/client/BatchScanner.java
@@ -22,15 +22,28 @@ import java.util.Collection;
 import java.util.concurrent.TimeUnit;
 
 /**
- * Implementations of BatchScanner support efficient lookups of many ranges in accumulo.
- * BatchScanners are also appropriate for large, single ranges,
- * as a BatchScanner will break those ranges up into separate RPCs
- * provided the range spans more than one tablet
- * and there are sufficiently many scan threads available.
+ * In exchange for possibly returning scanned entries out of order,
+ * BatchScanner implementations may scan an Accumulo table more efficiently by
+ * <ul>
+ * <li>Looking up multiple ranges in parallel.
+ * Parallelism is constrained by the number of threads available to the BatchScanner, set in its constructor.</li>
+ * <li>Breaking up large ranges into subranges.
+ * Often the number and boundaries of subranges are determined by a table's split points.</li>
+ * <li>Combining multiple ranges into a single RPC call to a tablet server.</li>
+ * </ul>
  *
- * Only use this when you do not care about returned data being in sorted order.
+ * The above techniques lead to better performance than a {@link Scanner} in use cases such as
+ * <ul>
+ * <li>Retrieving many small ranges</li>
+ * <li>Scanning a large range that returns many entries</li>
+ * <li>Running server-side iterators that perform computation,
+ * even if few entries are returned from the scan itself</li>
+ * </ul>
+ *
+ * To re-emphasize, only use a BatchScanner when you do not care whether returned data is in sorted order.
  * Use a {@link Scanner} instead when sorted order is important.
  *
+ * <p>
  * A BatchScanner instance will use no more threads than provided in the construction of the BatchScanner
  * implementation. Multiple invocations of iterator() will all share the same resources of the instance.
  * A new BatchScanner instance should be created to use allocate additional threads.
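A small in-memory Java model can illustrate the three techniques in the new javadoc and show why results can come back unsorted. This is a sketch under stated assumptions, not Accumulo's implementation. The Range class, the integer row ids, the split points, and the rule that each split point begins a new tablet are all hypothetical simplifications. The model cuts each range at the split points. It then bins the pieces per tablet so that each tablet gets a single task, which stands in for one RPC carrying several ranges. The tasks run on a fixed pool of numThreads threads, so rows are gathered in completion order rather than key order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Toy, in-memory model of the BatchScanner techniques; NOT Accumulo's implementation. */
public class BatchScanSketch {

    /** Hypothetical half-open row range [start, end) over integer row ids. */
    static final class Range {
        final int start, end;
        Range(int start, int end) { this.start = start; this.end = end; }
    }

    /** Cuts each range at the split points and bins the pieces by tablet index. */
    static SortedMap<Integer, List<Range>> binByTablet(List<Range> ranges, int[] splits) {
        SortedMap<Integer, List<Range>> bins = new TreeMap<>();
        for (Range r : ranges) {
            int start = r.start;
            while (start < r.end) {
                // Toy rule: tablet i covers [splits[i-1], splits[i]); first and last tablets are unbounded.
                int tablet = 0;
                while (tablet < splits.length && splits[tablet] <= start) tablet++;
                int tabletEnd = tablet < splits.length ? splits[tablet] : Integer.MAX_VALUE;
                int end = Math.min(r.end, tabletEnd);
                bins.computeIfAbsent(tablet, k -> new ArrayList<>()).add(new Range(start, end));
                start = end;
            }
        }
        return bins;
    }

    /** Looks up each tablet's bin as one task on a fixed pool of numThreads threads. */
    static List<Integer> scan(List<Range> ranges, int[] splits, int numThreads) {
        SortedMap<Integer, List<Range>> bins = binByTablet(ranges, splits);
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            CompletionService<List<Integer>> done = new ExecutorCompletionService<>(pool);
            for (List<Range> bin : bins.values()) {
                // One task per tablet stands in for one batched RPC carrying all of that tablet's ranges.
                done.submit(() -> {
                    List<Integer> rows = new ArrayList<>();
                    for (Range r : bin) {
                        for (int row = r.start; row < r.end; row++) rows.add(row);
                    }
                    return rows;
                });
            }
            // Rows are appended in task-completion order, so the result may not be sorted.
            List<Integer> results = new ArrayList<>();
            for (int i = 0; i < bins.size(); i++) results.addAll(done.take().get());
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] splits = {10, 20, 30}; // hypothetical split points, giving 4 tablets
        List<Range> ranges = List.of(new Range(25, 35), new Range(2, 4), new Range(15, 22));
        List<Integer> rows = scan(ranges, splits, 2);
        System.out.println("arrival order: " + rows);
        // Sorting only checks that the same rows came back; when order matters, use a Scanner.
        List<Integer> sorted = new ArrayList<>(rows);
        Collections.sort(sorted);
        System.out.println("sorted:        " + sorted);
    }
}
```

When you run main, the first line lists rows in whatever order the tasks finish, and that order can vary between runs. The sorted copy is always 2, 3, 15 through 21, and 25 through 34. In the 1.x client API, the thread count plays the role of the numQueryThreads argument to Connector.createBatchScanner.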