accumulo-notifications mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3959) Confusing wording on BatchScanner javadoc
Date Mon, 24 Aug 2015 18:32:46 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709810#comment-14709810 ]

ASF GitHub Bot commented on ACCUMULO-3959:
------------------------------------------

Github user dhutchis commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/45#discussion_r37785744
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/client/BatchScanner.java ---
    @@ -16,19 +16,20 @@
      */
     package org.apache.accumulo.core.client;
     
    +import org.apache.accumulo.core.data.Range;
    +
     import java.util.Collection;
     import java.util.concurrent.TimeUnit;
     
    -import org.apache.accumulo.core.data.Range;
    -
     /**
      * Implementations of BatchScanner support efficient lookups of many ranges in accumulo.
    + * BatchScanners are also appropriate for large, single ranges,
    + * as a BatchScanner will break those ranges up into separate RPCs
    + * provided the range spans more than one tablet
    + * and there are sufficiently many scan threads available.
      *
    - * Use this when looking up lots of ranges and you expect each range to contain a small amount of data. Also only use this when you do not care about the
    - * returned data being in sorted order.
    - *
    - * If you want to lookup a few ranges and expect those ranges to contain a lot of data, then use the Scanner instead. Also, the Scanner will return data in
    - * sorted order, this will not.
    + * Only use this when you do not care about returned data being in sorted order.
    --- End diff --
    
    Correct, I see that the <p> tag is necessary from the online javadoc at
    http://accumulo.apache.org/1.7/apidocs/org/apache/accumulo/core/client/BatchScanner.html
    
    Will fix tonight when I return to my laptop.  I don't think my editor
    (IntelliJ with the Eclipse code formatter plugin) adds the HTML tags
    automatically.
    
    On Mon, Aug 24, 2015 at 2:25 PM, Keith Turner <notifications@github.com>
    wrote:
    
    > In core/src/main/java/org/apache/accumulo/core/client/BatchScanner.java
    > <https://github.com/apache/accumulo/pull/45#discussion_r37784571>:
    >
    > >   *
    > > - * Use this when looking up lots of ranges and you expect each range to contain a small amount of data. Also only use this when you do not care about the
    > > - * returned data being in sorted order.
    > > - *
    > > - * If you want to lookup a few ranges and expect those ranges to contain a lot of data, then use the Scanner instead. Also, the Scanner will return data in
    > > - * sorted order, this will not.
    > > + * Only use this when you do not care about returned data being in sorted order.
    >
    > This was already broken before your patch, but I think javadoc needs <p>
    > markup for paragraphs. Not sure it will render as intended w/o it.
    >
    > Did you format these changes?
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/accumulo/pull/45/files#r37784571>.
    >



> Confusing wording on BatchScanner javadoc
> -----------------------------------------
>
>                 Key: ACCUMULO-3959
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3959
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: docs
>    Affects Versions: 1.6.3, 1.7.0
>            Reporter: Dylan Hutchison
>            Assignee: Dylan Hutchison
>            Priority: Minor
>              Labels: docuentation
>             Fix For: 1.6.4, 1.7.1
>
>
> The following sentence in the [BatchScanner Javadoc|https://accumulo.apache.org/1.7/apidocs/org/apache/accumulo/core/client/BatchScanner.html] has confused my colleagues into using Scanners and wondering why performance doesn't scale.
> bq. If you want to lookup a few ranges and expect those ranges to contain a lot of data, then use the Scanner instead.
> Also regarding this next sentence, from what I see of the BatchScanner it will break up "large Range objects" that span multiple extents (tablets) into multiple ranges, possibly one for each tablet.
> bq. Use this when looking up lots of ranges and you expect each range to contain a small amount of data.
> If the client is okay with unsorted order and it is okay with using multiple threads, then isn't it always a better decision to use a BatchScanner than regular Scanner?  In the worst case, one Range over a single row, the BatchScanner will perform the same as a regular Scanner, ya?
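
The range-splitting behaviour described above can be illustrated with a toy model. This is only a sketch, not Accumulo's implementation: the class RangeSplitSketch, the RowRange type, and the splitByTablets method are hypothetical names made up for this example, rows are plain strings, and tablet boundaries are simplified to half-open ranges. It shows how one large range that crosses several tablet split points could be cut into one sub-range per tablet, while a range that sits inside a single tablet stays whole.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not part of the Accumulo API.
final class RangeSplitSketch {

    /** Half-open row range [start, end); a simplification of real tablet extents. */
    static final class RowRange {
        final String start;
        final String end;

        RowRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Cuts one range at every split point that falls strictly inside it.
     * Assumes sortedSplits is in ascending order and range.start < range.end.
     */
    static List<RowRange> splitByTablets(RowRange range, List<String> sortedSplits) {
        List<RowRange> parts = new ArrayList<>();
        String cursor = range.start;
        for (String split : sortedSplits) {
            if (split.compareTo(cursor) <= 0) {
                continue; // split point lies at or before the unprocessed part of the range
            }
            if (split.compareTo(range.end) >= 0) {
                break; // split point lies at or past the end of the range
            }
            parts.add(new RowRange(cursor, split));
            cursor = split;
        }
        parts.add(new RowRange(cursor, range.end)); // remainder in the last tablet touched
        return parts;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("g", "n", "t");
        // A wide range crosses three split points, so it becomes four sub-ranges.
        System.out.println(splitByTablets(new RowRange("a", "z"), splits));
        // A narrow range inside one tablet is left as a single range.
        System.out.println(splitByTablets(new RowRange("h", "k"), splits));
    }
}
```

In this model each sub-range could be handed to a separate scan thread, which matches the point made in the issue: a single large range only benefits from parallelism when it spans more than one tablet, and a range confined to one tablet degenerates to the single-range case.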



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
