Mailing-List: contact notifications-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: jira@apache.org
Date: Wed, 11 Dec 2013 15:39:07 +0000 (UTC)
From: "Chris McCubbin (JIRA)" <jira@apache.org>
To: notifications@accumulo.apache.org
Message-ID: <JIRA.12537404.1325793528318.18056.1386776347220@arcas>
In-Reply-To: <JIRA.12537404.1325793528318@arcas>
References: <JIRA.12537404.1325793528318@arcas>
Subject: [jira] [Commented] (ACCUMULO-261) Scanner should support batch size
 specified in bytes
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/ACCUMULO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845480#comment-13845480 ] 

Chris McCubbin commented on ACCUMULO-261:
-----------------------------------------

I'm running into the need for this setting yet again. I have an iterator stack that is expensive to re-seek. Sometimes I want all the results ("bulk"), and sometimes I only want a few ("top-k"). There is no good one-size-fits-all table.scan.max.memory setting for this case. If I set it small, the re-seek overhead kills performance on the bulk scan. If I set it large, the scan looks ahead over far too many entries for the top-k use case, and performance is again poor.

A related limitation is that setBatchSize can only be called on Scanners, not on BatchScanners.
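
To make the request concrete, here is a rough sketch in plain Java of what a per-scan byte limit could look like next to the existing entry-count limit. It does not use the Accumulo API and is not how the tablet server actually builds batches. ByteBudgetBatcher, batch, maxEntries and maxBytes are hypothetical names I made up for illustration, and entries are modeled only by their sizes in bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not part of the Accumulo client or server code.
public class ByteBudgetBatcher {

    /**
     * Groups entries (represented by their sizes in bytes) into batches.
     * A batch is closed when it holds maxEntries entries, or when adding
     * the next entry would push it past maxBytes.
     */
    public static List<List<Integer>> batch(List<Integer> entrySizes, int maxEntries, long maxBytes) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentBytes = 0;

        for (int size : entrySizes) {
            // Close a non-empty batch before it would exceed the byte budget.
            // An entry larger than maxBytes still goes out, in a batch of its own.
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }

            current.add(size);
            currentBytes += size;

            // Close the batch once the entry-count limit is reached.
            if (current.size() >= maxEntries) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
        }

        // Flush whatever is left over.
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Mixed entry sizes: mostly small, with one large entry in the middle.
        List<Integer> sizes = Arrays.asList(100, 100, 5000, 100, 100, 100);
        System.out.println(batch(sizes, 3, 1000));
    }
}
```

With a limit like this set per scan, a top-k query could ask for a small byte budget and a bulk query for a large one, without changing the table-wide setting.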

> Scanner should support batch size specified in bytes
> ----------------------------------------------------
>
>                 Key: ACCUMULO-261
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-261
>             Project: Accumulo
>          Issue Type: New Feature
>          Components: client
>            Reporter: John Vines
>
> Currently the scanner allows a user to set the batch size as a number of entries. Unfortunately, this isn't very useful if your entry sizes vary widely and you want to keep your internal footprint within a threshold. So we should also allow users to set the batch size as a maximum number of bytes to bring back.

