accumulo-notifications mailing list archives

From "Chris McCubbin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-261) Scanner should support batch size specified in bytes
Date Wed, 11 Dec 2013 15:39:07 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845480#comment-13845480 ]

Chris McCubbin commented on ACCUMULO-261:
-----------------------------------------

I'm running into the need for this setting yet again. I have an iterator stack that is
expensive to re-seek. Sometimes I want all of the results ("bulk"); sometimes I want only
a few ("top-k"). There is no good one-size-fits-all table.scan.max.memory setting in this
case. If I set it small, the re-seek overhead kills performance on the bulk scan. If I set
it large, the scan looks ahead over far too many entries for the top-k use case, and
performance is again poor.
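
To make the trade-off concrete, here is a rough back-of-the-envelope model in Java. It is only a sketch under stated assumptions, not anything from the Accumulo code base: the class ScanCostModel, its methods, and all of the numbers are hypothetical, the batch size is counted in entries rather than bytes for simplicity, and it assumes each batch costs one re-seek of the iterator stack and that the server always fills a whole batch before returning.

```java
// Hypothetical cost model for illustration only; not part of the Accumulo API.
final class ScanCostModel {

    /** Batches (and, by assumption, re-seeks) needed to hand back `wanted` entries. */
    static long batches(long wanted, long batchSize) {
        return (wanted + batchSize - 1) / batchSize; // ceiling division
    }

    /** Entries the server reads, including look-ahead the client never consumes. */
    static long entriesRead(long wanted, long available, long batchSize) {
        return Math.min(available, batches(wanted, batchSize) * batchSize);
    }

    /** Total cost in arbitrary units: one re-seek per batch plus a per-entry read cost. */
    static long cost(long wanted, long available, long batchSize,
                     long reseekCost, long entryCost) {
        return batches(wanted, batchSize) * reseekCost
                + entriesRead(wanted, available, batchSize) * entryCost;
    }

    public static void main(String[] args) {
        long available = 1_000_000; // entries in the scanned range (made-up number)
        long reseek = 5_000;        // cost of one re-seek (made-up number)
        long perEntry = 1;          // cost of reading one entry (made-up number)
        for (long batch : new long[] {10, 100_000}) {
            System.out.printf("batch=%d bulk=%d top10=%d%n", batch,
                    cost(available, available, batch, reseek, perEntry),
                    cost(10, available, batch, reseek, perEntry));
        }
    }
}
```

With these made-up numbers, the small batch makes the bulk scan pay for 100,000 re-seeks, while the large batch makes the top-10 scan read 100,000 entries to return 10. No single value is good for both, which is why a per-scan setting would help.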

Also related is the fact that one can only "setBatchSize" on Scanners and not BatchScanners.

> Scanner should support batch size specified in bytes
> ----------------------------------------------------
>
>                 Key: ACCUMULO-261
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-261
>             Project: Accumulo
>          Issue Type: New Feature
>          Components: client
>            Reporter: John Vines
>
> Currently the scanner allows a user to set batch size in numbers of entries. Unfortunately
> this isn't too useful if you have widely varied entry size and you want to keep your internal
> footprint within a threshold. So we should also allow users to set batch size in maximum number
> of bytes to bring back.
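
As an illustration of what the issue description asks for, the following Java sketch closes a batch on whichever limit is hit first, entry count or byte budget. It is an assumption about how such a limit could behave, not Accumulo's implementation: ByteBoundedBatcher and its batch method are hypothetical names, and entries are represented only by their sizes in bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not part of the Accumulo API.
final class ByteBoundedBatcher {

    /**
     * Groups entry sizes (in bytes) into batches. A batch is closed when adding the
     * next entry would exceed maxBytes or when it already holds maxEntries entries.
     */
    static List<List<Integer>> batch(List<Integer> entrySizes, int maxEntries, long maxBytes) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentBytes = 0;
        for (int size : entrySizes) {
            // Always admit at least one entry per batch, so a single entry larger
            // than the byte budget still gets returned instead of stalling the scan.
            boolean full = current.size() >= maxEntries || currentBytes + size > maxBytes;
            if (!current.isEmpty() && full) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

For example, with entry sizes 10, 10, 500, 10, 2000, 10, a limit of 3 entries, and a budget of 512 bytes, this sketch yields the batches [10, 10], [500, 10], [2000], [10]. The oversized 2000-byte entry travels alone, and the memory footprint of every other batch stays within the threshold.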



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
