accumulo-notifications mailing list archives

From "Chris McCubbin (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-261) Scanner should support batch size specified in bytes
Date Wed, 11 Dec 2013 15:39:07 GMT


Chris McCubbin commented on ACCUMULO-261:

I'm running into the need for this setting yet again. I have an iterator stack that is
expensive to re-seek. Sometimes I want all the results ("bulk"), and sometimes I only want
a few ("top-k"). There is no good one-size-fits-all table.scan.max.memory setting in this
case. If I set it small, the re-seek overhead kills performance on the bulk scan. If I set
it large, the scan looks ahead over far too many entries for the top-k use case, and
performance is again poor.
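
To make the trade-off concrete, here is a toy cost model in Python. It is only a sketch, not a measurement of Accumulo: it assumes the server pays one re-seek per batch and always fills a whole batch before returning, and the function name, the parameters, and the cost numbers are all hypothetical.

```python
import math


def scan_cost(total_entries, wanted, batch_entries, reseek_cost, entry_cost=1.0):
    """Toy cost of fetching `wanted` entries when the server fills whole batches.

    Assumption (not taken from Accumulo): each batch costs one re-seek of the
    iterator stack plus a fixed cost per entry read on the server side.
    """
    wanted = min(wanted, total_entries)
    batches = math.ceil(wanted / batch_entries)
    # A full batch is read before anything is returned, so the server may
    # look ahead well past the entries the client actually wants.
    entries_read = min(batches * batch_entries, total_entries)
    return batches * reseek_cost + entries_read * entry_cost


if __name__ == "__main__":
    total = 1_000_000
    for batch in (100, 100_000):
        bulk = scan_cost(total, total, batch, reseek_cost=5_000)
        top_k = scan_cost(total, 10, batch, reseek_cost=5_000)
        print(f"batch={batch}: bulk={bulk:.0f} top_k={top_k:.0f}")
```

Under these made-up numbers the small batch is cheap for top-k but pays thousands of re-seeks on the bulk scan, while the large batch is cheap for bulk but reads far more entries than the top-k scan needs. A per-scan setting would let each scan pick its own point on that curve.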

A related limitation is that "setBatchSize" is available only on Scanners, not on BatchScanners.

> Scanner should support batch size specified in bytes
> ----------------------------------------------------
>                 Key: ACCUMULO-261
>                 URL:
>             Project: Accumulo
>          Issue Type: New Feature
>          Components: client
>            Reporter: John Vines
> Currently the scanner allows a user to set the batch size as a number of entries. Unfortunately,
> this isn't very useful if your entries vary widely in size and you want to keep your internal
> footprint within a threshold. So we should also allow users to set the batch size as a maximum
> number of bytes to bring back.
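
As an illustration of the requested behavior, here is a minimal Python sketch of batching by a byte budget instead of an entry count. It is not Accumulo code: the function name is hypothetical, entries are modeled as plain (key, value) byte pairs, and the rule that an oversized entry still goes out alone is an assumption made so the scan always makes progress.

```python
def batches_by_bytes(entries, max_bytes):
    """Yield lists of (key, value) byte pairs whose combined size fits max_bytes.

    Assumption: an entry larger than max_bytes is still returned, alone in
    its own batch, so the scan never stalls on a single large entry.
    """
    batch, used = [], 0
    for key, value in entries:
        size = len(key) + len(value)
        # Close the current batch if adding this entry would exceed the budget.
        if batch and used + size > max_bytes:
            yield batch
            batch, used = [], 0
        batch.append((key, value))
        used += size
    if batch:
        yield batch


if __name__ == "__main__":
    data = [(b"a", b"x" * 9), (b"b", b"y" * 9), (b"c", b"z" * 29), (b"d", b"")]
    for batch in batches_by_bytes(data, max_bytes=20):
        print([key for key, _ in batch])
```

With a byte budget, the number of entries per batch varies with entry size, which is what keeps the footprint bounded when sizes differ widely.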

