accumulo-dev mailing list archives

From ctubbsii <...@git.apache.org>
Subject [GitHub] accumulo pull request: ACCUMULO-3602 BatchScanner optimization for...
Date Wed, 08 Apr 2015 21:08:16 GMT
Github user ctubbsii commented on the pull request:

    https://github.com/apache/accumulo/pull/25#issuecomment-91037784
  
    In JIRA, I [mentioned][1] "sometimes it's better to query a larger range and let an iterator
filter out non-matching results".
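To make "query a larger range and let an iterator filter" concrete, here is a minimal plain-Java sketch. It is only a model: a `TreeMap` stands in for a tablet's sorted rows, and `LargeRangeFilterSketch` and `scanAndFilter` are hypothetical names. This is not Accumulo's `Filter` API. The idea is one contiguous pass over a covering range, with a predicate dropping unwanted rows, instead of many tiny ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

public class LargeRangeFilterSketch {
    /**
     * Hypothetical helper: scans every row in [startRow, endRow) once and keeps
     * only the rows the predicate accepts, rather than issuing one small range
     * per wanted row.
     */
    static List<String> scanAndFilter(NavigableMap<String, String> table,
                                      String startRow, String endRow,
                                      Predicate<String> accept) {
        List<String> matches = new ArrayList<>();
        // One contiguous pass over the covering range (start inclusive, end exclusive).
        for (Map.Entry<String, String> e : table.subMap(startRow, true, endRow, false).entrySet()) {
            if (accept.test(e.getKey())) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // TreeMap models a tablet's rows in sorted order.
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            table.put("row" + i, "value" + i);
        }
        // Keep only even-numbered rows inside [row2, row8).
        List<String> rows = scanAndFilter(table, "row2", "row8",
                r -> (r.charAt(r.length() - 1) - '0') % 2 == 0);
        System.out.println(rows); // [row2, row4, row6]
    }
}
```

The trade-off this models: the reader touches some rows it then discards, but it never has to hold a large list of ranges in memory.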
    
    I think the createRanges method @keith-turner describes could work if the function is
executed in the RecordReader. That also simplifies this issue significantly, because you wouldn't
need to create a new InputSplit type; you would simply add an option to the AccumuloInputFormat.
There's still some risk of memory exhaustion with a large number of ranges within a single tablet,
especially if the ranges were an exhaustive list of individual rows to retrieve.
    
    However, I still think that in many cases it's better to simply use an iterator
with some filter criteria. It could be a SkippingIterator that seeks to ranges pre-configured
on that iterator, or it could be a Filter that applies the desired criteria.
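The "seek to pre-configured ranges" option can be modeled in plain Java as follows. This is a sketch under stated assumptions, not Accumulo's `SkippingIterator` API. A `TreeMap` stands in for a tablet's sorted rows. `SeekingRangesSketch`, `RowRange`, and `seekThroughRanges` are hypothetical names. The ranges are assumed to be sorted and non-overlapping. The iterator jumps to the start of each range with `ceilingKey`, so rows between ranges are never visited.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class SeekingRangesSketch {
    /** Hypothetical row range: start inclusive, end exclusive. */
    public static final class RowRange {
        final String start;
        final String end;

        public RowRange(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Visits only rows inside the pre-configured ranges by seeking to the
     * start of each range. Assumes ranges are sorted by start and non-overlapping.
     */
    static List<String> seekThroughRanges(NavigableMap<String, String> table, List<RowRange> ranges) {
        List<String> visited = new ArrayList<>();
        for (RowRange range : ranges) {
            // "Seek": jump straight to the first row >= range.start.
            String row = table.ceilingKey(range.start);
            // Read forward until the row leaves the range.
            while (row != null && row.compareTo(range.end) < 0) {
                visited.add(row);
                row = table.higherKey(row);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            table.put("row" + i, "value" + i);
        }
        List<RowRange> ranges = Arrays.asList(new RowRange("row1", "row3"), new RowRange("row7", "row9"));
        System.out.println(seekThroughRanges(table, ranges)); // [row1, row2, row7, row8]
    }
}
```

Compared with the filter model, seeking avoids reading rows 3 through 6 entirely. The cost is that the range list must be configured on the iterator, which is where the memory concern about very large range sets comes back.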
    
    [1]: https://issues.apache.org/jira/browse/ACCUMULO-3602?focusedCommentId=14327767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14327767


