accumulo-dev mailing list archives

From keith-turner <...@git.apache.org>
Subject [GitHub] accumulo pull request: ACCUMULO-3602 BatchScanner optimization for...
Date Wed, 08 Apr 2015 23:42:25 GMT
Github user keith-turner commented on the pull request:

    https://github.com/apache/accumulo/pull/25#issuecomment-91069467
  
    I discussed the use case I mentioned with @ctubbsii offline.  This use case was a large
number of ranges that cannot be generated by a function.  We determined that the function could
handle this case well by storing the ranges somewhere other than the job conf.  For example,
one could do the following.
    
     * Store 10,000,000 sorted ranges in a file in the distributed cache (assume thousands of tablets).
     * Using the provided function, each mapper opens the file and reads the ranges for the
tablet it's working on.
     * The ranges returned by the function are used to initialize the batch scanner for each
mapper.
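
    A rough sketch of the selection step in the second bullet, not taken from the PR.  It assumes
the ranges have already been read from the cached file into a list sorted by start row, and that
they do not overlap each other.  `TabletRangeSelector`, `RowRange`, and `rangesForTablet` are
hypothetical names; plain strings stand in for Accumulo rows so the example has no dependencies.
The tablet is treated as covering (prevEndRow, endRow], with null meaning unbounded.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: picks the ranges that overlap a single tablet. */
final class TabletRangeSelector {

    /** Closed row range [startRow, endRow]; a simplified stand-in for a real range type. */
    static final class RowRange {
        final String startRow;
        final String endRow;

        RowRange(String startRow, String endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }
    }

    /**
     * Returns the ranges overlapping the tablet (prevEndRow, endRow].
     * A null prevEndRow means the first tablet; a null endRow means the last tablet.
     * Assumption: sortedRanges is sorted by startRow and its ranges do not overlap.
     */
    static List<RowRange> rangesForTablet(List<RowRange> sortedRanges,
                                          String prevEndRow, String endRow) {
        List<RowRange> result = new ArrayList<>();
        for (RowRange r : sortedRanges) {
            // Range starts after the tablet's last row: no later range can overlap either.
            if (endRow != null && r.startRow.compareTo(endRow) > 0) {
                break;
            }
            // Range ends at or before the tablet's exclusive lower bound: skip it.
            if (prevEndRow != null && r.endRow.compareTo(prevEndRow) <= 0) {
                continue;
            }
            result.add(r);
        }
        return result;
    }
}
```

    With 10,000,000 ranges a linear pass per mapper is likely too slow; since the file is
sorted, a binary search or a small index over it could locate the first candidate range instead.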
    
    It seems like all of the use cases that the current implementation satisfies could also
be satisfied with the functor implementation.  
    
    If this PR does not follow the functor approach discussed, that's OK w/ me.  I can open
up a follow-on issue to record the discussion if it's not pursued here.


