accumulo-dev mailing list archives

From keith-turner <>
Subject [GitHub] accumulo pull request: ACCUMULO-3602 BatchScanner optimization for...
Date Wed, 08 Apr 2015 20:52:20 GMT
Github user keith-turner commented on the pull request:
    I was discussing the big picture behind this PR with @ctubbsii. It seems like this change
could encourage users to pass many ranges as configuration for the MapReduce job. This
could cause memory exhaustion for the job tracker.
    We discussed passing a function that generates a set of ranges, instead of passing lots
of ranges. The implementation would still use a batch scanner (or a scanner with a special
iterator, but it's harder to pass code to the tserver). Each input split could call a function
like the following, which deterministically creates a set of ranges. Those ranges could then
be used for the batch scanner.
    interface RangeGenerator {
      /**
       * @param tabletRange  The data range for the tablet over which the input split is executing
       * @param config a mysterious class that allows user to pass parameters to the function
       */
      List<Range> createRanges(Range tabletRange, Myst config);
    }
    When configuring the AccumuloInputFormat to use the batch scanner, the name of a class that
implements this function would be provided. The ranges set on the job would be large ranges
covering the portions of the table to process. An input split would be created for each tablet
that falls within those large ranges, and for each input split the function would be called to
possibly create many more ranges.
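
    To make the idea concrete, here is a rough, self-contained sketch of what a generator and its
per-split call might look like. Everything in it is an assumption for illustration, not an
existing Accumulo API: RowRange is a simplified stand-in for Accumulo's Range, a plain
Map<String,String> stands in for the "Myst" config class, and StridedRowGenerator, its
"prefix"/"count"/"stride" keys, and RangeGenerators.load are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Accumulo's Range: a half-open row interval
// [startRow, endRow), where null means unbounded on that side.
final class RowRange {
  final String startRow;
  final String endRow;

  RowRange(String startRow, String endRow) {
    this.startRow = startRow;
    this.endRow = endRow;
  }

  boolean contains(String row) {
    return (startRow == null || row.compareTo(startRow) >= 0)
        && (endRow == null || row.compareTo(endRow) < 0);
  }

  @Override
  public String toString() {
    return "[" + startRow + ", " + endRow + ")";
  }
}

// Same shape as the interface in the comment above; a Map<String,String>
// is assumed here in place of the unspecified "Myst" config class.
interface RangeGenerator {
  List<RowRange> createRanges(RowRange tabletRange, Map<String, String> config);
}

// Hypothetical generator: emits one single-row range for every stride-th
// numbered row key, keeping only rows that fall inside the tablet.
final class StridedRowGenerator implements RangeGenerator {
  @Override
  public List<RowRange> createRanges(RowRange tabletRange, Map<String, String> config) {
    String prefix = config.get("prefix");
    int count = Integer.parseInt(config.get("count"));
    int stride = Integer.parseInt(config.get("stride"));
    List<RowRange> ranges = new ArrayList<>();
    for (int i = 0; i < count; i += stride) {
      String row = String.format("%s%06d", prefix, i);
      if (tabletRange.contains(row)) {
        // [row, row + "\0") covers exactly this one row.
        ranges.add(new RowRange(row, row + "\0"));
      }
    }
    return ranges;
  }
}

final class RangeGenerators {
  // Only the class name travels in the job configuration; each input split
  // instantiates the generator locally instead of receiving the ranges.
  static RangeGenerator load(String className) throws ReflectiveOperationException {
    return (RangeGenerator) Class.forName(className).getDeclaredConstructor().newInstance();
  }
}

final class RangeGeneratorDemo {
  public static void main(String[] args) {
    // Small parameters that would be stored in the job configuration.
    Map<String, String> config = new HashMap<>();
    config.put("prefix", "row_");
    config.put("count", "100");
    config.put("stride", "5");

    // One input split per tablet; each split generates its own ranges.
    RowRange tablet = new RowRange("row_000010", "row_000020");
    List<RowRange> ranges = new StridedRowGenerator().createRanges(tablet, config);
    System.out.println(ranges);
  }
}
```

    The point of this shape is that the job configuration carries only a class name and a few
small parameters, so its size no longer grows with the number of ranges. Because the function is
deterministic, a retried task for the same tablet would regenerate the same ranges.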
