hbase-dev mailing list archives

From Dave Latham <lat...@davelink.net>
Subject Re: remove scanner caching?
Date Thu, 09 Apr 2015 20:30:47 GMT
Sounds like you and others are already ahead of me.  Thanks for
opening HBASE-13441 and your related work.  Some responses below:

> Nice idea! I agree that the Scan API would be cleaned up by your
> suggestions, especially the doc updates. Some comments below:
> > Scan.bufferSize (instead of maxResultSize for the target over-the-wire
> > size - though this is still confusing because it's common to go over this
> > size)
> Ya this setting will always have a little ambiguity associated with it (at
> least until such a time where we are able to enforce it at the byte level
> i.e. send back partial cells). Scan.bufferSize sounds okay. As a note,
> there was some discussion in HBASE-11544 about renaming this field and one
> of the recommendations was Scan.rpcChunkSize.

rpcChunkSize sounds fine to me too - much better than maxResultSize

> > Scan.limitRows (instead of caching - along with true client-side support)
> Makes sense. I think that client-side support is actually already there (at
> least it is in ClientScanner via the countdown variable that is used as the
> caching value for new scanner callables).

Gotcha - but I would envision the client actually closing the scanner
(Iterable<Result>) once the row limit is hit.  That would change the
setting's meaning from a detail of how the data transfer is implemented
to an actual, visible query limit.
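Below is a minimal sketch of that idea. It assumes a hypothetical closeable row iterator, RowScanner, standing in for the real ResultScanner, with rows reduced to plain strings. The wrapper counts down much like ClientScanner's countdown. Once the limit is reached, it closes the underlying scanner and reports no more rows, so the limit is something the caller actually sees. LimitedScanner and ListScanner are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Wraps a scanner and turns a row limit into a visible query limit: after
// `limit` rows it closes the underlying scanner and reports no more rows.
final class LimitedScanner implements RowScanner {
    private final RowScanner delegate;
    private int remaining; // counts down, similar in spirit to ClientScanner's countdown
    private boolean closed;

    LimitedScanner(RowScanner delegate, int limit) {
        this.delegate = delegate;
        this.remaining = limit;
    }

    @Override
    public boolean hasNext() {
        if (closed) {
            return false;
        }
        if (remaining <= 0 || !delegate.hasNext()) {
            close(); // limit reached or data exhausted
            return false;
        }
        return true;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        remaining--;
        String row = delegate.next();
        if (remaining == 0) {
            close(); // release the underlying scanner as soon as the limit is hit
        }
        return row;
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            delegate.close();
        }
    }

    // Demo helper: collect everything an iterator yields.
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}

// Hypothetical stand-in for a closeable scanner such as HBase's ResultScanner.
interface RowScanner extends Iterator<String>, AutoCloseable {
    @Override
    void close(); // no checked exception, unlike AutoCloseable.close()
}

// In-memory scanner used only for illustration.
final class ListScanner implements RowScanner {
    private final Iterator<String> it;
    boolean closed;

    ListScanner(List<String> rows) {
        this.it = rows.iterator();
    }

    @Override
    public boolean hasNext() {
        return !closed && it.hasNext();
    }

    @Override
    public String next() {
        return it.next();
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

Closing eagerly on the last allowed row, rather than waiting for the caller's next hasNext(), frees the scanner even if the application simply stops iterating.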

> > Scan.allowPartialResults (to indicate it's ok to break up rows across
> > Results...)
> With HBASE-11544 in branch-1+ the server will stop adding Cells as soon as
> the buffer fills and send back the accumulated Results to the client (last
> Result may be a partial of its row). When allowPartialResults is false, the
> ClientScanner reassembles the partials into a complete view of the row
> before releasing the Result to the application.

That's awesome.  Great work.
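The reassembly step can be sketched as follows. It assumes each chunk from the server carries a hypothetical `partial` flag meaning "more cells of this row follow". PartialRow, FullRow and PartialAssembler are illustrative names, not HBase classes, and a real client has more to handle than this sketch does.

```java
import java.util.ArrayList;
import java.util.List;

// Stitches partial rows back together before handing them to the application.
final class PartialAssembler {
    private String pendingRow; // row currently being stitched, or null
    private final List<String> pendingCells = new ArrayList<>();

    // Feed one chunk; returns the complete row once its last piece arrives, else null.
    FullRow accept(PartialRow chunk) {
        if (pendingRow != null && !pendingRow.equals(chunk.row())) {
            throw new IllegalStateException("new row started before " + pendingRow + " completed");
        }
        pendingRow = chunk.row();
        pendingCells.addAll(chunk.cells());
        if (chunk.partial()) {
            return null; // more cells of this row are still to come
        }
        FullRow done = new FullRow(pendingRow, List.copyOf(pendingCells));
        pendingRow = null;
        pendingCells.clear();
        return done;
    }

    // Convenience: run a whole sequence of chunks through one assembler.
    static List<FullRow> assemble(List<PartialRow> chunks) {
        PartialAssembler assembler = new PartialAssembler();
        List<FullRow> rows = new ArrayList<>();
        for (PartialRow chunk : chunks) {
            FullRow row = assembler.accept(chunk);
            if (row != null) {
                rows.add(row);
            }
        }
        return rows;
    }
}

// Hypothetical wire-level chunk: some cells of one row, with `partial` set when
// the server stopped mid-row and more cells of the same row follow.
record PartialRow(String row, List<String> cells, boolean partial) {}

// A complete row as released to the application.
record FullRow(String row, List<String> cells) {}
```

With allowPartialResults set to true, the client would simply pass each chunk through instead of buffering it, trading complete rows for bounded client memory.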

> With this proposed cleanup, are you recommending that we do away with
> Scan.setBatch? Would the default configuration remain as it is now in
> branch-1+ (rowLimit = Integer.MAX_VALUE, bufferSize = 2MB,
> allowPartialResults = false)?

Yes, I was thinking of dropping setBatch also.
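Here is a hypothetical sketch of the slimmed-down settings, using the defaults listed above and no batch option. ScanOptions and its methods are illustrative names for the proposal, not an existing HBase API. bufferSize could equally be called rpcChunkSize.

```java
// Hypothetical sketch of the proposed Scan settings; not an existing HBase API.
final class ScanOptions {
    // 2MB target bytes per RPC, the branch-1+ default mentioned in the thread.
    static final long DEFAULT_BUFFER_SIZE = 2L * 1024 * 1024;

    private int rowLimit = Integer.MAX_VALUE;      // visible query limit (replaces caching)
    private long bufferSize = DEFAULT_BUFFER_SIZE; // over-the-wire target (replaces maxResultSize)
    private boolean allowPartialResults = false;   // false: client reassembles split rows
    // Deliberately no batch setting: the proposal drops setBatch.

    ScanOptions limitRows(int rows) {
        if (rows <= 0) {
            throw new IllegalArgumentException("row limit must be positive");
        }
        this.rowLimit = rows;
        return this;
    }

    ScanOptions bufferSize(long bytes) {
        if (bytes <= 0) {
            throw new IllegalArgumentException("buffer size must be positive");
        }
        this.bufferSize = bytes;
        return this;
    }

    ScanOptions allowPartialResults(boolean allow) {
        this.allowPartialResults = allow;
        return this;
    }

    int rowLimit() {
        return rowLimit;
    }

    long bufferSize() {
        return bufferSize;
    }

    boolean allowPartialResults() {
        return allowPartialResults;
    }
}
```

One way to read the proposal: setBatch mainly existed to split very wide rows into several Results. allowPartialResults together with bufferSize covers that case by size rather than by cell count, which leaves one setting per concern.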
