hbase-dev mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: InternalScanner next(..) methods
Date Sun, 09 Dec 2012 20:52:26 GMT
This method specifically only works when this is a heap of StoreScanners (i.e. on the RegionScanner
level), which is very confusing (to me anyway).
Maybe we should have two separate KeyValueHeap implementations to make it less confusing.

The list here comprises KVs for the same row key. These KVs need to be collected together
so that Filters can operate on entire rows.
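
As a rough illustration of why a whole row is collected before a row-level filter runs, here is a minimal sketch. The Kv class, scanRows and the predicate are hypothetical stand-ins, not HBase classes; the only assumption taken from the thread is that cells arrive sorted by row key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in types for illustration; none of this is HBase API.
final class RowGroupingSketch {

    /** Minimal cell: a row key plus a "family:qualifier" column name. */
    static final class Kv {
        final String row;
        final String column;
        Kv(String row, String column) { this.row = row; this.column = column; }
        @Override public String toString() { return row + "/" + column; }
    }

    /**
     * Splits a row-sorted list of cells into per-row lists and keeps only the
     * rows accepted by rowFilter, which sees every cell of a row at once.
     */
    static List<List<Kv>> scanRows(List<Kv> sortedCells, Predicate<List<Kv>> rowFilter) {
        List<List<Kv>> accepted = new ArrayList<>();
        List<Kv> currentRow = new ArrayList<>();
        for (Kv kv : sortedCells) {
            // A change of row key closes the row collected so far.
            if (!currentRow.isEmpty() && !currentRow.get(0).row.equals(kv.row)) {
                if (rowFilter.test(currentRow)) accepted.add(currentRow);
                currentRow = new ArrayList<>();
            }
            currentRow.add(kv);
        }
        // The last row is closed by the end of input.
        if (!currentRow.isEmpty() && rowFilter.test(currentRow)) accepted.add(currentRow);
        return accepted;
    }

    public static void main(String[] args) {
        List<Kv> cells = List.of(
            new Kv("r1", "a:x"), new Kv("r1", "b:y"),
            new Kv("r2", "a:x"),
            new Kv("r3", "a:x"), new Kv("r3", "b:y"));
        // Row-level decision: keep only rows that contain a cell in column b:y.
        Predicate<List<Kv>> hasBy =
            row -> row.stream().anyMatch(kv -> kv.column.equals("b:y"));
        System.out.println(scanRows(cells, hasBy));
    }
}
```

A filter that only ever saw one cell at a time could not make the "keep r1 because it also has b:y" decision without buffering the row itself.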

I just looked at that code this week. We need to fix this stuff. :)

-- Lars

 From: Matt Corgan <mcorgan@hotpads.com>
To: dev <dev@hbase.apache.org> 
Sent: Saturday, December 8, 2012 11:27 PM
Subject: InternalScanner next(..) methods
I'm looking at the KeyValueHeap trying to see how we can make it work with
Cells.  I'm curious, in this method

  public boolean next(List<KeyValue> result, int limit, String metric)
      throws IOException {
    if (this.current == null) {
      return false;
    }
    InternalScanner currentAsInternal = (InternalScanner)this.current;
    boolean mayContainMoreRows = currentAsInternal.next(result, limit,

how is it getting multiple results from a single scanner without putting
the scanner back on the heap?  Couldn't that skip KeyValues?  Is it that
it's only used at the Region level where the family-per-file semantics
guarantee that all KeyValues in a single family will sort together?
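
The concern can be sketched with a toy heap merge. Everything here (Cell, FamilyScanner, mergeAll) is a hypothetical model rather than HBase code, and it assumes cells sort by row, then family, then qualifier. Draining a whole row from the top scanner before re-heaping it keeps the output sorted when each scanner covers its own family, but not when two scanners share a family.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.PriorityQueue;

// Toy model for illustration only; these are not HBase classes.
final class FamilyHeapSketch {

    /** Cell ordered by row, then family, then qualifier (assumed sort order). */
    static final class Cell implements Comparable<Cell> {
        final String row, family, qualifier;
        Cell(String row, String family, String qualifier) {
            this.row = row; this.family = family; this.qualifier = qualifier;
        }
        @Override public int compareTo(Cell o) {
            int c = row.compareTo(o.row);
            if (c == 0) c = family.compareTo(o.family);
            if (c == 0) c = qualifier.compareTo(o.qualifier);
            return c;
        }
        @Override public String toString() { return row + "/" + family + ":" + qualifier; }
    }

    /** One sorted run of cells. */
    static final class FamilyScanner {
        private final Deque<Cell> cells;
        FamilyScanner(List<Cell> sorted) { this.cells = new ArrayDeque<>(sorted); }
        Cell peek() { return cells.peek(); }
        /** Moves every cell of the head's row into result; true if cells remain. */
        boolean nextRow(List<Cell> result) {
            String row = cells.peek().row;
            while (!cells.isEmpty() && cells.peek().row.equals(row)) result.add(cells.poll());
            return !cells.isEmpty();
        }
    }

    /** Merges scanners, taking a whole row from the top scanner per heap step. */
    static List<Cell> mergeAll(List<FamilyScanner> scanners) {
        PriorityQueue<FamilyScanner> heap =
            new PriorityQueue<FamilyScanner>((a, b) -> a.peek().compareTo(b.peek()));
        for (FamilyScanner s : scanners) if (s.peek() != null) heap.add(s);
        List<Cell> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            FamilyScanner top = heap.poll();
            // Several cells may come out here before the scanner is re-heaped.
            if (top.nextRow(out)) heap.add(top);
        }
        return out;
    }

    static boolean isSorted(List<Cell> cells) {
        for (int i = 1; i < cells.size(); i++)
            if (cells.get(i - 1).compareTo(cells.get(i)) > 0) return false;
        return true;
    }

    public static void main(String[] args) {
        // One scanner per family: row chunks never interleave, so order holds.
        List<Cell> perFamily = mergeAll(List.of(
            new FamilyScanner(List.of(new Cell("r1", "a", "x"), new Cell("r1", "a", "y"),
                                      new Cell("r2", "a", "x"))),
            new FamilyScanner(List.of(new Cell("r1", "b", "x"), new Cell("r2", "b", "x"),
                                      new Cell("r2", "b", "y")))));
        System.out.println(perFamily + " sorted=" + isSorted(perFamily));

        // Two scanners over the same family: a:y is emitted after a:z.
        List<Cell> sameFamily = mergeAll(List.of(
            new FamilyScanner(List.of(new Cell("r1", "a", "x"), new Cell("r1", "a", "z"))),
            new FamilyScanner(List.of(new Cell("r1", "a", "y")))));
        System.out.println(sameFamily + " sorted=" + isSorted(sameFamily));
    }
}
```

In this model the second case does not lose a cell, it emits one out of order, which a downstream consumer that assumes sorted input could mishandle.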

My bigger question is regarding the next(List<KeyValue> result, int limit)
methods from the InternalScanner interface.  What's the reasoning for
getting multiple results in one call as opposed to calling the next()
method a bunch of times?  Buffering the KeyValues in a List like that means
the Cells would have to be expanded into full KeyValues which would be nice
to avoid.  Is there some logic that depends on getting a whole row of
values, even though you may only get a partial row due to the limit param?
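
To make the partial-row case concrete, here is a small sketch of a batched next with a limit. RowScanner is a hypothetical class, and its return value means "more cells remain in this row", which is an assumption made for the example and not necessarily the InternalScanner contract.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner over a single row's cells; not the HBase interface.
final class LimitSketch {

    static final class RowScanner {
        private final List<String> rowCells;
        private int pos = 0;
        RowScanner(List<String> rowCells) { this.rowCells = rowCells; }

        /**
         * Appends up to limit cells to result (limit <= 0 means no limit).
         * Returns true if cells of this row are still left (assumed semantics).
         */
        boolean next(List<String> result, int limit) {
            int added = 0;
            while (pos < rowCells.size() && (limit <= 0 || added < limit)) {
                result.add(rowCells.get(pos++));
                added++;
            }
            return pos < rowCells.size();
        }
    }

    public static void main(String[] args) {
        RowScanner scanner = new RowScanner(List.of("a:1", "a:2", "b:1", "b:2", "b:3"));
        List<String> batch = new ArrayList<>();
        boolean more;
        do {
            batch.clear();
            more = scanner.next(batch, 2);  // each call may return only part of the row
            System.out.println(batch + " more=" + more);
        } while (more);
    }
}
```

With a limit of 2 the five-cell row arrives in three batches, so any logic that needs the complete row has to accumulate across calls anyway.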

Similarly, I see there is Filter.filterRow(List<KeyValue>) which looks like
it's barely used.  Is that an important method?  Doesn't look like it's
used much, but maybe people have custom Filters that need it.
