hbase-user mailing list archives

From "Oliver Meyn (GBIF)" <om...@gbif.org>
Subject Re: resource usage of ResultScanner's Iterator<Result>
Date Fri, 02 Nov 2012 10:31:45 GMT

On 2012-10-26, at 9:59 PM, Stack wrote:

> On Thu, Oct 25, 2012 at 1:24 AM, Oliver Meyn (GBIF) <omeyn@gbif.org> wrote:
>> Hi all,
>> 
>> I'm on cdh3u3 (hbase 0.90.4) and I need to provide a bunch of row keys
>> based on a column value (e.g. give me all keys where column "dataset" =
>> 1234).  That's straightforward using a scan and filter.  The trick is that
>> I want to return an Iterator over my key type (Integer) rather than expose
>> HBase internals (i.e. Result), so I need some kind of Decorator that wraps
>> the Iterator<Result>.  For every call to next() I'd then call the
>> underlying iterator's next() and extract my Integer key from the Result.
>> That all works fine, but what I'm wondering is what resources the
>> Iterator<Result> is holding, and how I can release those from my decorator.
>> 
>> In my current implementation the decorator's constructor looks like:
>> 
>> public OccurrenceKeyIterator(HTablePool tablePool, String occurrenceTableName, Scan scan)
>> 
>> and the constructor builds the ResultScanner and subsequent iterator.  In
>> my hasNext() method I can check the underlying iterator and if it says
>> false I can shut down my scanner and return the table to the TablePool.
>> But what if the end-user never reaches the end of the Iterator, or just
>> dereferences it? Am I at risk of leaking tables, connections or anything
>> else?  Any tips on what I should do?
>> 
> 
> If the close is not called, this is what will be missed on the HTable instance:
> 
> 
>    flushCommits();
>    if (cleanupPoolOnClose) {
>      this.pool.shutdown();
>    }
>    if (cleanupConnectionOnClose) {
>      if (this.connection != null) {
>        this.connection.close();
>      }
>    }
>    this.closed = true;
> 
> 
> In your case, the flushing of commits is of no import.
> 
> The pool above is an executor service inside of HTable used for doing
> batch calls.  Again, you don't really use it but it should probably get
> cleaned up.
> 
> The connection close is good because though all HTables share a
> Connection, the above close updates reference counters so we know when
> we can let go of the connection.
> 
> Keep a list of what you've given out and if unused in N minutes, close
> it yourself in background?
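
For illustration, the bookkeeping suggested above might look roughly like the following in plain Java. This is only a sketch under assumptions: IdleResourceReaper and its methods are hypothetical names and not part of HBase, the tracked resource is any Closeable (for example something that closes the scanner and returns the table), and the clock is passed in explicitly so the logic is easy to follow.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry that closes handed-out resources left idle too long. */
final class IdleResourceReaper {
  private final Map<Closeable, Long> lastUsedMillis = new ConcurrentHashMap<>();
  private final long maxIdleMillis;

  IdleResourceReaper(long maxIdleMillis) {
    this.maxIdleMillis = maxIdleMillis;
  }

  /** Record that a resource was handed out or used at the given time. */
  void touch(Closeable resource, long nowMillis) {
    lastUsedMillis.put(resource, nowMillis);
  }

  /** Stop tracking a resource that its owner has already closed. */
  void forget(Closeable resource) {
    lastUsedMillis.remove(resource);
  }

  /** Close every resource idle for longer than maxIdleMillis; returns how many. */
  int reapIdle(long nowMillis) {
    int reaped = 0;
    Iterator<Map.Entry<Closeable, Long>> entries = lastUsedMillis.entrySet().iterator();
    while (entries.hasNext()) {
      Map.Entry<Closeable, Long> entry = entries.next();
      if (nowMillis - entry.getValue() > maxIdleMillis) {
        Closeable resource = entry.getKey();
        entries.remove();
        try {
          resource.close();
        } catch (IOException e) {
          // Best effort: the resource is abandoned either way.
        }
        reaped++;
      }
    }
    return reaped;
  }
}
```

One way to run this in the background would be a ScheduledExecutorService that calls reapIdle(System.currentTimeMillis()) every minute or so. The sketch does not guard against a resource being touched at the same moment it is reaped.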

This kind of thing was all I could come up with, but it feels a bit messy.  It
sounds like the only real consequence of not closing nicely is that the
reference counter doesn't get decremented, meaning the Connection wouldn't get
garbage collected if it were dereferenced.  Is that right?  That doesn't sound
too bad to me since the pool will be holding on to that connection anyway,
right? (Keeping in mind that the normal use case is that everything gets
cleaned up when the end-user finishes iterating.)
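
To make the normal case concrete, here is a minimal sketch of the decorator shape I have in mind, written against plain Java types rather than the HBase classes. ClosingKeyIterator is a hypothetical name; the source iterator stands in for the Iterator<Result>, the key extractor for pulling the Integer out of a Result, and the Closeable for whatever closes the scanner and returns the table to the pool.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical decorator: maps each source element to a key and releases
 * an associated resource on exhaustion or on an explicit close().
 */
final class ClosingKeyIterator<R, K> implements Iterator<K>, Closeable {
  private final Iterator<R> source;
  private final Function<R, K> keyExtractor;
  private final Closeable resource;
  private boolean closed;

  ClosingKeyIterator(Iterator<R> source, Function<R, K> keyExtractor, Closeable resource) {
    this.source = source;
    this.keyExtractor = keyExtractor;
    this.resource = resource;
  }

  @Override
  public boolean hasNext() {
    if (closed) {
      return false;
    }
    if (source.hasNext()) {
      return true;
    }
    // Exhausted: release the resource without requiring the caller to close.
    try {
      close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return false;
  }

  @Override
  public K next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return keyExtractor.apply(source.next());
  }

  /** Idempotent, so callers that stop early can close without checking state. */
  @Override
  public void close() throws IOException {
    if (closed) {
      return;
    }
    closed = true;
    resource.close();
  }
}
```

Because it implements Closeable, a caller that might stop early can use try-with-resources, and a caller that iterates to the end gets the cleanup for free. It does nothing for the caller who simply drops the reference, which is the case I'm asking about.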

> (when you fellas going to upgrade?)

It's definitely in the plan, but it keeps getting pushed down in favour of
getting work done :)  I read in the javadoc that the behaviour of HTablePool
and table close changes in newer HBase versions - does my use case here change
too (i.e. is it even less dangerous to leave a table hanging in newer HBase)?

Thanks a lot for digging into this, Stack!

Oliver

--
Oliver Meyn
Software Developer
Global Biodiversity Information Facility (GBIF)
+45 35 32 15 12
http://www.gbif.org

