lucene-solr-dev mailing list archives

From Teruhiko Kurosaka <K...@basistech.com>
Subject RE: Where to free Tokenizer resources?
Date Thu, 12 Nov 2009 22:09:46 GMT
A month ago, I asked if Solr 1.4.0 (then in development)
calls close() on a TokenStream multiple times, because
I observed behavior that suggested it does.

There was no clear answer whether Solr is doing that, 
but there was no denial either. 

I'd like to revisit this issue, because if Solr
is calling close() multiple times on the same TokenStream,
it seems to be breaking the protocol described in the
Lucene 2.9.1 TokenStream API doc, which reads:

   1. Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
   2. The consumer calls reset().
   3. The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.
   4. The consumer calls incrementToken() until it returns false consuming the attributes after each call.
   5. The consumer calls end() so that any end-of-stream operations can be performed.
   6. The consumer calls close() to release any resource when finished using the TokenStream.
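
To make the order of calls concrete, here is a minimal sketch of steps 2 through 6 from the consumer's side. RecordingStream and ConsumerDemo are hypothetical stand-ins I made up for illustration; they are not Lucene classes and only mimic the method names in the list above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token stream; NOT the Lucene TokenStream class.
// It records which lifecycle methods the consumer calls, in order.
class RecordingStream {
    final List<String> calls = new ArrayList<>();
    private final String[] tokens;
    private int pos;
    String current; // plays the role of an attribute the consumer reads

    RecordingStream(String... tokens) { this.tokens = tokens; }

    void reset() { calls.add("reset"); pos = 0; current = null; }

    boolean incrementToken() {
        calls.add("incrementToken");
        if (pos >= tokens.length) return false;
        current = tokens[pos++];
        return true;
    }

    void end() { calls.add("end"); }

    void close() { calls.add("close"); }
}

class ConsumerDemo {
    // Follows steps 2-6: reset, read until incrementToken() returns false,
    // end, and finally close exactly once for this use of the stream.
    static List<String> consume(RecordingStream stream) {
        List<String> out = new ArrayList<>();
        stream.reset();
        try {
            while (stream.incrementToken()) {
                out.add(stream.current); // consume the attribute after each call
            }
            stream.end();
        } finally {
            stream.close();
        }
        return out;
    }

    public static void main(String[] args) {
        RecordingStream s = new RecordingStream("a", "b");
        System.out.println(consume(s));
        System.out.println(s.calls);
    }
}
```

In this sketch close() sits in a finally block, so it runs once per pass over the stream even if reading fails part way.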


It seems to mean that end() is to release per-stream resources,
and close() is to release per-TokenStream instance resources.
So close() should be called just once per TokenStream.

Am I reading this incorrectly?
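
In case repeated close() calls turn out to be expected, here is a minimal sketch of the pattern suggested in the reply quoted below: release in close() and acquire in reset(). NativeHandle and ReusableTokenizerSketch are hypothetical names I made up; neither extends a Lucene class, and the static counter is only there to show that nothing leaks (it is not thread-safe).

```java
// Hypothetical stand-in for a JNI-backed resource; not part of Lucene or Solr.
class NativeHandle {
    static int liveHandles = 0; // number of handles currently allocated
    private boolean released;

    NativeHandle() { liveHandles++; }

    void release() {
        if (!released) {
            released = true;
            liveHandles--;
        }
    }
}

// Hypothetical tokenizer-like class. The method names mirror the lifecycle
// discussed in this thread, but this is an assumption-laden sketch only.
class ReusableTokenizerSketch {
    private NativeHandle handle; // null whenever the resource is released

    // Acquire lazily, so that reset() after close() works.
    void reset() {
        if (handle == null) {
            handle = new NativeHandle();
        }
    }

    // Idempotent: a second close() on the same instance is a no-op.
    void close() {
        if (handle != null) {
            handle.release();
            handle = null;
        }
    }

    boolean isOpen() { return handle != null; }
}

class ReuseDemo {
    public static void main(String[] args) {
        ReusableTokenizerSketch t = new ReusableTokenizerSketch();
        t.reset();
        t.close();
        t.close(); // tolerated, nothing left to release
        t.reset(); // reacquires the resource for the next use
        System.out.println(t.isOpen());
        t.close();
        System.out.println(NativeHandle.liveHandles);
    }
}
```

The cost of this approach is that the resource is acquired again on every reuse, which may or may not be acceptable for an expensive JNI allocation.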

-kuro  

> -----Original Message-----
> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf 
> Of Yonik Seeley
> Sent: Tuesday, October 20, 2009 9:37 AM
> To: solr-dev@lucene.apache.org
> Subject: Re: Where to free Tokenizer resources?
> 
> If you really want to release/acquire your resources each 
> time the tokenizer is used, then release it in the close() 
> and acquire in the reset().  There is no "done with this 
> forever" callback.
> 
> -Yonik
> http://www.lucidimagination.com
> 
> On Tue, Oct 20, 2009 at 12:25 PM, Teruhiko Kurosaka 
> <Kuro@basistech.com> wrote:
> > Hi,
> > I have my own Tokenizer that worked fine with Solr 1.3
> > but threw an Exception when used with Solr 1.4 dev.
> >
> > This Tokenizer uses some JNI-side resources that it acquires
> > in the constructor and frees in close().
> >
> > The behavior seems to indicate that Solr 1.4 calls close() 
> then reset(Reader) in order to reuse the Tokenizer.  But my 
> Tokenizer threw an Exception because its resource has been 
> freed already. My temporary fix was to move the resource 
> release code from close() to finalize().  But I'm not very 
> happy with it because the timing of resource release is up to 
> the garbage collector.
> >
> > Question #1: Is close() supposed to be called more than once? To me,
> > close() should be called only once, at the end of the life cycle of the
> > Tokenizer.  (The old reader should be closed when reset(Reader) is
> > called.)
> >
> > If the answer is Yes, then
> >
> > Question #2: Is there any better place to release the 
> internal resource than in finalize()?
> >
> > Thank you.
> >
> > T. "Kuro" Kurosaka
> >
> >
> 