lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: CachingTokenFilter extensibility and LUCENE-1685
Date Fri, 28 Aug 2009 16:05:50 GMT
bq. If there were some way to tell WeightedSpanTermExtractor not to wrap the stream (a new TokenStream.isCachingTokens()
method, checking for a new "CachedTokenStream" interface rather than for CachingTokenFilter,
some attribute, anything! :-) then I could still work with the public API.

I didn't know someone would be out there doing that - the thought
crossed my mind, but I figured - eh, they will double up ;) Sorry.

I think TokenStream.isCachingTokens() is too invasive, but perhaps a
switch on QueryScorer that disables the wrapping would work - then if you
have a different caching stream, or you want to use a different resettable
stream for some reason, you could force the wrap off and take ownership
of making sure the thing is resettable.
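
To make the idea concrete, here is a minimal, self-contained sketch of such
a switch. It does not use the Lucene API: Stream, ArrayBackedStream,
CachingWrapper, Extractor and setWrap are all hypothetical names made up for
illustration, and the real change to QueryScorer could look quite different.

```java
import java.util.ArrayList;
import java.util.List;

public class WrapSwitchSketch {

    /** Hypothetical stand-in for a token stream; NOT the Lucene TokenStream API. */
    interface Stream {
        String next();   // next token, or null when exhausted
        void reset();    // rewind to the first token
    }

    /** Caller-owned, array-backed stream with random access (the use case in this thread). */
    static final class ArrayBackedStream implements Stream {
        private final String[] tokens;
        private int pos = 0;

        ArrayBackedStream(String[] tokens) { this.tokens = tokens.clone(); }

        public String next() { return pos < tokens.length ? tokens[pos++] : null; }
        public void reset() { pos = 0; }
        String get(int i) { return tokens[i]; }   // random access
    }

    /** Buffers its input on first use so it can be replayed; plays the role of CachingTokenFilter. */
    static final class CachingWrapper implements Stream {
        private final Stream input;
        private List<String> cache;   // filled lazily on the first next()
        private int pos = 0;

        CachingWrapper(Stream input) { this.input = input; }

        public String next() {
            if (cache == null) {
                cache = new ArrayList<>();
                for (String t = input.next(); t != null; t = input.next()) {
                    cache.add(t);
                }
            }
            return pos < cache.size() ? cache.get(pos++) : null;
        }

        public void reset() { pos = 0; }
    }

    /** Hypothetical scorer-side component holding the proposed switch. */
    static final class Extractor {
        private boolean wrap = true;   // default: wrap anything that is not already a CachingWrapper

        void setWrap(boolean wrap) { this.wrap = wrap; }

        Stream prepare(Stream s) {
            if (wrap && !(s instanceof CachingWrapper)) {
                return new CachingWrapper(s);   // buffers a second copy of the tokens
            }
            return s;   // wrap forced off: the caller owns making sure reset() works
        }
    }

    public static void main(String[] args) {
        ArrayBackedStream own = new ArrayBackedStream(new String[] {"quick", "brown", "fox"});
        Extractor extractor = new Extractor();
        extractor.setWrap(false);            // force the wrap off
        Stream s = extractor.prepare(own);   // same object back, no redundant buffer
        while (s.next() != null) { /* first pass over the tokens */ }
        s.reset();                           // second pass relies on the caller's own reset()
        System.out.println(s == own);
        System.out.println(s.next());
    }
}
```

With the switch left at its default, prepare() wraps the array-backed stream
and the tokens get buffered twice; with it off, the caller's stream is used
as-is and the caller is responsible for it being resettable.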

Unfortunately, we are technically in feature freeze. To get a fix into
2.9 without some rule bending, I'd have to classify the unnecessary wrap
as a serious bug ...

-- 
- Mark

http://www.lucidimagination.com



David Kaelbling wrote:
> Hi Uwe,
>
> The problem is that I need to have a random access token stream for other
> reasons, and don't want CachingTokenFilter to buffer up a redundant copy of
> it.  In existing releases I subclass it to override all the methods to use my
> store, and ignore the LinkedList cache member.  The old internal structures
> were still present, but were never used.  In 2.9 I can't do that any more, and
> without a subclassed object I have no way to prevent WeightedSpanTermExtractor
> from wrapping the stream.
>
> If there were some way to tell WeightedSpanTermExtractor not to wrap the
> stream (a new TokenStream.isCachingTokens() method, checking for a new
> "CachedTokenStream" interface rather than for CachingTokenFilter, some
> attribute, anything! :-) then I could still work with the public API.
>
>   - David
>
> --
> David Kaelbling
> Senior Software Engineer
> Black Duck Software, Inc.
>
> dkaelbling@blackducksoftware.com
> T +1.781.810.2041
> F +1.781.891.5145
>
> http://www.blackducksoftware.com
> ________________________________________
> From: Uwe Schindler [uwe@thetaphi.de]
> Sent: Friday, August 28, 2009 4:03 AM
> To: java-dev@lucene.apache.org
> Subject: RE: CachingTokenFilter extensibility and LUCENE-1685
>
> Hi David,
>
> What exactly is your problem? Even the old 2.4 CachingTokenFilter did not
> expose its internal structures, so overriding would not change its internal
> implementation. The only change now is that *all* TokenFilters in core have
> final implementations, which is a consequence of the new TokenStream API and
> the migration path to it. So it should not be possible to override
> next()/next(Token)/incrementToken() in any TokenStream, because the whole
> API is extended by simply adding new TokenFilters to the chain that do
> what you want. Letting users override incrementToken() could
> break a lot of things (see LUCENE-1753).
>
> To fix your specific problem, one idea would be to add a method
> (isCachingTokens) to TokenStream in the future that defaults to false and is
> true for CachingTokenFilter and TeeSinkTokenStream.SinkTokenStream.
> Highlighter would then be able to detect whether it can reset() (a better
> name would be rewind) the TokenStream. In that case you could simply provide
> another TokenFilter subclass with isCachingTokens=true and random access to
> the AttributeSource.States.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>   
>> -----Original Message-----
>> From: David Kaelbling [mailto:dkaelbling@blackducksoftware.com]
>> Sent: Thursday, August 27, 2009 10:40 PM
>> To: java-dev@lucene.apache.org
>> Subject: CachingTokenFilter extensibility and LUCENE-1685
>>
>> Hi,
>>
>> Looking at Lucene 2.9 trunk, CachingTokenFilter seems much less extensible
>> than before.  In previous releases I subclassed it so I could back the
>> cache with an array and provide random access to the stream.  I can't see
>> how to do this any more, and the
>> WeightedSpanTermExtractor.getReaderForField() is still hardwired to
>> require a CachingTokenFilter-derived object.
>>
>> Am I missing something?  Having two copies of the token stream, one for
>> random access and one hidden inside the CachingTokenFilter, does not sound
>> efficient :-)
>>
>>   Thanks,
>>   David
>>
>> --
>> David Kaelbling
>> Senior Software Engineer
>> Black Duck Software, Inc.
>>
>> dkaelbling@blackducksoftware.com
>> T +1.781.810.2041
>> F +1.781.891.5145
>>
>> http://www.blackducksoftware.com
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>     