lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: CachingTokenFilter extensibility and LUCENE-1685
Date Fri, 28 Aug 2009 18:26:32 GMT
In the longer term, I think we should do something more automatic and
correct, but for now, adding this brute-force option is best, I think.

David Kaelbling wrote:
> Uwe,
>
> I kind of like the idea of changing WeightedSpanTermExtractor to test for 
> "!(tokenStream instanceof RandomAccess)" :-)
>
> - David
>
> --
> David Kaelbling
> Senior Software Engineer
> Black Duck Software, Inc.
>
> dkaelbling@blackducksoftware.com
> T +1.781.810.2041
> F +1.781.891.5145
>
> http://www.blackducksoftware.com
> ________________________________________
> From: David Kaelbling
> Sent: Friday, August 28, 2009 11:54 AM
> To: Uwe Schindler; java-dev@lucene.apache.org
> Subject: RE: CachingTokenFilter extensibility and LUCENE-1685
>
> Hi Uwe,
>
> The problem is that I need to have a random access token stream for other
> reasons, and don't want CachingTokenFilter to buffer up a redundant copy of
> it.  In existing releases I subclass it to override all the methods to use my
> store, and ignore the LinkedList cache member.  The old internal structures
> were still present, but were never used.  In 2.9 I can't do that any more,
> and without a subclassed object I have no way to prevent
> WeightedSpanTermExtractor from wrapping the stream.
>
> If there were some way to tell WeightedSpanTermExtractor not to wrap the
> stream (a new TokenStream.isCachingTokens() method, checking for a new
> "CachedTokenStream" interface rather than for CachingTokenFilter, some
> attribute, anything! :-) then I could still work with the public API.
>
>   - David
>
> ________________________________________
> From: Uwe Schindler [uwe@thetaphi.de]
> Sent: Friday, August 28, 2009 4:03 AM
> To: java-dev@lucene.apache.org
> Subject: RE: CachingTokenFilter extensibility and LUCENE-1685
>
> Hi David,
>
> What exactly is your problem? Even the old 2.4 CachingTokenFilter did not
> expose its internal structures, so overriding would not change its internal
> implementation. The only change now is that *all* TokenFilters in core have
> final implementations, which is a consequence of the new TokenStream API and
> the migration path to it. So it should not be possible to override
> next()/next(Token)/incrementToken() in any TokenStream, because the API is
> meant to be extended by simply adding new TokenFilters to the chain that do
> what you want to add. Letting users override incrementToken() would
> possibly break a lot of things (see LUCENE-1753).
>
> To fix your specific problem, one idea would be to add a method
> (isCachingTokens) to TokenStream in the future that defaults to false and
> returns true for CachingTokenFilter and TeeSinkTokenStream.SinkTokenStream.
> Highlighter would then be able to detect whether it can reset() (a better
> name would be rewind) the TokenStream. In that case you could simply provide
> another TokenFilter subclass with isCachingTokens=true and random access to
> the AttributeSource.States.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>   
>> -----Original Message-----
>> From: David Kaelbling [mailto:dkaelbling@blackducksoftware.com]
>> Sent: Thursday, August 27, 2009 10:40 PM
>> To: java-dev@lucene.apache.org
>> Subject: CachingTokenFilter extensibility and LUCENE-1685
>>
>> Hi,
>>
>> Looking at Lucene 2.9 trunk, CachingTokenFilter seems much less extensible
>> than before.  In previous releases I subclassed it so I could back the
>> cache with an array and provide random access to the stream.  I can't see
>> how to do this any more, and the
>> WeightedSpanTermExtractor.getReaderForField() is still hardwired to
>> require a CachingTokenFilter-derived object.
>>
>> Am I missing something?  Having two copies of the token stream, one for
>> random access and one hidden inside the CachingTokenFilter, does not sound
>> efficient :-)
>>
>>   Thanks,
>>   David
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>     
>
>
>
>
>
>   


-- 
- Mark

http://www.lucidimagination.com
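
The marker-interface idea discussed in this thread (checking for a
"CachedTokenStream"-style interface instead of the concrete
CachingTokenFilter class) might be sketched roughly as follows. This is only
an illustration under stated assumptions: every type and method here
(SimpleTokenStream, CachedTokenStream, CachingWrapper, ArrayTokenStream,
OneShotTokenStream, wrapIfNeeded) is a hypothetical stand-in, not part of
Lucene's API, and it does not reflect how WeightedSpanTermExtractor is
actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Self-contained sketch; none of these types are Lucene classes. */
final class CachingSketch {

    /** Hypothetical stand-in for a token stream. */
    interface SimpleTokenStream {
        /** Returns the next token, or null when the stream is exhausted. */
        String next();
        /** Rewinds to the first token, if the stream supports replay. */
        void reset();
    }

    /** Hypothetical marker interface: the stream can already be replayed. */
    interface CachedTokenStream {}

    /** Forward-only stream that cannot be replayed (reset is a no-op). */
    static final class OneShotTokenStream implements SimpleTokenStream {
        private final Iterator<String> it;
        OneShotTokenStream(String... tokens) { this.it = Arrays.asList(tokens).iterator(); }
        @Override public String next() { return it.hasNext() ? it.next() : null; }
        @Override public void reset() { /* nothing to rewind */ }
    }

    /** Buffers the wrapped stream on first use so it can be replayed. */
    static final class CachingWrapper implements SimpleTokenStream, CachedTokenStream {
        private final SimpleTokenStream input;
        private final List<String> cache = new ArrayList<>();
        private boolean filled = false;
        private int pos = 0;

        CachingWrapper(SimpleTokenStream input) { this.input = input; }

        @Override public String next() {
            if (!filled) {
                // Drain the input once into the cache.
                for (String t = input.next(); t != null; t = input.next()) {
                    cache.add(t);
                }
                filled = true;
            }
            return pos < cache.size() ? cache.get(pos++) : null;
        }

        @Override public void reset() { pos = 0; }
    }

    /** User-supplied random-access stream backed by an array. */
    static final class ArrayTokenStream implements SimpleTokenStream, CachedTokenStream {
        private final String[] tokens;
        private int pos = 0;
        ArrayTokenStream(String... tokens) { this.tokens = tokens.clone(); }
        @Override public String next() { return pos < tokens.length ? tokens[pos++] : null; }
        @Override public void reset() { pos = 0; }
        /** Random access by position. */
        String get(int i) { return tokens[i]; }
    }

    /** Wraps only when the stream does not declare itself replayable. */
    static SimpleTokenStream wrapIfNeeded(SimpleTokenStream stream) {
        return (stream instanceof CachedTokenStream) ? stream : new CachingWrapper(stream);
    }

    public static void main(String[] args) {
        SimpleTokenStream own = new ArrayTokenStream("quick", "brown", "fox");
        System.out.println("array stream wrapped: " + (wrapIfNeeded(own) != own));
        SimpleTokenStream oneShot = new OneShotTokenStream("quick", "brown", "fox");
        System.out.println("one-shot stream wrapped: " + (wrapIfNeeded(oneShot) != oneShot));
    }
}
```

With a check of this shape, a stream that already offers random access would
be passed through unchanged, so no second copy of the tokens is buffered.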





