From: David Kaelbling
To: David Kaelbling, "java-dev@lucene.apache.org"
Date: Fri, 28 Aug 2009 13:48:27 -0400
Subject: RE: CachingTokenFilter extensibility and LUCENE-1685

Uwe, I kind of
like the idea of changing WeightedSpanTermExtractor to test for
"!(tokenStream instanceof RandomAccess)" :-)

- David

--
David Kaelbling
Senior Software Engineer
Black Duck Software, Inc.

dkaelbling@blackducksoftware.com
T +1.781.810.2041
F +1.781.891.5145

http://www.blackducksoftware.com

________________________________________
From: David Kaelbling
Sent: Friday, August 28, 2009 11:54 AM
To: Uwe Schindler; java-dev@lucene.apache.org
Subject: RE: CachingTokenFilter extensibility and LUCENE-1685

Hi Uwe,

The problem is that I need a random-access token stream for other reasons, and I don't want CachingTokenFilter to buffer up a redundant copy of it. In existing releases I subclass it, override all of its methods to use my store, and ignore the LinkedList cache member. The old internal structures were still present, but were never used. In 2.9 I can't do that any more, and without a subclassed object I have no way to prevent WeightedSpanTermExtractor from wrapping the stream.

If there were some way to tell WeightedSpanTermExtractor not to wrap the stream (a new TokenStream.isCachingTokens() method, a check for a new "CachedTokenStream" interface rather than for CachingTokenFilter, some attribute, anything! :-), then I could still work with the public API.

- David

________________________________________
From: Uwe Schindler [uwe@thetaphi.de]
Sent: Friday, August 28, 2009 4:03 AM
To: java-dev@lucene.apache.org
Subject: RE: CachingTokenFilter extensibility and LUCENE-1685

Hi David,

What exactly is your problem? Even the old 2.4 CachingTokenFilter did not expose its internal structures, so overriding would not change its internal implementation.
The only change now is that *all* TokenFilters in core have final implementations, which is a consequence of the new TokenStream API and the migration path to it. So it should not be possible to override next()/next(Token)/incrementToken() in core TokenStreams: the whole API is extended by simply adding new TokenFilters to the chain that do what you want to add. Letting users override incrementToken() could break a lot of things (see LUCENE-1753).

To fix your specific problem, one idea would be to add a method (isCachingTokens) to TokenStream in the future that defaults to false and returns true for CachingTokenFilter and TeeSinkTokenFilter.SinkTokenStream. The Highlighter would then be able to detect whether it can reset() (a better name would be rewind) the TokenStream. In that case you could simply provide another TokenFilter subclass with isCachingTokens=true and random access to the AttributeSource.States.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: David Kaelbling [mailto:dkaelbling@blackducksoftware.com]
> Sent: Thursday, August 27, 2009 10:40 PM
> To: java-dev@lucene.apache.org
> Subject: CachingTokenFilter extensibility and LUCENE-1685
>
> Hi,
>
> Looking at Lucene 2.9 trunk, CachingTokenFilter seems much less extensible
> than before. In previous releases I subclassed it so I could back the
> cache with an array and provide random access to the stream. I can't see
> how to do this any more, and WeightedSpanTermExtractor.getReaderForField()
> is still hardwired to require a CachingTokenFilter-derived object.
>
> Am I missing something? Having two copies of the token stream, one for
> random access and one hidden inside the CachingTokenFilter, does not sound
> efficient :-)
>
> Thanks,
> David
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
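The detection idea discussed in this thread (Uwe's proposed isCachingTokens() flag, defaulting to false, and David's marker-interface alternative) can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions, not Lucene code: SketchTokenStream, ListSource, CachingWrapper, ArrayTokenStore and WrapDecision are hypothetical stand-ins invented here, tokens are plain Strings rather than AttributeSource.State objects, and isCachingTokens() was only a proposal in this thread. The sketch shows a consumer (playing the role of WeightedSpanTermExtractor) that wraps a stream in a cache only when the stream cannot already replay itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.RandomAccess;

// HYPOTHETICAL stand-in for TokenStream; NOT the Lucene class. Tokens are Strings.
abstract class SketchTokenStream {
    /** Returns the next token, or null when the stream is exhausted. */
    abstract String next();

    /** Rewinds to the first token; plain one-shot streams cannot do this. */
    void reset() {
        throw new UnsupportedOperationException("stream cannot be rewound");
    }

    /** Sketch of Uwe's proposed flag: false unless a subclass can replay tokens. */
    boolean isCachingTokens() {
        return false;
    }
}

/** Hypothetical one-shot source over a fixed token list (stands in for an analyzer chain). */
final class ListSource extends SketchTokenStream {
    private final List<String> tokens;
    private int pos = 0;

    ListSource(List<String> tokens) { this.tokens = tokens; }

    @Override String next() { return pos < tokens.size() ? tokens.get(pos++) : null; }
}

/** Hypothetical generic caching wrapper, similar in spirit to CachingTokenFilter. */
final class CachingWrapper extends SketchTokenStream {
    private final SketchTokenStream input;
    private List<String> cache; // filled from the input on the first call to next()
    private int pos = 0;

    CachingWrapper(SketchTokenStream input) { this.input = input; }

    @Override String next() {
        if (cache == null) {
            cache = new ArrayList<>();
            for (String t = input.next(); t != null; t = input.next()) cache.add(t);
        }
        return pos < cache.size() ? cache.get(pos++) : null;
    }

    @Override void reset() { pos = 0; }

    @Override boolean isCachingTokens() { return true; }
}

/** Hypothetical array-backed store like the one David describes: already replayable. */
final class ArrayTokenStore extends SketchTokenStream implements RandomAccess {
    private final String[] tokens;
    private int pos = 0;

    ArrayTokenStore(String... tokens) { this.tokens = tokens.clone(); }

    String get(int i) { return tokens[i]; } // random access by position
    int size() { return tokens.length; }

    @Override String next() { return pos < tokens.length ? tokens[pos++] : null; }
    @Override void reset() { pos = 0; }
    @Override boolean isCachingTokens() { return true; }
}

/** Hypothetical consumer logic: wrap only when the stream cannot already replay itself. */
final class WrapDecision {
    /** Decision based on the proposed isCachingTokens() flag. */
    static SketchTokenStream byFlag(SketchTokenStream s) {
        return s.isCachingTokens() ? s : new CachingWrapper(s);
    }

    /** Decision based on java.util.RandomAccess used as a marker interface. */
    static SketchTokenStream byMarker(SketchTokenStream s) {
        return (s instanceof RandomAccess) ? s : new CachingWrapper(s);
    }

    /** Reads all tokens, rewinds, and reads them again, as a highlighter pass might. */
    static List<String> readTwice(SketchTokenStream s) {
        List<String> out = new ArrayList<>();
        for (String t = s.next(); t != null; t = s.next()) out.add(t);
        s.reset();
        for (String t = s.next(); t != null; t = s.next()) out.add(t);
        return out;
    }
}
```

With these stand-ins, WrapDecision.byFlag(new ArrayTokenStore("x", "y")) returns the store itself, so no second copy of the tokens is buffered, while a one-shot ListSource gets wrapped in a CachingWrapper. Note that the java.util.RandomAccess javadoc describes it as a marker for List implementations, so reusing it for token streams (as suggested, half in jest, above) stretches its contract; a dedicated marker interface or an explicit flag would state the intent more clearly.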