From: Daniel Shane
Date: Thu, 03 Sep 2009 12:01:29 -0400
To: java-user@lucene.apache.org
Subject: Re: Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken / AttributeSource), cannot implement a LookaheadTokenFilter.

OK, I got it: from checking other filters, I should call input.incrementToken() instead of super.incrementToken().

Do you feel this kind of breaks the object model? (super.incrementToken() should also work.) Maybe when the old API is gone, we can stop checking whether someone has overridden next() or incrementToken()?

Daniel S.

> Humm... I looked at captureState() and restoreState() and it doesn't
> seem like they would work in my scenario.
>
> I'd like the LookAheadFilter to be able to peek() several tokens
> forward, and those tokens can have different attributes, so I don't think I
> should assume I can restoreState() safely.
>
> Here is an application for the filter: let's say I want to recognize
> abbreviations (like S.C.R.) at the token level. I'd need to be able to
> peek() a few tokens forward to make sure S.C.R. is an abbreviation and
> not simply the end of a sentence.
>
> So the user should be able to peek() a number of tokens forward before
> returning to the usual behavior.
>
> Here is the implementation I had in mind (still untested because of a
> StackOverflowError):
>
> public class LookaheadTokenFilter extends TokenFilter {
>     /** List of tokens that were peeked but not returned with next.
>      */
>     LinkedList<AttributeSource> peekedTokens = new LinkedList<AttributeSource>();
>
>     /** The position of the next token that peek() will return in peekedTokens */
>     int peekPosition = 0;
>
>     public LookaheadTokenFilter(TokenStream input) {
>         super(input);
>     }
>
>     public boolean peekIncrementToken() throws IOException {
>         if (this.peekPosition >= this.peekedTokens.size()) {
>             if (this.input.incrementToken() == false) {
>                 return false;
>             }
>
>             this.peekedTokens.add(cloneAttributes());
>             this.peekPosition = this.peekedTokens.size();
>             return true;
>         }
>         this.peekPosition++;
>         return true;
>     }
>
>     @Override
>     public boolean incrementToken() throws IOException {
>         reset();
>         if (this.peekedTokens.isEmpty() == false) {
>             this.peekedTokens.removeFirst();
>         }
>         if (this.peekedTokens.isEmpty() == false) {
>             return true;
>         }
>         return super.incrementToken();
>     }
>
>     @Override
>     public void reset() {
>         this.peekPosition = 0;
>     }
>
>     // Overridden methods...
>     public Attribute getAttribute(Class attClass) {
>         if (this.peekedTokens.size() > 0) {
>             return this.peekedTokens.get(this.peekPosition).getAttribute(attClass);
>         }
>         return super.getAttribute(attClass);
>     }
>
>     // Override all these just like getAttribute() ...
>     public Iterator getAttributeClassesIterator() ...
>     public AttributeFactory getAttributeFactory() ...
>     public Iterator getAttributeImplsIterator() ...
>     public Attribute addAttribute(Class attClass) ...
>     public void addAttributeImpl(AttributeImpl att) ...
>     public State captureState() ...
>     public void clearAttributes() ...
>     public AttributeSource cloneAttributes() ...
>     public boolean hasAttribute(Class attClass) ...
>     public boolean hasAttributes() ...
>     public void restoreState(State state) ...
> }
>
> Now the problem I have is that the code below triggers an evil
> StackOverflowError, because I'm overriding incrementToken() and calling
> super.incrementToken(), which loops back because of this:
>
> public boolean incrementToken() throws IOException {
>     assert tokenWrapper != null;
>     final Token token;
>     if (supportedMethods.hasReusableNext) {
>         token = next(tokenWrapper.delegate);
>     } else {
>         assert supportedMethods.hasNext;
>         token = next();   // <----- Lucene calls next();
>     }
>     if (token == null) return false;
>     tokenWrapper.delegate = token;
>     return true;
> }
>
> which then calls:
>
> public Token next() throws IOException {
>     if (tokenWrapper == null)
>         throw new UnsupportedOperationException("This TokenStream only supports the new Attributes API.");
>     if (supportedMethods.hasIncrementToken) {
>         return incrementToken() ? ((Token) tokenWrapper.delegate.clone())
>                                 : null;   // <--- incrementToken() gets called
>     } else {
>         assert supportedMethods.hasReusableNext;
>         final Token token = next(tokenWrapper.delegate);
>         if (token == null) return null;
>         tokenWrapper.delegate = token;
>         return (Token) token.clone();
>     }
> }
>
> and hasIncrementToken is true because I overrode incrementToken():
>
> MethodSupport(Class clazz) {
>     hasIncrementToken = isMethodOverridden(clazz, "incrementToken", METHOD_NO_PARAMS);
>     hasReusableNext = isMethodOverridden(clazz, "next", METHOD_TOKEN_PARAM);
>     hasNext = isMethodOverridden(clazz, "next", METHOD_NO_PARAMS);
> }
>
> Seems like a "catch-22". From what I understand, if I override
> incrementToken(), I should not call super.incrementToken()?
>
> Daniel S.
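For reference, here is a minimal, self-contained toy model of the catch-22 described in this thread. It is only a sketch under stated assumptions: MiniStream, MiniFilter, ListSource, SuperCallingFilter and InputCallingFilter are hypothetical classes, not Lucene's API. The base class defines each default method in terms of the other, loosely mirroring how the quoted 2.9 code falls back from incrementToken() to next() and back again. The point it shows is that calling super.incrementToken() from an override re-enters the override, while pulling from the wrapped input does not.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy model only: none of these classes are Lucene classes.
public class DispatchCycleDemo {

    // Each default method is written in terms of the other, loosely mirroring how
    // the quoted TokenStream code falls back from incrementToken() to next() and back.
    static abstract class MiniStream {
        String term; // the single "attribute" this toy stream exposes

        boolean incrementToken() {
            String t = next();            // default: fall back to the old-style API
            if (t == null) return false;
            term = t;
            return true;
        }

        String next() {
            return incrementToken() ? term : null; // default: fall back to the new-style API
        }
    }

    // A source that overrides incrementToken(), so the fallbacks are never used.
    static class ListSource extends MiniStream {
        private final Deque<String> tokens;

        ListSource(String... words) {
            tokens = new ArrayDeque<>(Arrays.asList(words));
        }

        @Override
        boolean incrementToken() {
            if (tokens.isEmpty()) return false;
            term = tokens.poll();
            return true;
        }
    }

    static abstract class MiniFilter extends MiniStream {
        protected final MiniStream input;

        MiniFilter(MiniStream input) {
            this.input = input;
        }
    }

    // Broken: super.incrementToken() -> next() -> this.incrementToken() -> super... forever.
    static class SuperCallingFilter extends MiniFilter {
        SuperCallingFilter(MiniStream input) {
            super(input);
        }

        @Override
        boolean incrementToken() {
            return super.incrementToken();
        }
    }

    // Working: advance the wrapped stream, then derive this filter's state from it.
    // (A Lucene filter shares attributes with its input; this toy copies the term instead.)
    static class InputCallingFilter extends MiniFilter {
        InputCallingFilter(MiniStream input) {
            super(input);
        }

        @Override
        boolean incrementToken() {
            if (!input.incrementToken()) return false;
            term = input.term.toUpperCase();
            return true;
        }
    }

    // Consumes the whole stream and joins the terms with spaces.
    static String drain(MiniStream s) {
        StringBuilder out = new StringBuilder();
        while (s.incrementToken()) out.append(s.term).append(' ');
        return out.toString().trim();
    }

    // Reports whether a single incrementToken() call recurses until the stack overflows.
    static boolean overflows(MiniStream s) {
        try {
            s.incrementToken();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(drain(new InputCallingFilter(new ListSource("s.c.r.", "abbrev")))); // prints S.C.R. ABBREV
        System.out.println(overflows(new SuperCallingFilter(new ListSource("x"))));            // prints true
    }
}
```

In this model the recursion comes purely from the two mutually delegating defaults; a subclass that overrides one of them and then calls super lands back in its own override, which is the same shape as the stack trace described above.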
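A second toy sketch covers the lookahead buffering itself, assuming a token can be reduced to its term text. Source, ListSource, LookaheadFilter and peekToken() are hypothetical names, not Lucene classes. Peeked tokens are read from the wrapped input (never from super), stored as snapshots, and replayed in order by incrementToken() before any new token is pulled. In real Lucene 2.9 the snapshot would presumably be a captureState() result and the replay a restoreState(). Whether that is safe when peeked tokens carry different attributes is the open question in this thread, so this models only the buffering logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy model only: Source, ListSource, LookaheadFilter and peekToken() are hypothetical,
// and a "token" is reduced to its term text.
public class LookaheadDemo {

    interface Source {
        boolean incrementToken();
        String term();
    }

    static class ListSource implements Source {
        private final Iterator<String> it;
        private String term;

        ListSource(String... words) {
            it = Arrays.asList(words).iterator();
        }

        public boolean incrementToken() {
            if (!it.hasNext()) return false;
            term = it.next();
            return true;
        }

        public String term() {
            return term;
        }
    }

    static class LookaheadFilter implements Source {
        private final Source input;
        private final List<String> peeked = new ArrayList<>(); // read ahead, not yet returned
        private int peekPos = 0;          // how many buffered tokens peekToken() has exposed
        private boolean inputDone = false;
        private String term;              // the token the consumer currently sees

        LookaheadFilter(Source input) {
            this.input = input;
        }

        // Advances the wrapped input (never super), remembering when it is exhausted.
        private boolean pull() {
            if (!inputDone && !input.incrementToken()) inputDone = true;
            return !inputDone;
        }

        // Returns the next token beyond those already peeked, without consuming it; null at end.
        String peekToken() {
            if (peekPos == peeked.size()) {
                if (!pull()) return null;
                peeked.add(input.term()); // snapshot of the looked-ahead token
            }
            return peeked.get(peekPos++);
        }

        public boolean incrementToken() {
            peekPos = 0; // lookahead restarts from the new current token
            if (!peeked.isEmpty()) {
                term = peeked.remove(0); // replay buffered tokens first, in order
                return true;
            }
            if (!pull()) return false;
            term = input.term();
            return true;
        }

        public String term() {
            return term;
        }
    }

    // For each token, records "current|next|afterNext" using two peeks.
    static List<String> windows(String... words) {
        LookaheadFilter f = new LookaheadFilter(new ListSource(words));
        List<String> seen = new ArrayList<>();
        while (f.incrementToken()) {
            String cur = f.term();
            String next = f.peekToken();
            String afterNext = f.peekToken();
            seen.add(cur + "|" + next + "|" + afterNext);
        }
        return seen;
    }

    public static void main(String[] args) {
        windows("S.", "C.", "R.", "is", "here").forEach(System.out::println);
    }
}
```

Running main prints one line per token (S.|C.|R., C.|R.|is, R.|is|here, is|here|null, here|null|null): peeking two tokens ahead never changes the order in which incrementToken() hands tokens to the consumer, which is what an abbreviation check like the S.C.R. example needs.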