lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken / AttributeSource), cannot implement a LookaheadTokenFilter.
Date Wed, 02 Sep 2009 06:23:12 GMT
One problem may be that you do not want to restore the peeked token into
the TokenFilter's own attributes. It looks like you want peek() to give you
the next token without resetting the current stream to it (you only want to
"look" at the next token and then possibly do something special with the
current one). To achieve this, TokenStream has a method cloneAttributes()
that creates a new AttributeSource with the same attribute types, which is
independent of the one it was cloned from. You can then use
clone.getAttribute(TermAttribute.class).term() or similar to look at the
next token. Creating this clone is costly, so you may want to create it once
and reuse it. In the peek method, you then simply copy the state of this
filter into the cloned AttributeSource.
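
To make the idea above concrete, here is a minimal sketch that does not use
Lucene at all. Attrs, ListStream and LookaheadSketch are hypothetical
stand-ins invented for illustration: Attrs plays the role of an
AttributeSource with a single term attribute, and its capture()/restore()
methods only imitate captureState()/restoreState(). The "peeked" holder is
created once and reused, in the spirit of keeping one clone around. Treat it
as an untested-against-Lucene illustration of the buffering logic, not as a
port of the real filter.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Stand-in for an AttributeSource with one term attribute (NOT the Lucene class). */
final class Attrs {
    private String term = "";
    String term() { return term; }
    void setTerm(String t) { term = t; }
    /** Snapshot of the current values, in the spirit of captureState(). */
    String capture() { return term; }
    /** Copies a snapshot back, in the spirit of restoreState(State). */
    void restore(String state) { term = state; }
}

/** Stand-in for the wrapped stream: every step overwrites the shared attributes. */
final class ListStream {
    final Attrs attrs = new Attrs();
    private final Iterator<String> terms;
    ListStream(List<String> terms) { this.terms = terms.iterator(); }
    boolean incrementToken() {
        if (!terms.hasNext()) return false;
        attrs.setTerm(terms.next());
        return true;
    }
}

/** Lookahead wrapper whose peek() leaves the current token untouched. */
final class LookaheadSketch {
    private final ListStream input;
    private final Attrs current;               // shared with the input, as in a filter
    private final Attrs peeked = new Attrs();  // created once and reused, like the clone
    private final List<String> buffered = new ArrayList<>(); // peeked, not yet returned
    private int peekPosition = 0;

    LookaheadSketch(ListStream input) {
        this.input = input;
        this.current = input.attrs;
    }

    String currentTerm() { return current.term(); }
    String peekedTerm() { return peeked.term(); }

    boolean peek() {
        if (peekPosition >= buffered.size()) {
            String saved = current.capture();      // remember the current token
            boolean hasNext = input.incrementToken();
            if (hasNext) buffered.add(current.capture());
            current.restore(saved);                // undo the overwrite by the input
            if (!hasNext) return false;
        }
        peeked.restore(buffered.get(peekPosition++)); // expose it only via the copy
        return true;
    }

    boolean incrementToken() {
        peekPosition = 0;
        if (!buffered.isEmpty()) {
            current.restore(buffered.remove(0));   // replay buffered tokens first
            return true;
        }
        return input.incrementToken();
    }
}
```

After incrementToken() and then peek(), currentTerm() still reports the
current token while peekedTerm() reports the following one. In a real
filter, the same shape would presumably use the State objects and the cloned
AttributeSource mentioned above instead of plain strings.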

It's a bit complicated, but it should work. Tell me if you need more help.
It would also help if you posted some code showing what you want to do with
the TokenFilter.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Michael Busch [mailto:buschmic@gmail.com]
> Sent: Wednesday, September 02, 2009 1:53 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken
> / AttributeSource), cannot implement a LookaheadTokenFilter.
> 
> This is what I had in mind (completely untested!):
> 
> import java.io.IOException;
> import java.util.LinkedList;
>
> import org.apache.lucene.analysis.TokenFilter;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.util.AttributeSource;
>
> public class LookaheadTokenFilter extends TokenFilter {
>    /** List of tokens that were peeked but not yet returned by incrementToken(). */
>    LinkedList<AttributeSource.State> peekedTokens =
>        new LinkedList<AttributeSource.State>();
>
>    /** The position in peekedTokens of the next token that peek() will return. */
>    int peekPosition = 0;
>
>    public LookaheadTokenFilter(TokenStream input) {
>        super(input);
>    }
>
>    public boolean peek() throws IOException {
>        if (this.peekPosition >= this.peekedTokens.size()) {
>            boolean hasNext = input.incrementToken();
>            if (hasNext) {
>                this.peekedTokens.add(captureState());
>                this.peekPosition = this.peekedTokens.size();
>            }
>            return hasNext;
>        }
> 
>        restoreState(this.peekedTokens.get(this.peekPosition++));
>        return true;
>    }
> 
>    public void reset() { this.peekPosition = 0; }
> 
>    public boolean incrementToken() throws IOException {
>        reset();
>
>        if (this.peekedTokens.size() > 0) {
>            restoreState(this.peekedTokens.removeFirst());
>            return true;
>        }
>        return this.input.incrementToken();
>    }
> }
> 
> 
> On 9/1/09 4:44 PM, Michael Busch wrote:
> > Daniel,
> >
> > take a look at the captureState() and restoreState() APIs in
> > AttributeSource and TokenStream. captureState() returns a State object
> > containing all attributes with their current values.
> > restoreState(State) takes a given State and copies its values back
> > into the TokenStream. You should be able to achieve the same thing by
> > storing State objects in your List, instead of Token objects. peek()
> > would change to return true/false instead of Token and the caller of
> > peek consumes the values using the new attribute API. The change on
> > your side should be pretty simple, let us know if you run into problems!
> >
> >  Michael
> >
> > On 9/1/09 3:12 PM, Daniel Shane wrote:
> >> After thinking about it, the only conclusion I reached was to save an
> >> iterator of Attributes instead of saving the token, and to use that
> >> instead. It may work.
> >>
> >> Daniel Shane
> >>
> >> Daniel Shane wrote:
> >>> Hi all!
> >>>
> >>> I'm trying to port my Lucene code to the new TokenStream API and I
> >>> have a filter that I cannot seem to port using the current new API.
> >>>
> >>> The filter is called LookaheadTokenFilter. It behaves exactly like a
> >>> normal token filter, except, you can call peek() and get information
> >>> on the next token in the stream.
> >>>
> >>> Since Lucene does not support stream "rewinding", we did this by
> >>> buffering tokens when peek() was called and giving those back when
> >>> next() was called; when no more "peeked" tokens exist, we then
> >>> call super.next();
> >>>
> >>> Now, I'm looking at this new API and really I'm stuck at how to port
> >>> this using incrementToken...
> >>>
> >>> Am I missing something, is there an object I can get from the
> >>> TokenStream that I can save and get all the attributes from?
> >>>
> >>> Here is the code I'm trying to port :
> >>>
> >>> public class LookaheadTokenFilter extends TokenFilter {
> >>>    /** List of tokens that were peeked but not returned with next. */
> >>>    LinkedList<Token> peekedTokens = new LinkedList<Token>();
> >>>
> >>>    /** Position in peekedTokens of the next token peek() will return. */
> >>>    int peekPosition = 0;
> >>>
> >>>    public LookaheadTokenFilter(TokenStream input) {
> >>>        super(input);
> >>>    }
> >>>
> >>>    public Token peek() throws IOException {
> >>>        if (this.peekPosition >= this.peekedTokens.size()) {
> >>>            Token token = new Token();
> >>>            token = this.input.next(token);
> >>>            if (token != null) {
> >>>                this.peekedTokens.add(token);
> >>>                this.peekPosition = this.peekedTokens.size();
> >>>            }
> >>>            return token;
> >>>        }
> >>>
> >>>        return this.peekedTokens.get(this.peekPosition++);
> >>>    }
> >>>
> >>>    public void reset() { this.peekPosition = 0; }
> >>>
> >>>    public Token next(Token token) throws IOException {
> >>>        reset();
> >>>
> >>>        if (this.peekedTokens.size() > 0) {
> >>>            return this.peekedTokens.removeFirst();
> >>>        }
> >>>
> >>>        return this.input.next(token);
> >>>    }
> >>> }
> >>>
> >>> Let me know if anyone has an idea,
> >>> Daniel Shane
> >>>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
> 
> 


