lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Busch <busch...@gmail.com>
Subject Re: Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken / AttributeSource), cannot implement a LookaheadTokenFilter.
Date Tue, 01 Sep 2009 23:53:14 GMT
This is what I had in mind (completely untested!):

public class LookaheadTokenFilter extends TokenFilter {
   /** List of tokens that were peeked but not returned with next. */
   LinkedList<AttributeSource.State> peekedTokens = new 
LinkedList<AttributeSource.State>();

   /** The position of the next character that peek() will return in 
peekedTokens */
   int peekPosition = 0;

   public LookaheadTokenFilter(TokenStream input) {
       super(input);
   }
     public boolean peek() throws IOException {
       if (this.peekPosition >= this.peekedTokens.size()) {
           boolean hasNext = input.incrementToken();
           if (hasNext) {
               this.peekedTokens.add(captureState());
               this.peekPosition = this.peekedTokens.size();
           }
           return hasNext;
       }

       restoreState(this.peekedTokens.get(this.peekPosition++));
       return true;
   }

   public void reset() { this.peekPosition = 0; }

   public boolean incrementToken() throws IOException {
     reset();

     if (this.peekedTokens.size() > 0) {
       restoreState(this.peekedTokens.removeFirst());
       return true;
     }
     return this.input.incrementToken();
   }
}


On 9/1/09 4:44 PM, Michael Busch wrote:
> Daniel,
>
> take a look at the captureState() and restoreState() APIs in 
> AttributeSource and TokenStream. captureState() returns a State object 
> containing all attributes with its' current values. 
> restoreState(State) takes a given State and copies its values back 
> into the TokenStream. You should be able to achieve the same thing by 
> storing State objects in your List, instead of Token objects. peek() 
> would change to return true/false instead of Token and the caller of 
> peek consumes the values using the new attribute API. The change on 
> your side should be pretty simple, let us know if you run into problems!
>
>  Michael
>
> On 9/1/09 3:12 PM, Daniel Shane wrote:
>> After thinking about it, the only conclusion I got was instead of 
>> saving the token, to save an iterator of Attributes and use that 
>> instead. It may work.
>>
>> Daniel Shane
>>
>> Daniel Shane wrote:
>>> Hi all!
>>>
>>> I'm trying to port my Lucene code to the new TokenStream API and I 
>>> have a filter that I cannot seem to port using the current new API.
>>>
>>> The filter is called LookaheadTokenFilter. It behaves exactly like a 
>>> normal token filter, except, you can call peek() and get information 
>>> on the next token in the stream.
>>>
>>> Since Lucene does not support stream "rewinding", we did this by 
>>> buffering tokens when peek() was called and giving those back when 
>>> next() was called and when no more "peeked" tokens exist, we then 
>>> call super.next();
>>>
>>> Now, I'm looking at this new API and really I'm stuck at how to port 
>>> this using incrementToken...
>>>
>>> Am I missing something, is there an object I can get from the 
>>> TokenStream that I can save and get all the attributes from?
>>>
>>> Here is the code I'm trying to port :
>>>
>>> public class LookaheadTokenFilter extends TokenFilter {
>>>    /** List of tokens that were peeked but not returned with next. */
>>>    LinkedList<Token> peekedTokens = new LinkedList<Token>();
>>>
>>>    /** The position of the next character that peek() will return in 
>>> peekedTokens */
>>>    int peekPosition = 0;
>>>
>>>    public LookaheadTokenFilter(TokenStream input) {
>>>        super(input);
>>>    }
>>>      public Token peek() throws IOException {
>>>        if (this.peekPosition >= this.peekedTokens.size()) {
>>>            Token token = new Token();
>>>            token = this.input.next(token);
>>>            if (token != null) {
>>>                this.peekedTokens.add(token);
>>>                this.peekPosition = this.peekedTokens.size();
>>>            }
>>>            return token;
>>>        }
>>>
>>>        return this.peekedTokens.get(this.peekPosition++);
>>>    }
>>>
>>>    public void reset() { this.peekPosition = 0; }
>>>
>>>    public Token next(Token token) throws IOException {
>>>        reset();
>>>
>>>        if (this.peekedTokens.size() > 0) {
>>>            return this.peekedTokens.removeFirst();
>>>        }
>>>                  return this.input.next(token);          }
>>> }
>>>
>>> Let me know if anyone has an idea,
>>> Daniel Shane
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message