lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel Shane <sha...@LEXUM.UMontreal.CA>
Subject Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken / AttributeSource), cannot implement a LookaheadTokenFilter.
Date Tue, 01 Sep 2009 22:01:43 GMT
Hi all!

I'm trying to port my Lucene code to the new TokenStream API and I have 
a filter that I cannot seem to port using the current new API.

The filter is called LookaheadTokenFilter. It behaves exactly like a 
normal token filter, except, you can call peek() and get information on 
the next token in the stream.

Since Lucene does not support stream "rewinding", we did this by 
buffering tokens when peek() was called and giving those back when 
next() was called and when no more "peeked" tokens exist, we then call;

Now, I'm looking at this new API and really I'm stuck at how to port 
this using incrementToken...

Am I missing something, is there an object I can get from the 
TokenStream that I can save and get all the attributes from?

Here is the code I'm trying to port :

public class LookaheadTokenFilter extends TokenFilter {
    /** List of tokens that were peeked but not returned with next. */
    LinkedList<Token> peekedTokens = new LinkedList<Token>();

    /** The position of the next character that peek() will return in 
peekedTokens */
    int peekPosition = 0;

    public LookaheadTokenFilter(TokenStream input) {
    public Token peek() throws IOException {
        if (this.peekPosition >= this.peekedTokens.size()) {
            Token token = new Token();
            token =;
            if (token != null) {
                this.peekPosition = this.peekedTokens.size();
            return token;

        return this.peekedTokens.get(this.peekPosition++);
    public void reset() { this.peekPosition = 0; }

    public Token next(Token token) throws IOException {

        if (this.peekedTokens.size() > 0) {
            return this.peekedTokens.removeFirst();

Let me know if anyone has an idea,
Daniel Shane

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message