lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject positional token info
Date Tue, 21 Oct 2003 01:15:56 GMT
Is anyone doing anything interesting with the 
Token.setPositionIncrement during analysis?

Just for fun, I've written a simple stop filter that bumps the position 
increments to account for the stop words removed:

   public final Token next() throws IOException {
     int increment = 0;
     for (Token token = input.next(); token != null; token = 
input.next()) {

       if (table.get(token.termText()) == null) {
         token.setPositionIncrement(token.getPositionIncrement() + 
increment);
         return token;
       }

       increment++;
     }

     return null;
   }


But its practically impossible to formulate a Query that can take 
advantage of this.  A PhraseQuery, because Terms don't have positional 
info (only the transient tokens), only works using a slop factor which 
doesn't guarantee an exact match like I'm after.  A PhrasePrefixQuery 
won't work any better as there is no way to add in a "blank" term to 
indicate a missing position.

I certainly see the benefit of putting tokens into zero-increment 
positions, but are increments of 2 or more at all useful?  If so, how 
are folks using it?

Thanks,
	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message