lucene-dev mailing list archives

From: Erik Hatcher <e...@ehatchersolutions.com>
Subject: TokenFilters eating position increments
Date: Thu, 22 Sep 2005 20:29:31 GMT
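Yonik identified an interesting issue with LUCENE-437
(http://issues.apache.org/jira/browse/LUCENE-437): a filter that
replaces a token with a newly constructed Token drops the original
token's position increment.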

I patched SnowballFilter, but when I looked at the other filters I
found the same issue in several of them (StandardFilter,
GermanStemFilter, GreekLowerCaseFilter, and others that create a new
Token).
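To make the failure mode concrete, here is a minimal, self-contained sketch. SimpleToken is a hypothetical stand-in modeled loosely on the 2005-era Lucene Token (text, start/end offsets, type, and a position increment that defaults to 1); it is not the real class, and the method names are illustrative only. The "buggy" method rebuilds the token and silently resets the increment to 1; the "fixed" method copies it back with setPositionIncrement, roughly what a per-filter patch looks like.

```java
// Hypothetical stand-in for Lucene's Token; NOT the real class.
class SimpleToken {
    final String text;
    final int start, end;
    final String type;
    private int positionIncrement = 1; // freshly built tokens default to 1

    SimpleToken(String text, int start, int end, String typ) {
        this.text = text;
        this.start = start;
        this.end = end;
        this.type = typ;
    }

    int getPositionIncrement() { return positionIncrement; }

    void setPositionIncrement(int inc) {
        if (inc < 0) throw new IllegalArgumentException("increment must be >= 0: " + inc);
        positionIncrement = inc;
    }
}

class FilterSketch {
    // Buggy pattern: text, offsets and type are carried over, but the
    // position increment is not, so it silently falls back to 1.
    static SimpleToken lowerCaseBuggy(SimpleToken t) {
        return new SimpleToken(t.text.toLowerCase(), t.start, t.end, t.type);
    }

    // Patched pattern: build the new token, then copy the increment explicitly.
    static SimpleToken lowerCaseFixed(SimpleToken t) {
        SimpleToken out = new SimpleToken(t.text.toLowerCase(), t.start, t.end, t.type);
        out.setPositionIncrement(t.getPositionIncrement());
        return out;
    }
}
```

For example, a token that follows a removed stop word might carry an increment of 2; lowerCaseBuggy hands back 1, while lowerCaseFixed preserves the 2.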

To help avoid this in the future, maybe we should add another
constructor to Token:

     public Token(String text, int start, int end, String typ, int positionIncrement)
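The first proposal could look something like the sketch below. ProposedToken is a hypothetical stand-in, not Lucene's Token; the four-argument constructor mimics the existing style (increment defaults to 1), and the five-argument one is the suggested addition. Rejecting a negative increment is an assumption of this sketch.

```java
// Hypothetical stand-in sketching the proposed five-argument constructor;
// NOT Lucene's actual Token class.
class ProposedToken {
    final String termText;
    final int startOffset, endOffset;
    final String type;
    final int positionIncrement;

    // Existing-style constructor: the position increment defaults to 1.
    ProposedToken(String text, int start, int end, String typ) {
        this(text, start, end, typ, 1);
    }

    // Proposed constructor: the caller supplies the increment explicitly,
    // so a filter can forward the input token's value in a single call.
    ProposedToken(String text, int start, int end, String typ, int positionIncrement) {
        if (positionIncrement < 0) {
            throw new IllegalArgumentException("positionIncrement must be >= 0");
        }
        this.termText = text;
        this.startOffset = start;
        this.endOffset = end;
        this.type = typ;
        this.positionIncrement = positionIncrement;
    }
}
```

A filter would then write something like `new ProposedToken(newText, t.startOffset, t.endOffset, t.type, t.positionIncrement)`, though it still has to remember to pass every field.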

Or maybe one that clones an existing token:

     public Token(Token template, String newText)

where all of the token's metadata (start offset, end offset, type, and
position increment) is copied from the template and newText is used as
the token text instead. Filters don't generally change offsets, type,
or position increments anyway; most of them only change the text, for
stemming or lowercasing.
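The template-style alternative can be sketched as follows, again against a hypothetical stand-in (TemplateToken) rather than the real Token class. It keeps an existing-style four-argument constructor plus a mutable increment, and adds the proposed (template, newText) constructor; the lowerCase helper is illustrative only. The appeal is that a filter never restates the metadata, so it cannot forget a field.

```java
// Hypothetical stand-in for Lucene's Token with the proposed template
// constructor; NOT the real class.
class TemplateToken {
    final String termText;
    final int startOffset, endOffset;
    final String type;
    private int positionIncrement = 1; // default for a freshly built token

    TemplateToken(String text, int start, int end, String typ) {
        this.termText = text;
        this.startOffset = start;
        this.endOffset = end;
        this.type = typ;
    }

    // Proposed: copy start, end, type and position increment from the
    // template; only the text is replaced.
    TemplateToken(TemplateToken template, String newText) {
        this(newText, template.startOffset, template.endOffset, template.type);
        this.positionIncrement = template.positionIncrement;
    }

    int getPositionIncrement() { return positionIncrement; }
    void setPositionIncrement(int inc) { positionIncrement = inc; }
}

class TemplateFilterSketch {
    // Illustrative lowercasing step: all metadata comes from the input token.
    static TemplateToken lowerCase(TemplateToken t) {
        return new TemplateToken(t, t.termText.toLowerCase());
    }
}
```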

Thoughts?

     Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

