lucene-dev mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: TokenFilters eating position increments
Date Thu, 22 Sep 2005 20:56:30 GMT
Actually, to reply to myself, the filters that are simply changing
the term text shouldn't be creating a new Token anyway, but rather
just setting token.termText = ... on the original token.  I'll see
about modifying our core and contrib filters to do this.

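To make the issue concrete, here is a minimal sketch of the three options discussed in this thread. SketchToken and LowerCaseSketch are hypothetical stand-ins written for illustration, not Lucene's actual Token or filter classes, and the default position increment of 1 is an assumption of the sketch.

```java
import java.util.Locale;

// Hypothetical stand-in for a token; NOT Lucene's real Token class.
final class SketchToken {
    String termText;
    final int startOffset;
    final int endOffset;
    final String type;
    int positionIncrement = 1; // assumed default for this sketch

    SketchToken(String text, int start, int end, String typ) {
        this.termText = text;
        this.startOffset = start;
        this.endOffset = end;
        this.type = typ;
    }

    // Copy-style constructor along the lines of the quoted proposal:
    // all metadata comes from the template, only the text is replaced.
    SketchToken(SketchToken template, String newText) {
        this(newText, template.startOffset, template.endOffset, template.type);
        this.positionIncrement = template.positionIncrement;
    }
}

// Hypothetical lowercasing step, written three ways.
final class LowerCaseSketch {
    // Rebuilds the token from scratch; the position increment silently
    // falls back to the default, which is the problem described here.
    static SketchToken lossy(SketchToken in) {
        return new SketchToken(in.termText.toLowerCase(Locale.ROOT),
                in.startOffset, in.endOffset, in.type);
    }

    // Mutates the text on the original token; all metadata is untouched.
    static SketchToken inPlace(SketchToken in) {
        in.termText = in.termText.toLowerCase(Locale.ROOT);
        return in;
    }

    // Uses the copy-style constructor; metadata is carried over.
    static SketchToken copying(SketchToken in) {
        return new SketchToken(in, in.termText.toLowerCase(Locale.ROOT));
    }
}
```

With a token whose positionIncrement was set to 2 (say, after a removed stop word), lossy() returns a token with increment 1, while inPlace() and copying() both keep 2.
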
     Erik

On Sep 22, 2005, at 4:29 PM, Erik Hatcher wrote:

> Yonik identified an interesting issue with LUCENE-437:
> http://issues.apache.org/jira/browse/LUCENE-437
>
> I patched the SnowballFilter, but then looked at other filters and  
> we have the same issue with some of them (like StandardFilter,  
> GermanStemFilter, GreekLowerCaseFilter, and others that create a  
> new Token).
>
> To perhaps alleviate this situation in the future, maybe we should  
> add another constructor to Token:
>
>     public Token(String text, int start, int end, String typ, int positionIncrement)
>
> Or maybe one that clones an existing token:
>
>     public Token(Token template, String newText)
>
> where all the metadata for the token (start, end, type, and  
> position increment) is copied and the newText is used for the Token  
> text instead.  Filters don't generally change offsets, type, or  
> position increments anyway - the majority change the text for  
> stemming or lowercasing purposes.
>
> Thoughts?
>
>     Erik



