lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: whats the correct way to do normalisation?
Date Mon, 06 Nov 2006 20:48:43 GMT

On Nov 6, 2006, at 11:27 AM, hans meiser wrote:

> Hi,
>
>> Did you take a look at IsoLatin1AccentFilter ?
>
>   It does nearly what I need, but not quite.
>
>   public final Token next() throws java.io.IOException {
>     final Token t = input.next();
>     if (t == null)
>       return null;
>     return new Token(removeAccents(t.termText()), t.startOffset(),
>         t.endOffset(), t.type());
>   }
>
>   Here, too, a new Token is created. My question is: why is the end
>   offset not corrected for the newly created token? Sometimes the new
>   token is longer than the original.
>   Complete code link:
>   http://developer.spikesource.com/spikewatch.logs/fedora-3-i386/2221/lucene/reports/clover/org/apache/lucene/analysis/ISOLatin1AccentFilter.html

For highlighting purposes, it's best to keep the offsets in the  
original text, not adjusted for token mutation.
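
To illustrate the point, here is a minimal sketch in plain Java, with no Lucene dependency. Everything in it is hypothetical and for illustration only: the Tok class stands in for a token, fold() is a simplified stand-in for accent removal (it assumes "ß" expands to "ss"), and analyze() and highlight() are toy helpers, not Lucene APIs. It shows a folded term that is longer than the source span while its offsets still point at the original characters, which is what a highlighter needs.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class OffsetDemo {
    /** Hypothetical stand-in for a token: term text plus offsets into the ORIGINAL text. */
    static final class Tok {
        final String term;
        final int start;
        final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Simplified accent folding (assumption): strip combining marks, expand sharp s to "ss". */
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").replace("\u00DF", "ss");
    }

    /** Whitespace tokenizer; the term is folded, the offsets are left untouched. */
    static List<Tok> analyze(String text) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) {
                out.add(new Tok(fold(text.substring(start, i)), start, i));
            }
        }
        return out;
    }

    /** Wraps every token whose folded term equals queryTerm, slicing the original text by offset. */
    static String highlight(String text, String queryTerm) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (Tok t : analyze(text)) {
            if (t.term.equals(queryTerm)) {
                sb.append(text, last, t.start)
                  .append("<b>").append(text, t.start, t.end).append("</b>");
                last = t.end;
            }
        }
        return sb.append(text.substring(last)).toString();
    }

    public static void main(String[] args) {
        // The folded term "Strasse" has 7 characters; the original span has 6.
        System.out.println(highlight("die Stra\u00DFe ist lang", "Strasse"));
    }
}
```

Had the end offset been moved to start + term.length(), the highlighted slice would run one character past the word in the original text.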

	Erik



