lucene-java-user mailing list archives

From hans meiser <fischauto...@yahoo.de>
Subject Re: whats the correct way to do normalisation?
Date Mon, 06 Nov 2006 16:27:41 GMT
Hi,
   
  > Did you take a look at IsoLatin1AccentFilter ?
   
  It does nearly what I need, but not perfectly.
   
  public final Token next() throws java.io.IOException {
    final Token t = input.next();
    if (t == null)
      return null;
    return new Token(removeAccents(t.termText()), t.startOffset(), t.endOffset(), t.type());
  }
   
  Here, too, a new Token is created. My question is: why is the end offset not
  corrected for the newly created token? Sometimes the new token is longer than the original.
  Link to the complete code:
  http://developer.spikesource.com/spikewatch.logs/fedora-3-i386/2221/lucene/reports/clover/org/apache/lucene/analysis/ISOLatin1AccentFilter.html
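  To illustrate the case I mean, here is a minimal, self-contained sketch. It is not the Lucene code itself: the class `AccentDemo` and its cut-down `removeAccents` are hypothetical stand-ins that handle only a few characters. It shows a term growing by one character while the offsets copied from the original token stay the same, so they still describe the span in the original text rather than the length of the new term.

```java
public class AccentDemo {
    // Hypothetical, simplified stand-in for the filter's removeAccents():
    // only a handful of characters are mapped, purely for illustration.
    static String removeAccents(String input) {
        StringBuilder out = new StringBuilder(input.length() + 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u00C6': out.append("AE"); break; // AE ligature, upper case
                case '\u00E6': out.append("ae"); break; // ae ligature, lower case
                case '\u00DF': out.append("ss"); break; // sharp s
                case '\u00E9': out.append('e');  break; // e with acute accent
                default:       out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String term = "Stra\u00DFe";                  // "Strasse" written with a sharp s, 6 chars
        int startOffset = 0;                          // offsets as the tokenizer would report them
        int endOffset = startOffset + term.length();  // 6

        String folded = removeAccents(term);          // "Strasse", 7 chars

        // The term text changed length, the offsets did not.
        System.out.println(folded + " length=" + folded.length()
                + " offsets=" + startOffset + "-" + endOffset);
        // prints: Strasse length=7 offsets=0-6
    }
}
```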
  


 

 		