lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Goetzke" <uwe.goet...@healy-hudson.com>
Subject AW: Transforming german umlaute like ö,ä,ü,ß into oe, ae, ue, ss
Date Tue, 18 Nov 2008 12:25:52 GMT
Use ISOLatin1AccentFilter, although it is not perfect...
So I made ISOLatin2AccentFilter for me and changed this method.
We use our own analysers, so you would use something like this

		result = new org.apache.lucene.analysis.WhitespaceTokenizer(reader);
		result = new ISOLatin2AccentFilter(result);
		result = new org.apache.lucene.analysis.LowerCaseFilter(result);


* To replace accented characters in a String by unaccented equivalents.
	 */
	public final static String removeAccents(String input) {
		final StringBuffer output = new StringBuffer();
		for (int i = 0; i < input.length(); i++) {
			switch (input.charAt(i)) {
				case '\u00C0' : // À
				case '\u00C1' : // Á
				case '\u00C2' : // Â
				case '\u00C3' : // Ã
				case '\u00C5' : // Å
					output.append("A");
					break;
				case '\u00C4' : // Ä
				case '\u00C6' : // Æ
					output.append("AE");
					break;
				case '\u00C7' : // Ç
					output.append("C");
					break;
				case '\u00C8' : // È
				case '\u00C9' : // É
				case '\u00CA' : // Ê
				case '\u00CB' : // Ë
					output.append("E");
					break;
				case '\u00CC' : // Ì
				case '\u00CD' : // Í
				case '\u00CE' : // Î
				case '\u00CF' : // Ï
					output.append("I");
					break;
				case '\u00D0' : // Ð
					output.append("D");
					break;
				case '\u00D1' : // Ñ
					output.append("N");
					break;
				case '\u00D2' : // Ò
				case '\u00D3' : // Ó
				case '\u00D4' : // Ô
				case '\u00D5' : // Õ
				case '\u00D8' : // Ø
					output.append("O");
					break;
				case '\u00D6' : // Ö
				case '\u0152' : // Œ
					output.append("OE");
					break;
				case '\u00DE' : // Þ
					output.append("TH");
					break;
				case '\u00D9' : // Ù
				case '\u00DA' : // Ú
				case '\u00DB' : // Û
					output.append("U");
					break;
				case '\u00DC' : // Ü
					output.append("UE");
					break;
				case '\u00DD' : // Ý
				case '\u0178' : // Ÿ
					output.append("Y");
					break;
				case '\u00E0' : // à
				case '\u00E1' : // á
				case '\u00E2' : // â
				case '\u00E3' : // ã
				case '\u00E5' : // å
					output.append("a");
					break;
				case '\u00E4' : // ä
				case '\u00E6' : // æ
					output.append("ae");
					break;
				case '\u00E7' : // ç
					output.append("c");
					break;
				case '\u00E8' : // è
				case '\u00E9' : // é
				case '\u00EA' : // ê
				case '\u00EB' : // ë
					output.append("e");
					break;
				case '\u00EC' : // ì
				case '\u00ED' : // í
				case '\u00EE' : // î
				case '\u00EF' : // ï
					output.append("i");
					break;
				case '\u00F0' : // ð
					output.append("d");
					break;
				case '\u00F1' : // ñ
					output.append("n");
					break;
				case '\u00F2' : // ò
				case '\u00F3' : // ó
				case '\u00F4' : // ô
				case '\u00F5' : // õ
				case '\u00F8' : // ø
					output.append("o");
					break;
				case '\u00F6' : // ö
				case '\u0153' : // œ
					output.append("oe");
					break;
				case '\u00DF' : // ß
					output.append("ss");
					break;
				case '\u00FE' : // þ
					output.append("th");
					break;
				case '\u00F9' : // ù
				case '\u00FA' : // ú
				case '\u00FB' : // û
					output.append("u");
					break;
				case '\u00FC' : // ü
					output.append("ue");
					break;
				case '\u00FD' : // ý
				case '\u00FF' : // ÿ
					output.append("y");
					break;
				default :
					output.append(input.charAt(i));
					break;
			}
		}
		return output.toString();
	}
}

Regards

Uwe Goetzke
Leiter Produktentwicklung 
Healy Hudson GmbH 
Procurement & Retail Solutions   


-----Ursprüngliche Nachricht-----
Von: Sascha Fahl [mailto:sascha@evenity.net] 
Gesendet: Dienstag, 18. November 2008 13:07
An: java-user@lucene.apache.org
Betreff: Transforming german umlaute like ö,ä,ü,ß into oe, ae, ue, ss

Hi,
what is the best to transform the german umlaute ö,ä,ü,ß into oe, ae,  
ue, ss during the process of analyzing?

Thanks,


Sascha Fahl
Softwareentwicklung

evenity GmbH
Zu den Mühlen 19
D-35390 Gießen

Mail: sascha@evenity.net









---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


-----------------------------------------------------------------------
Healy Hudson GmbH - D-55252 Mainz Kastel
Geschäftsführer Christian Konhäuser - Amtsgericht Wiesbaden HRB 12076

Diese Email ist vertraulich. Wenn Sie nicht der beabsichtigte Empfänger sind, dürfen Sie
die Informationen nicht offen legen oder benutzen. Wenn Sie diese Email durch einen Fehler
bekommen haben, teilen Sie uns dies bitte umgehend mit, indem Sie diese Email an den Absender
zurückschicken. Bitte löschen Sie danach diese Email.
This email is confidential. If you are not the intended recipient, you must not disclose or
use this information contained in it. If you have received this email in error please tell
us immediately by return email and delete the document.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message