lucene-java-user mailing list archives

From Morus Walter <morus.wal...@tanto.de>
Subject Re: (Offtopic) The unicode name for a character
Date Wed, 22 Dec 2004 11:16:11 GMT
Hi Peter,
> 
> The Question:
> In Java generally, Is there an easy way to get the unicode name of a 
> character?  (e.g. "LATIN SMALL LETTER A" from 'a')
> 
...
> 
> I'm considering taking the unicode name for each character I encounter 
> and regexping it against something like:
> ^LATIN .* LETTER (.) WITH .*$
> ... to try and extract the single A-Z|a-z character.
> 
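As a rough sketch of the regexp idea quoted above, something like the following might do in Java. The class and method names are made up for illustration. I have also assumed that you want to keep the case of the letter: the names in the Unicode list are all upper case, so the captured letter is always A-Z, and the sketch swaps the first `.*` of your pattern for a (SMALL|CAPITAL) group to recover the case.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or the JDK.
class BaseLetterSketch {
    // Same shape as the quoted pattern, but the first ".*" is replaced by a
    // group that records whether the name says SMALL or CAPITAL.
    private static final Pattern NAME =
        Pattern.compile("^LATIN (SMALL|CAPITAL) LETTER (.) WITH .*$");

    /** Returns the single base letter, or null if the name does not match. */
    static String baseLetter(String unicodeName) {
        Matcher m = NAME.matcher(unicodeName);
        if (!m.matches()) {
            return null;
        }
        String letter = m.group(2); // always upper case in Unicode names
        return "SMALL".equals(m.group(1))
            ? letter.toLowerCase(Locale.ROOT)
            : letter;
    }

    public static void main(String[] args) {
        System.out.println(baseLetter("LATIN SMALL LETTER P WITH HOOK"));   // prints p
        System.out.println(baseLetter("LATIN CAPITAL LETTER P WITH HOOK")); // prints P
        System.out.println(baseLetter("LATIN SMALL LETTER A"));             // prints null
    }
}
```

Names without " WITH ", and names whose letter part is longer than one character, do not match and come back as null, so the caller can leave those characters alone.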
There used to be a plain ASCII list on an FTP server at unicode.org.
I have a copy of it, 'UnicodeData.txt', here.
It lists about 12,000 characters, one per line, as semicolon-separated
fields (hex code point first, then the name) in the form
01A4;LATIN CAPITAL LETTER P WITH HOOK;Lu;0;L;;;;;N;LATIN CAPITAL LETTER P HOOK;;;01A5;
01A5;LATIN SMALL LETTER P WITH HOOK;Ll;0;L;;;;;N;LATIN SMALL LETTER P HOOK;;01A4;;01A4

If you cannot find that list anywhere, I can mail you a copy.
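
Reading lines in the format shown above into a lookup table could look roughly like this. It is only a sketch: the class and method names are invented, and it assumes nothing more than what the two sample lines show, namely that the first field is the code point in hex and the second is the name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene or the JDK.
class UnicodeNameTable {
    /** Maps each code point to its name, given lines in UnicodeData.txt form. */
    static Map<Integer, String> parse(Iterable<String> lines) {
        Map<Integer, String> names = new HashMap<>();
        for (String line : lines) {
            // A limit of -1 keeps trailing empty fields instead of dropping them.
            String[] fields = line.split(";", -1);
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            int codePoint = Integer.parseInt(fields[0], 16);
            names.put(codePoint, fields[1]);
        }
        return names;
    }

    public static void main(String[] args) {
        Map<Integer, String> names = parse(Arrays.asList(
            "01A4;LATIN CAPITAL LETTER P WITH HOOK;Lu;0;L;;;;;N;LATIN CAPITAL LETTER P HOOK;;;01A5;",
            "01A5;LATIN SMALL LETTER P WITH HOOK;Ll;0;L;;;;;N;LATIN SMALL LETTER P HOOK;;01A4;;01A4"));
        System.out.println(names.get(0x01A5)); // prints LATIN SMALL LETTER P WITH HOOK
    }
}
```

For the real file you would feed it the lines from a BufferedReader instead of the two samples.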

It would be a nice contribution if you could add your filter to Lucene's
sandbox once it's finished.

Morus
