lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: (Offtopic) The unicode name for a character
Date Thu, 23 Dec 2004 09:39:02 GMT
: However, I don't think that the names are consistent enough to permit a
: generic use of regular expressions. What Daniel is trying to achieve
: looks interesting anyway,

I'm not sure that really matters in the long run ... I think the OP
was asking if there was a way to get the name in Java because he figured
that way he could programmatically determine what the "base" character was in
his application.  But that doesn't mean he needs to do this
programmatically every time his indexing/searching code sees a character
outside of Latin-1.

It would probably make more sense to write a little one-off program that
could read in this file, and then spit out all of the non-Latin-1
characters with a guess as to which Latin-1 character could act as a
substitution (if any) based on the name of the character, and a blank for
the user to override.  This program could be run once to generate a nice
small, efficient mapping table that could be (committed to CVS and) reused
over and over.
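
A rough sketch of what such a one-off program might look like, under some
assumptions that are not from the thread: the input file is taken to be a
semicolon-separated listing in the style of the Unicode Character Database's
UnicodeData.txt (hex code point first, character name second), and the
"guess" only handles names of the form "LATIN SMALL/CAPITAL LETTER X WITH
...". The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical one-off generator for a "non-Latin-1 to Latin-1" mapping table. */
class Latin1MappingSketch {

    // Heuristic (an assumption): only names like
    // "LATIN SMALL LETTER A WITH MACRON" yield a guess.
    private static final Pattern NAME =
            Pattern.compile("LATIN (SMALL|CAPITAL) LETTER ([A-Z]) WITH .+");

    /** Returns the guessed base character, or "" if the name gives no hint. */
    static String guessBase(String unicodeName) {
        Matcher m = NAME.matcher(unicodeName);
        if (!m.matches()) {
            return "";
        }
        String letter = m.group(2);
        return m.group(1).equals("SMALL") ? letter.toLowerCase(Locale.ROOT) : letter;
    }

    /**
     * Turns "code;name;..." lines into tab-separated rows:
     * code, name, guess, and a trailing empty column for a manual override.
     */
    static List<String> buildTable(List<String> lines) {
        List<String> rows = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(";", -1);
            if (fields.length < 2 || fields[0].isEmpty()) {
                continue; // skip blank or malformed lines
            }
            int codePoint = Integer.parseInt(fields[0], 16);
            if (codePoint <= 0xFF) {
                continue; // already inside Latin-1
            }
            rows.add(fields[0] + "\t" + fields[1] + "\t" + guessBase(fields[1]) + "\t");
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: Latin1MappingSketch <names-file>");
            return;
        }
        for (String row : buildTable(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(row);
        }
    }
}
```

The output would be reviewed by hand, with the last column filled in wherever
the guess is wrong or missing, and the result checked in as the mapping table
that the indexing/searching code loads at startup.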

-Hoss



