jakarta-regexp-dev mailing list archives

From Holger Stratmann <Hol...@cheerful.com>
Subject Re: detecting UTF-8 characters
Date Wed, 16 Jan 2002 20:49:08 GMT
Hi Sanjay,

even though I'm not 100% sure what you're trying to do and why you're using
regular expressions for it, I think I can roughly guess and give you some hints:

\xNN refers to the character with hex code NN (e.g. \x20 = ASCII 32 = space).
Therefore, you could use [\x20-\x7F] for ASCII 32-127, or [\x20-\xFF] for the
8-bit range 32-255 (ISO-8859-1, often loosely called "ANSI").
Likewise, [^\x20-\xFF] matches any character above 255 (or below 32).
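
A minimal sketch of the negated range in use. Note the assumption: it uses
java.util.regex from the JDK rather than Jakarta Regexp, only so that it runs
without extra jars; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class RangeCheck {
    // Matches any single character outside the range \x20-\xFF,
    // i.e. anything above 255 or below 32.
    static final Pattern OUTSIDE_RANGE = Pattern.compile("[^\\x20-\\xFF]");

    // Hypothetical helper: true if s contains at least one such character.
    static boolean hasCharOutsideRange(String s) {
        return OUTSIDE_RANGE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasCharOutsideRange("hello world"));    // false
        System.out.println(hasCharOutsideRange("price: \u20AC5")); // true (euro sign is above 255)
        System.out.println(hasCharOutsideRange("tab\there"));      // true (tab is below 32)
    }
}
```

Keep in mind that the "below 32" part also flags tabs and line breaks, which
may or may not be what you want for user input.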

I hope that will help you solve your problem - some additional hints about your
question below:

> I am trying to write a validation function that would allow me to detect any
> UTF-8 characters

Just a clarification:
UTF-8 is only an encoding (a "transfer" format): a way of not wasting too much
space when transferring or saving Unicode characters. There is no such thing as
a "UTF-8 character".
If there were, the set would be a superset of ASCII, and every ASCII character
would also be a UTF-8 character.
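
To make the encoding point concrete, here is a small sketch that prints how
many bytes a character needs once it is encoded as UTF-8. The class and method
names are only illustrative, and StandardCharsets assumes a reasonably recent
JDK (on older ones, getBytes("UTF-8") does the same job).

```java
import java.nio.charset.StandardCharsets;

class Utf8Lengths {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));      // 1: ASCII stays a single byte
        System.out.println(utf8Length("\u00E9")); // 2: e with acute accent
        System.out.println(utf8Length("\u20AC")); // 3: euro sign
    }
}
```

Inside a Java String all three are simply characters; the difference only
shows up when the text is written out as bytes.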

> Just to give the context -- its a user driven program and we would like to
> detect when the user has entered any UTF-8 character Vs. only ASCII
> characters.

By the way: if that's all you're trying to do, it will probably be MUCH more
efficient to just check each character yourself, like:

    for (int i = 0; i < s.length(); i++) { if (s.charAt(i) > 255) return i; }
    return -1;

(Use 127 instead of 255 if you want to allow strictly ASCII only.)
A regular expression has to do at least the same work; in practice it does
much more and adds a lot of overhead.
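
The per-character check can be sketched as a small method. The names and the
"limit" parameter are my own additions for illustration: pass 127 for strict
ASCII or 255 for the 8-bit range.

```java
class AsciiScan {
    // Returns the index of the first char whose code is greater than limit,
    // or -1 if every char is within the limit.
    static int firstCharAbove(String s, int limit) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > limit) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstCharAbove("plain text", 127)); // -1
        System.out.println(firstCharAbove("caf\u00E9", 127));  // 3
        System.out.println(firstCharAbove("caf\u00E9", 255));  // -1
        System.out.println(firstCharAbove("5 \u20AC", 255));   // 2
    }
}
```

Returning the index rather than a boolean lets the caller point the user at
the offending character.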

> Is there a way by way of expressions that I can detect if the user is
> entering UTF-8 characters ... \u or \x something of that sorts...

\uNNNN matches the Unicode character with the four-digit hex code NNNN.
By the way: the euro sign (€) is an excellent character for testing :-)
It is easy to enter (if you live in Europe *g*), yet it lies well outside the
8-bit range: in Java it is \u20AC.
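
A short sketch of the \u escape inside a pattern, again using java.util.regex
as a stand-in (an assumption on my part; check the Jakarta Regexp syntax notes
before relying on the same escapes there). The class and field names are
illustrative only.

```java
import java.util.regex.Pattern;

class EuroMatch {
    // The doubled backslash hands the escape to the regex engine
    // instead of letting the Java compiler translate it first.
    static final Pattern EURO = Pattern.compile("\\u20AC");

    // Any character above the 8-bit range, written as a \u range.
    static final Pattern ABOVE_8BIT = Pattern.compile("[\\u0100-\\uFFFF]");

    public static void main(String[] args) {
        System.out.println(EURO.matcher("5 \u20AC").find());      // true
        System.out.println(EURO.matcher("5 EUR").find());         // false
        System.out.println(ABOVE_8BIT.matcher("\u20AC").find());  // true
        System.out.println(ABOVE_8BIT.matcher("\u00E9").find());  // false
    }
}
```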

> any help would be greatly appreciated..

HTH,
     Holger



