httpd-dev mailing list archives

From Dean Gaudet <dgau...@arctic.org>
Subject "locale" project
Date Fri, 19 Dec 1997 08:42:02 GMT
Here's something that needs to be done if someone is interested.  Apache
abuses the heck out of the locale specific functions.  Its usage is not
compliant with POSIX/ANSI/whateveryouwant.  It doesn't behave well when a
module wants to use a different locale, or when the admin wants to set a
different locale.  Specific examples: 

- (PR#1305, PR#1450) We use isalpha(), isalnum(), etc. functions with type
"char" arguments.  Strictly speaking we can only do that if we test
isascii() first.  A preferable alternative, because it costs less at run
time, is to use unsigned chars (almost) everywhere.

The ANSI C standard allows an implementation to choose whether an
unqualified char is signed or unsigned.  Most compilers default to signed,
but have an option for unsigned (or vice versa).  "gcc -funsigned-char" 
for example makes "char" an unsigned char.  My suggestion is that we
"typedef unsigned char uchar;" and "typedef signed char schar;" and use
those everywhere in place of char.  It's quite possible that we'll never
need schar (the only cases we'd need schar are those where we're using
chars as a 1-byte signed integer, which in apache should be non-existent).

I suspect there will be compiler warning issues because C library
functions use naked "char".  The workaround will likely be to use "gcc
-funsigned-char -Wall", or whatever the equiv is on whatever other ANSI
compiler we use.  Note that this is just a warning issue -- not
necessarily a correctness issue.  (Although the library could be whacked.) 
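
A minimal sketch of the unsigned-char idea, for illustration only.  The
"uchar" typedef is the one suggested above; count_alpha() is a
hypothetical helper, not existing Apache code.  Reading each byte through
a uchar pointer keeps the argument to isalpha() inside the range the
ctype functions are defined for, even on compilers where plain char is
signed:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef unsigned char uchar;

/* Hypothetical helper: count the alphabetic bytes in a NUL-terminated
 * string, as classified by isalpha() in the current locale. */
static size_t count_alpha(const char *s)
{
    const uchar *p;
    size_t n = 0;

    assert(s != NULL);

    /* Walk the string as unsigned bytes so that values >= 0x80 are
     * never passed to isalpha() as negative ints. */
    for (p = (const uchar *)s; *p != '\0'; ++p) {
        if (isalpha(*p))
            ++n;
    }
    return n;
}
```

The cast happens once, at the pointer, so the loop body needs no
isascii() test per character.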

- (PR#76, #679) We use locale-specific functions (isalpha(), isalnum(),
strftime(), and on and on) without setting the locale.  There is a
conflict here between Apache wanting to use some functions in a specific
way (i.e. assuming the "C" locale) and some modules wanting to let the
user do things in their locale of choice (e.g. mod_php).

A big worry here is that setlocale() is an expensive function.  On
Solaris, for example, it involves reading a file off disk.  So we can't
just switch locales at will.  It's unfortunate, but it's not possible for
a POSIX program to exist in two locales at once.

- (PR#754) struct tm does not include a time zone on all systems, but we
assume it does.  It's entirely possible that we'll need our own strftime() 
replacement.
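
One way a replacement could get the zone offset without relying on extra
struct tm fields is to compare the gmtime() and localtime() breakdowns of
the same time_t.  This is only a sketch under that assumption;
utc_offset_seconds() is a hypothetical name, and it ignores leap seconds:

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: seconds east of UTC for the local zone at time t,
 * using only ANSI C calls (no time zone member in struct tm needed). */
static long utc_offset_seconds(time_t t)
{
    struct tm utc_tm, local_tm;
    struct tm *p;
    int days;

    p = gmtime(&t);
    assert(p != NULL);
    utc_tm = *p;            /* copy: localtime() may reuse the buffer */

    p = localtime(&t);
    assert(p != NULL);
    local_tm = *p;

    /* The two calendar dates differ by at most one day.  A large
     * tm_yday difference means the dates straddle a year boundary. */
    days = local_tm.tm_yday - utc_tm.tm_yday;
    if (days > 1)
        days = -1;          /* local is Dec 31, UTC is already Jan 1 */
    else if (days < -1)
        days = 1;           /* local is Jan 1, UTC is still Dec 31 */

    return days * 86400L
         + (local_tm.tm_hour - utc_tm.tm_hour) * 3600L
         + (local_tm.tm_min - utc_tm.tm_min) * 60L
         + (local_tm.tm_sec - utc_tm.tm_sec);
}
```

The result can then be formatted by hand (sign, hours, minutes) rather
than trusting a zone conversion in the system strftime().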

- I'm certain we have assumptions that isalpha(c) is the same as c == 'a'
|| c == 'b' || ... || c == 'z' (and the upper case letters).  In truth,
isalpha(c) can also be true for various 8-bit characters in most
non-"C" locales.  For example, in an ISO-8859-1 locale, isalpha(192) is
TRUE.
This has to be fixed. 
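
Where the code really means "one of the 52 ASCII letters", a
locale-independent test would say so directly.  A sketch, with
hypothetical names (is_ascii_alpha, ascii_alpha_prefix_len) and assuming
an ASCII-compatible execution character set, where the letters are
contiguous:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: true only for 'a'..'z' and 'A'..'Z', no matter
 * which locale is in effect.  Assumes an ASCII-compatible charset. */
static int is_ascii_alpha(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Hypothetical helper: length of the leading run of ASCII letters,
 * e.g. for picking a token off the front of a header value. */
static size_t ascii_alpha_prefix_len(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (is_ascii_alpha((unsigned char)s[n]))
        ++n;
    return n;
}
```

Because it never consults the locale, is_ascii_alpha(192) is false
everywhere, and a module calling setlocale() cannot change the answer.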

I'd say that if someone does take this on, they may have to maintain
this as a patch through 1.x versions into 2.x versions.  The
changes are likely too large to make 1.3.  I would like them to be part of
1.x, but until we see how far reaching they are it's hard to say if we can
fit it into the "schedule".  It's unlikely they can make 1.3.0. 

A good approach would be to research the problem first... and report back
how you plan to deal with it all.  We'll all argue it (we always do, it's
better to argue before you write code and possibly waste your time).  Then
write the code. 

Dean


