httpd-dev mailing list archives

From "William A. Rowe, Jr." <wr...@lnd.com>
Subject RE: BUFF, IOL, Chunking, and Unicode in 2.0 (long)
Date Sun, 07 May 2000 02:56:25 GMT
> From: jlwpc1 [mailto:jlwpc1@earthlink.net]
> Sent: Saturday, May 06, 2000 4:19 PM
> 
> From: William A. Rowe, Jr. <wrowe@lnd.com>
> 
> > Win32 offers functions to translate into or out of Unicode.
> 
> Yes Win32 does offer functions to do this but NT/W2K/Win64 OS 
> also does this every time it "runs" any Apache Server 
> "call/function/action".  NT/W2K/Win64 are Unicode "full time" systems.

I disagree entirely with your phrase 'Apache Server call...'.  The OS only
translates Win32 API system calls (you are correct, the API is natively
Unicode and must translate constantly).  These are minimal under Apache;
the main example I can think of is the registry (used when loading and
installing the service).  Since the Winsock API is -NOT- Unicode, it is
really irrelevant.  Same for file system read/write.  Only -file-
-name- handling is thunked.
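
To make the point about thunking concrete, here is a rough model in Python (not Win32 code).  The names write_file_a and write_file_w are hypothetical stand-ins for a narrow "A" entry point and its wide "W" counterpart, and cp1252 is only an assumed ANSI codepage.  The sketch shows that the name argument is widened on each call while the data buffer passes through untouched.

```python
def write_file_w(name: str, data: bytes) -> tuple:
    """Hypothetical wide entry point: takes the name as Unicode text."""
    return (name, data)


def write_file_a(name: bytes, data: bytes, ansi_codepage: str = "cp1252") -> tuple:
    """Hypothetical narrow entry point: widen the name, then forward.

    Only the string parameter is converted; the payload bytes are
    handed to the wide call exactly as received.
    """
    wide_name = name.decode(ansi_codepage)  # the per-call conversion cost
    return write_file_w(wide_name, data)


name, payload = write_file_a(b"caf\xe9.html", b"\xe9")
print(name, payload)
```

The cost is proportional to the length of the name, not to the amount of data read or written.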

Kernel32 calls from ApacheCore.dll:

CreateEventA
CreateFileA
CreateProcessA
CreateSemaphoreA
FindFirstFileA
GetFullPathNameA
GetModuleFileNameA
GetVersionExA
LoadLibraryExA
OpenEventA
SetEnvironmentVariableA

AdvApi32 calls from ApacheCore.dll:

CreateServiceA
OpenSCManagerA
OpenServiceA
RegCreateKeyExA
RegisterServiceCtrlHandlerA
RegOpenKeyExA
RegQueryValueExA
RegSetValueExA
StartServiceA
StartServiceCtrlDispatcherA

WS2_32 calls:

WSADuplicateSocketA
WSASocketA

All these calls are transformed into Unicode every time, as you point
out... but only these sorts of calls.  (Other calls through the stdc lib
would also thunk through their indirection, obviously.  These are the
direct calls only.)

But 90 percent of these instructions are run at startup or very, very
infrequently.

There is no 'task' or 'mode' switching between ansi/unicode apps, if
they are 32 bit apps.  There is the transformation you mention on 
Win32 API calls, only.

The nightmare is running ANSI window procedures, where data must be
completely thunked in and out of the Win32 GDI internals.  We aren't
talking about single string params, but entire structures and the
'apparent' shared resources.  Yes, that's all Unicode.  And a great
argument, at least for 2.0, for remaining a console app :-)

> So here you have a Unicode OS maybe spending more time 
> switching in and out of Unicode than serving Apache "actions".

What do I see above?  Directory tree parsing is for sh*t.  What
I'd like to know is whether FAT16/32 always does double transformations
(FindFirstFileA -> Unicode -> file system -> ANSI -> disk page)?

Environment parsing is clearly a problem, but principally for cgi's.

All in all, Apache doesn't stall terribly in ansi.

>  > This gives us 2 trips to convert, say, ISO-8859-1 to ISO-8859-7.
> 
> Trips?    
> 
> Like the above?
 
Point taken.  But an 8bit -> 8bit transformation is simply better.
There is a further problem I noticed while dwelling on it...

The codepages don't correspond to Win32 codepages.  Yes, Unicode is
equivalent, as are some pages, but the multibyte stuff all changes
(including UTF-8 as well).  These aren't impossible transformations,
and Unicode calls may make this easier, but I just don't like it.

> > This is not the best solution for performance or memory.
> 
> So whatever it is that Apache Server wants to do, the 
> programmer can do it faster/better than Win32 or the NT/W2K/Win64 OS?

I have no idea what you were saying here.
 
> > Win32 translates by codepage integers, not by string names.
> 
> And this is good/bad for what Apache Server wants to do?

If you read the example in src/lib/apr/i18n/unix/xlate.c, you
will find that the source and dest charsets are string params.  And
several names may represent the same codepage.
 
> Okay now in simple words for me and/or "fake" Win32 code  - 
> what does Apache Server want to do?

One, make up the list of IANA charsets -> IANA number -> Win32 codepage.
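
A minimal sketch of that chain, in Python for portability.  The table contents are a small illustrative subset I picked, not a complete list: the aliases and MIBenum numbers are taken from the IANA charset registry, the codepage numbers are the Win32 identifiers for the same charsets, and codepage_for is a hypothetical helper name.

```python
# IANA charset name or alias (lowercased) -> IANA MIBenum number.
# Several names deliberately land on the same number.
MIBENUM_BY_NAME = {
    "us-ascii": 3, "ansi_x3.4-1968": 3,
    "iso-8859-1": 4, "iso_8859-1": 4, "latin1": 4, "l1": 4,
    "iso-8859-7": 10, "iso_8859-7": 10, "greek": 10,
    "utf-8": 106,
    "windows-1252": 2252,
}

# IANA MIBenum number -> Win32 codepage integer.
CODEPAGE_BY_MIBENUM = {
    3: 20127,
    4: 28591,
    10: 28597,
    106: 65001,
    2252: 1252,
}


def codepage_for(charset_name: str):
    """Return the Win32 codepage for a charset name, or None if unknown."""
    mib = MIBENUM_BY_NAME.get(charset_name.strip().lower())
    if mib is None:
        return None
    return CODEPAGE_BY_MIBENUM.get(mib)


print(codepage_for("Latin1"), codepage_for("ISO-8859-1"))
```

Going through the IANA number keeps the alias handling in one table and the Win32-specific part in another.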

I picture building the 256-character source charset list, translating it
to Unicode and back out to the dest charset.  Then cache the result, and
it becomes the lookup table.  This only solves the 8bit->8bit table
issues, and doesn't even start to touch multibyte transformations.
But it could be the basis for further effort.
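
The table idea above can be sketched as follows.  This is Python using its built-in codecs as a stand-in for the Win32 conversion calls, so the two Unicode trips happen once per charset pair instead of once per buffer.  The function names and the '?' fallback for unmappable bytes are my assumptions, not anything in APR.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def build_table(src: str, dst: str, fallback: bytes = b"?") -> bytes:
    """Build a 256-byte src -> dst lookup table by way of Unicode."""
    table = bytearray(256)
    for b in range(256):
        try:
            ch = bytes([b]).decode(src)   # 8-bit -> Unicode
            out = ch.encode(dst)          # Unicode -> 8-bit
        except UnicodeError:
            out = fallback                # undefined in src or dst
        # Anything that is not exactly one byte cannot live in the table.
        table[b] = out[0] if len(out) == 1 else fallback[0]
    return bytes(table)


def translate(data: bytes, src: str, dst: str) -> bytes:
    """Translate a buffer with the cached table; no Unicode work per byte."""
    return data.translate(build_table(src, dst))


print(translate(b"caf\xe9", "iso-8859-1", "iso-8859-7"))
```

As noted, this covers single-byte charsets only; a multibyte source or destination does not fit a 256-entry table.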

A second issue: the codepage/language must be installed on the NT
box for this to work.

> Thanks,
> JLW

Never suggested it's pretty.  I found this out four hours into hacking it.
