httpd-dev mailing list archives

From Konstantin Chuguev <...@urc.ac.ru>
Subject Re: Multilingual Apache [Was: Re: mod_mime/3238: New directive suggestion: AddCharset (fwd)]
Date Sun, 25 Oct 1998 14:50:01 GMT
Dirk-Willem van Gulik wrote:
> 
Hello Dirk-Willem. I recalled that I had already come across your name
somewhere in WWW i18n resources. Searching through various sources, I found
http://www.ceo.org/winter/sevilla/Overview.html

Of course, I had read it earlier; I just had not remembered your name :-)
It is a very useful paper. I would like to refer to it in the MultiWeb
documentation.

I hope this paper has helped me better understand your point of view
on Web multilingualism.

Excuse me if I misunderstood some of your remarks below, but it seems
you did not read the parts of the MultiWeb documentation included in the
Apache docs (shipped with the distribution; online at
http://www.rnoc.urc.ac.ru/apache/manual/) or, more correctly, I have not
written enough documentation yet :-)
I am doing exactly that now (oh, writing papers is much more difficult
than programming :-), but I will try to explain some things here in this
message.

> On Tue, 20 Oct 1998, Konstantin Chuguev wrote:
> 
> Actually, because of the braindead way MIME handles charsets (i.e. as part
> of the content-type, rather than as an independent dimension or variant),
> the way to do this in Apache, since version 0.98, is to either add in your
> mime.types file or with AddType something along the lines of
> 
> html_latin1     text/html;charset=iso-...
> 
> In fact, using AddCharset would be counterproductive (believe me, I
> tried!) unless you fix the entire content struct; i.e. break the implicit
Could you give us an example of how AddCharset is counterproductive?
Anyway, this method is not the only one. I have it in MultiWeb, but never
use it myself. There is another method, which suits my needs better.
It works when a file doesn't have a charset suffix. It looks like this:

<Language ru>
	ServerCharset koi8-r
</Language>

It's true that practically all resources in the same language share the
same charset on the server (or at least within a given subdirectory of the
server: <Language> directives can appear in any Apache context, down to
<Files ...>). There's no need to label a document with a charset suffix in
that case. I don't. But someone might want to do that.
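
To make the precedence concrete, here is a minimal sketch in Python (not
MultiWeb code). It assumes that an explicit charset suffix in the file name
wins, and that the per-language ServerCharset is used otherwise. The tables
and the function name resolve_charset are hypothetical, made up for this
illustration only.

```python
# Hypothetical tables for illustration; not taken from MultiWeb.
SERVER_CHARSET = {"ru": "koi8-r"}   # mirrors <Language ru> ServerCharset koi8-r
CHARSET_SUFFIXES = {"latin1": "iso-8859-1", "koi8": "koi8-r"}  # assumed suffix map
LANGUAGE_SUFFIXES = {"en", "ru"}    # assumed set of language suffixes


def resolve_charset(filename):
    """Return (language, charset) guessed from the dot-separated suffixes."""
    language = None
    charset = None
    for part in filename.split(".")[1:]:
        if part in LANGUAGE_SUFFIXES:
            language = part
        elif part in CHARSET_SUFFIXES:
            charset = CHARSET_SUFFIXES[part]
    # No charset suffix: fall back to the default charset for the language.
    if charset is None and language is not None:
        charset = SERVER_CHARSET.get(language)
    return language, charset


print(resolve_charset("home.ru.html"))         # ('ru', 'koi8-r')
print(resolve_charset("home.en.latin1.html"))  # ('en', 'iso-8859-1')
```

A file without a charset suffix picks up the language default, while a
labelled file such as home.en.latin1.html keeps its explicit charset.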

> link between charset, content-type (and q factor, etc) breaking just about
> every module and causing subtle issues with files like
> 
>         home.en.latin1.html
> 
> Which today just work fine.
> 
I have avoided changing the request_rec content struct by storing the
charset information in the r->notes table. http_protocol.c is patched
slightly to insert that information into the Content-Type response header
line. Another change in http_protocol.c turns on the charset converter for
textual content (in a fixup_handler, where the converter is set up, I
cannot be sure that the content type is textual, because CGI scripts can
set it later).
This is a dirty hack, but it seems to be unavoidable if I need the
functionality MultiWeb has.
I would like to have a standard mechanism for this in Apache.
Until that happens (I hope :-) I try to make minimal changes to the
original sources.
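
The idea of deciding late can be sketched roughly as follows, in Python
rather than in the actual C patch. A plain dict stands in for the r->notes
table; the key name "charset", the function send_response and the
windows-1251 target are assumptions made up for this illustration. The
content type is inspected only at send time, after any script has had a
chance to set it.

```python
def send_response(body, content_type, notes, source_charset):
    """Return (Content-Type header value, body), both finalized at send time.

    notes          -- dict standing in for a per-request side table
    source_charset -- charset the document is stored in on the server
    """
    charset = notes.get("charset")  # hypothetical key name
    textual = content_type.split(";")[0].strip().lower().startswith("text/")
    # Leave non-textual content, and types that already carry a charset, alone.
    if not charset or not textual or "charset=" in content_type.lower():
        return content_type, body
    # Convert only when the stored and requested charsets differ.
    if charset.lower() != source_charset.lower():
        body = body.decode(source_charset).encode(charset)
    return f"{content_type}; charset={charset}", body


notes = {"charset": "windows-1251"}  # assumed charset wanted by the client
stored = "\u041f\u0440\u0438\u0432\u0435\u0442".encode("koi8-r")
header, converted = send_response(stored, "text/html", notes, "koi8-r")
print(header)  # text/html; charset=windows-1251
```

Binary content such as image/png passes through unchanged, which is the
reason the check has to wait until the final content type is known.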

> > > The implementation may well need cleaning up, but the idea sounds like it
> > > may possibly have value if it isn't too expensive.
> 
> > Just today the latest version is released: Apache-1.3.3-MultiWeb-3.2.
> >
> > Some details are on http://multiweb.urc.ac.ru/
> >
> > Unfortunately, not much documentation now, but I am working on it.
> >
> > Although my implementation is kind of expensive, I think it can
> > be useful for somebody...
> 
> It is actually a nice piece of work; though I worry about the i18n side,
> as it seems to have broken a server which does not have strictly
> parallel text in it. And yes it is very expensive :-).
If I understood you correctly, you are worried about unilingual servers, or
servers whose resources have different content in different languages?
What do you mean by "broken"?

I am ready to discuss the cost and to minimize it.
I really wonder why there is still no publicly available charset
conversion API.
Apache is now among the most portable software there is.
There are thoughts about making a separate library of Apache API
functions. Charset conversion functions would be very useful there :-)

> 
> I wonder if it could not be combined with mod_i18n which uses the CCC-API
> and be slotted in _before_ mod_negotiation; and then use fake q= factors for
> all accept lines (thus circumventing Netscape's accept */*).
> DW
I am sorry, what do you mean here?

--
	Konstantin V. Chuguev.		System administrator of Southern
	http://www.urc.ac.ru/~joy/	Ural Regional Center of FREEnet,
	mailto:joy@urc.ac.ru		Chelyabinsk, Russia.
