Return-Path:
Delivered-To: apmail-apache-docs-archive@apache.org
Received: (qmail 42199 invoked by uid 500); 8 May 2001 11:39:19 -0000
Mailing-List: contact apache-docs-help@apache.org; run by ezmlm
Precedence: bulk
list-help:
list-unsubscribe:
list-post:
Reply-To: apache-docs@apache.org
Delivered-To: mailing list apache-docs@apache.org
Received: (qmail 41769 invoked by uid 500); 8 May 2001 11:39:06 -0000
Delivered-To: apmail-httpd-docs-1.3-cvs@apache.org
Date: 8 May 2001 11:38:59 -0000
Message-ID: <20010508113859.41490.qmail@apache.org>
From: martin@apache.org
To: httpd-docs-1.3-cvs@apache.org
Subject: cvs commit: httpd-docs-1.3/htdocs/manual/mod core.html

martin      01/05/08 04:38:58

Modified:    htdocs/manual ebcdic.html
             htdocs/manual/mod core.html
Log:
Move EBCDIC conversion blurb to where it fits better.
Suggested by: Joshua Slive

Revision  Changes    Path
1.11      +149 -37   httpd-docs-1.3/htdocs/manual/ebcdic.html

Index: ebcdic.html
===================================================================
RCS file: /home/cvs/httpd-docs-1.3/htdocs/manual/ebcdic.html,v
retrieving revision 1.10
retrieving revision 1.11
diff -u -u -r1.10 -r1.11
--- ebcdic.html 2001/03/09 10:09:47 1.10
+++ ebcdic.html 2001/05/08 11:38:35 1.11
@@ -16,7 +16,7 @@

Overview of the Apache EBCDIC Port

- Version 1.3 of the Apache HTTP Server is the first version which
+ As of Version 1.3, the Apache HTTP Server includes a port to (non-ASCII) mainframe machines which use the EBCDIC character set as their native codeset.
(Initially, that support covered only the Fujitsu-Siemens family of
@@ -27,42 +27,148 @@
systems TPF and OS/390 were added).

-

- The port was started initially to -

+
-
    -
  • prove the feasibility of porting - the Apache HTTP server - to this platform -
  • find a "worthy and capable" successor for the venerable - CERN-3.0 daemon - (which was ported a couple of years ago), and to -
  • prove that Apache's preforking process model can on this platform - easily outperform the accept-fork-serve model used by CERN by a - factor of 5 or more. -
+

EBCDIC-related conversion functions

-

- This document serves as a rationale to describe some of the design - decisions of the port to this machine. -

+ The EBCDIC related directives + EBCDICConvert, + EBCDICConvertByType, and + EBCDICKludge + are available + only if the platform's character set is EBCDIC + (This is currently only the case on Fujitsu-Siemens' + BS2000/OSD and IBM's OS/390 and TPF operating systems). EBCDIC + stands for Extended Binary-Coded-Decimal Interchange Code + and is the codeset used on mainframe machines, in contrast to + ASCII which is ubiquitous on almost all micro computers today. + ASCII (or its extension latin1) is the basis for the HTTP + transfer protocol, therefore all EBCDIC-based platforms need a + way to configure the code set conversion rules required between + the EBCDIC based mainframe host and the HTTP socket protocol.
+ +

+ On an EBCDIC based system, HTML files and other text files are + usually saved encoded in the native EBCDIC code set, while image + files and other binary data are stored with identical encoding as + on ASCII based machines. When the Apache server accesses documents, + it must therefore make a distinction between text files (to be + converted to/from ASCII, depending on the transfer direction) + and binary files (to be delivered unconverted). + Such a distinction can be made based on the assigned MIME type, or + based on the file extension (i.e., files sharing a common file + suffix). +

+ +

+ By default, the configuration is symmetric for input and output + (i.e., when a PUT request is executed for a document which was + returned by a previous GET request, then the resulting uploaded + copy should be identical to the original file). However, the + conversion directives allow for specifying different conversions + for input and output. +

+ +

+ The directives EBCDICConvert and + EBCDICConvertByType are used to + assign the conversion setting (On or Off) based on file + extensions or MIME types. Each configuration setting can be defined + for input only (e.g., PUT method), output only (e.g., GET method), + or both input and output. By default, the conversion setting is + applied for input and output. +
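
The On/Off setting and its optional direction can be modelled as a small parser. The following Python sketch is illustrative only and is not Apache's implementation: the names `ConversionSetting` and `parse_setting` are hypothetical, and case-insensitive matching of the keywords is an assumption. It accepts tokens of the form `{On|Off}[={In|Out|InOut}]` and applies the documented default of both directions when no direction is given.

```python
from typing import NamedTuple


class ConversionSetting(NamedTuple):
    """Hypothetical model of one conversion setting."""
    convert: bool    # True for "On", False for "Off"
    on_input: bool   # applies to request bodies (e.g., PUT)
    on_output: bool  # applies to response bodies (e.g., GET)


# Direction keyword -> (applies to input, applies to output)
_DIRECTIONS = {
    "in": (True, False),
    "out": (False, True),
    "inout": (True, True),
}


def parse_setting(token: str) -> ConversionSetting:
    """Parse a token such as "On", "Off=In" or "On=InOut"."""
    state, sep, direction = token.partition("=")
    if state.lower() not in ("on", "off"):
        raise ValueError(f"expected On or Off, got {state!r}")
    if not sep:
        # No direction given: the setting applies to input and output.
        direction = "InOut"
    try:
        on_input, on_output = _DIRECTIONS[direction.lower()]
    except KeyError:
        raise ValueError(
            f"expected In, Out or InOut, got {direction!r}"
        ) from None
    return ConversionSetting(state.lower() == "on", on_input, on_output)
```

For example, `parse_setting("Off=In")` describes a rule that disables conversion for uploaded data only and leaves output handling to other rules.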

+ +

+ Note that after modifying the conversion settings for a group of + files, it is not sufficient to restart the server. The reason for + this is the fact that a cached copy of a document (in a browser or + proxy cache) will not get revalidated by contents, but only by + date. Since the modification time of the document did not change, + browsers will assume they can reuse the cached copy.
+ To recover from this situation, you must either clear all cached + copies (browser and proxy cache!), or update the modification time + of the documents (using the touch command on the server). +
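
Updating the modification times can also be scripted. The sketch below is a rough Python equivalent of running touch over a document tree, assuming the affected files can be identified by their suffix; the function name `touch_documents` and its parameters are hypothetical.

```python
import os
import time
from pathlib import Path
from typing import Iterable, List


def touch_documents(root: str, suffixes: Iterable[str]) -> List[Path]:
    """Set the modification time of matching files under root to now."""
    wanted = set(suffixes)
    now = time.time()
    touched = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in wanted:
            # Same effect as `touch path`: access and modification
            # times are both set to the current time.
            os.utime(path, (now, now))
            touched.append(path)
    return touched
```

A call such as `touch_documents("/path/to/docs", [".html"])` (the path is a placeholder) would mark every `.html` file below that directory as modified, so that date-based revalidation in browsers and proxies fetches the reconverted content.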

+ +

+ Note also that server-parsed documents (CGI scripts, .shtml files, + and other interpreted files like PHP scripts etc.) are not subject to + any input conversion and must therefore be stored in EBCDIC form + on the server side. +

+ +

+ In the absence of any + EBCDICConvertByType directive, + and if no matching EBCDICConvert was + found, Apache falls back to an internal heuristic which assumes + that all documents with MIME types starting with + "text/", "message/" or + "multipart/" as well as the MIME type + "application/x-www-form-urlencoded" are text documents + stored in EBCDIC, whereas all other documents are binary files. +

+ +

+ In order to provide backward compatibility with older versions of + apache, the EBCDICKludge directive + allows for a less powerful mechanism to control the conversion of + documents to and from EBCDIC. +

+ +

+ Note:

+ The EBCDICKludge directive is deprecated, since its functionality + is superseded by the more powerful + EBCDICConvert and + EBCDICConvertByType + directives.
+

+ +

+ The directives are applied in the following order: +

    +
  1. First, the configured EBCDICConvert + directives in the current context are evaluated in + configuration file order. As soon as a matching file extension + is found, the search stops and the configured conversion is + applied.
    + + EBCDICConvert settings inherited from parent directories are + tested after the more specific (deeper) directory levels. +
  2. +
  3. If the EBCDICKludge is in effect, + the next step tests for a MIME type of the format + type/x-ascii-subtype. If the + document has such a type, then the + "x-ascii-" substring is removed and the + conversion set to Off. +
  4. +
  5. In the next step, the configured + EBCDICConvertByType + directives are evaluated in configuration file order. If + the document has a matching MIME type, the search stops and + the configured conversion is applied.
    + + EBCDICConvertByType settings inherited from parent + directories are tested after the more specific (deeper) + directory levels.
    + + If no EBCDICConvertByType + directive at all exists in the current context, the server + falls back to the simple heuristics which assume that MIME + types starting with "text/", "message/" or "multipart/" (plus + the special type "application/x-www-form-urlencoded" used in + simple POST requests) imply a conversion, while all the rest + is delivered unconverted (i.e., binary). +
  6. +
+
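
The evaluation order listed above can be sketched in Python. This is a simplified model, not the server's code: `resolve_conversion` and its parameters are hypothetical, rules are plain (pattern, convert) pairs with the In/Out direction left out, MIME types are compared by exact match, and the caller is assumed to list rules from deeper directory levels before inherited ones. The text does not say what happens when EBCDICConvertByType rules exist but none matches, so the sketch returns `None` in that case.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, bool]  # (file extension or MIME type, convert?)

TEXT_PREFIXES = ("text/", "message/", "multipart/")
FORM_TYPE = "application/x-www-form-urlencoded"


def resolve_conversion(
    filename: str,
    mime_type: str,
    by_extension: List[Rule],
    by_type: List[Rule],
    kludge: bool = False,
) -> Tuple[str, Optional[bool]]:
    """Return (possibly rewritten MIME type, conversion setting)."""
    # Step 1: EBCDICConvert rules in configuration order;
    # the first matching file extension ends the search.
    for ext, convert in by_extension:
        if filename.endswith(ext):
            return mime_type, convert

    # Step 2: with EBCDICKludge in effect, type/x-ascii-subtype becomes
    # type/subtype and conversion is switched off.
    if kludge:
        major, _, minor = mime_type.partition("/")
        if minor.startswith("x-ascii-"):
            return f"{major}/{minor[len('x-ascii-'):]}", False

    # Step 3: EBCDICConvertByType rules in configuration order;
    # the first matching MIME type ends the search.
    for pattern, convert in by_type:
        if pattern == mime_type:
            return mime_type, convert

    # Fallback heuristic, used only when no EBCDICConvertByType rule
    # exists at all in the current context.
    if not by_type:
        is_text = mime_type.startswith(TEXT_PREFIXES) or mime_type == FORM_TYPE
        return mime_type, is_text

    # Rules exist but none matched: not specified in the text above.
    return mime_type, None
```

With no rules configured, `resolve_conversion("logo.gif", "image/gif", [], [])` yields `("image/gif", False)`, which corresponds to unconverted binary delivery.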

-

Design Goals

-

- One objective of the EBCDIC port was to maintain enough backwards - compatibility with the (EBCDIC) CERN server to make the transition to - the new server attractive and easy. This required the addition of - a configurable method to define whether a HTML document was stored - in ASCII (the only format accepted by the old server) or in EBCDIC - (the native document format in the POSIX subsystem, and therefore - the only sensible format in which the other POSIX tools like grep - or sed could operate on documents). Later, special EBCDIC conversion - directives were added which allow for the flexible definition of - conversion rules based on the documents' MIME type or file extension. -

+
-

Technical Solution

+

Technical Details

Since all Apache input and output is based upon the BUFF data type and its methods, the easiest solution was to add the actual
@@ -111,7 +217,8 @@
requests. (See RFC2616 and src/main/http_protocol.c for details.)
-

Porting Notes

+
+

Porting Notes

  1. @@ -184,8 +291,9 @@ are text documents and are stored as EBCDIC files, whereas all other files are binary files (and stored in a byte-identical encoding as on an ASCII machine).
    - These defaults can be overridden - on a by-MIME-type and/or by-file-extension basis, using the + These defaults can be overridden + on a by-MIME-type and/or + by-file-extension basis, using the directives
            EBCDICConvertByType {On|Off}[={In|Out|InOut}] mimetype [...]
            EBCDICConvert       {On|Off}[={In|Out|InOut}] fileext [...]
      @@ -219,8 +327,10 @@
          
+ +
-

Document Storage Notes

+

Document Storage Notes

Binary Files

When exchanging binary files between the mainframe host and a
@@ -242,6 +352,8 @@

SSI documents must currently be stored in EBCDIC only. No provision is made to convert them from ASCII before processing. + The same holds for other interpreted languages, like + mod_perl or mod_php.

1.188     +7 -128    httpd-docs-1.3/htdocs/manual/mod/core.html

Index: core.html
===================================================================
RCS file: /home/cvs/httpd-docs-1.3/htdocs/manual/mod/core.html,v
retrieving revision 1.187
retrieving revision 1.188
diff -u -u -r1.187 -r1.188
--- core.html 2001/05/04 22:38:24 1.187
+++ core.html 2001/05/08 11:38:48 1.188
@@ -796,134 +796,6 @@


-

EBCDIC-related conversion functions

- - The following EBCDIC related directives are available - only if the platform's character set is EBCDIC - (This is currently only the case on Fujitsu-Siemens' - BS2000/OSD and IBM's OS/390 and TPF operating systems). EBCDIC - stands for Extended Binary-Coded-Decimal Interchange Code - and is the codeset used on mainframe machines, in contrast to - ASCII which is ubiquitous on almost all micro computers today. - ASCII (or its extension latin1) is the basis for the HTTP - transfer protocol, therefore all EBCDIC-based platforms need a - way to configure the code set conversion rules required between - the EBCDIC based mainframe host and the HTTP socket protocol.
- -

- On an EBCDIC based system, HTML files and other text files are - usually saved encoded in the native EBCDIC code set, while image - files and other binary data are stored with identical encoding as - on ASCII based machines. When the Apache server accesses documents, - it must therefore make a distinction between text files (to be - converted to/from ASCII, depending on the transfer direction) - and binary files (to be delivered unconverted). - Such a distinction can be made based on the assigned MIME type, or - based on the file extension (i.e., files sharing a common file - suffix). -

- -

- By default, the configuration is symmetric for input and output - (i.e., when a PUT request is executed for a document which was - returned by a previous GET request, then the resulting uploaded - copy should be identical to the original file). However, the - conversion directives allow for specifying different conversions - for input and output. -

- -

- The directives EBCDICConvert and - EBCDICConvertByType are used to - assign the conversion setting (On or Off) based on file - extensions or MIME types. Each configuration setting can be defined - for input only (e.g., PUT method), output only (e.g., GET method), - or both input and output. By default, the conversion setting is - applied for input and output. -

- -

- Note that after modifying the conversion settings for a group of - files, it is not sufficient to restart the server. The reason for - this is the fact that a cached copy of a document (in a browser or - proxy cache) will not get revalidated by contents, but only by - date. Since the modification time of the document did not change, - browsers will assume they can reuse the cached copy.
- To recover from this situation, you must either clear all cached - copies (browser and proxy cache!), or update the modification time - of the documents (using the touch command on the server). -

- -

- In the absence of any - EBCDICConvertByType directive, - and if no matching EBCDICConvert was - found, Apache falls back to an internal heuristic which assumes - that all documents with MIME types starting with - "text/", "message/" or - "multipart/" as well as the MIME type - "application/x-www-form-urlencoded" are text documents - stored in EBCDIC, whereas all other documents are binary files. -

- -

- In order to provide backward compatibility with older versions of - apache, the EBCDICKludge directive - allows for a less powerful mechanism to control the conversion of - documents to and from EBCDIC. -

- -

- Note:

- The EBCDICKludge directive is deprecated, since its functionality - is superseded by the more powerful - EBCDICConvert and - EBCDICConvertByType - directives.
-

- -

- The directives are applied in the following order: -

    -
  1. First, the configured EBCDICConvert - directives in the current context are evaluated in - configuration file order. As soon as a matching file extension - is found, the search stops and the configured conversion is - applied.
    - - EBCDICConvert settings inherited from parent directories are - tested after the more specific (deeper) directory levels. -
  2. -
  3. If the EBCDICKludge is in effect, - the next step tests for a MIME type of the format - type/x-ascii-subtype. If the - document has such a type, then the - "x-ascii-" substring is removed and the - conversion set to Off. -
  4. -
  5. In the next step, the configured - EBCDICConvertByType - directives are evaluated in configuration file order. If - the document has a matching MIME type, the search stops and - the configured conversion is applied.
    - - EBCDICConvertByType settings inherited from parent - directories are tested after the more specific (deeper) - directory levels.
    - - If no EBCDICConvertByType - directive at all exists in the current context, the server - falls back to the simple heuristics which assume that MIME - types starting with "text/", "message/" or "multipart/" (plus - the special type "application/x-www-form-urlencoded" used in - simple POST requests) imply a conversion, while all the rest - is delivered unconverted (i.e., binary). -
  6. -
-

- -
-

EBCDICConvert

See also: EBCDICConvertByType + and Overview of the EBCDIC Conversion Functions


@@ -1062,6 +935,7 @@

See also: EBCDICConvert + and Overview of the EBCDIC Conversion Functions


@@ -1130,6 +1004,11 @@
conversion. (Before Apache version 1.3.19, there was no way at all to force these binary documents to be treated as EBCDIC text files.)
+

+

+ See also: EBCDICConvert, + EBCDICConvertByType + and Overview of the EBCDIC Conversion Functions

