httpd-dev mailing list archives

From "David Reid" <dr...@jetnet.co.uk>
Subject Re: [PATCH] APR wrapper for iconv
Date Tue, 18 Apr 2000 08:55:29 GMT
Jeff,

It's my understanding (and I'm probably 100% wrong here) that most Unixes
and other OSes won't need this code?  If that's the case, can you add it
in a directory for the OS you're working on?  Then just add support for the
OS to the configure script and that should get the code building for you.
I think last time we talked about it, we said we'd have no-ops for Unix and
any other platform that didn't require the functions.  These should probably
be in the form of macros (as someone suggested).
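
A minimal sketch of what macro-style no-ops could look like, for
illustration only.  Every name here (xlate_t, xlate_open, xlate_buffer,
xlate_close, XLATE_SUCCESS) is hypothetical and not part of APR; the
assumption is that callers still expect a status value and updated byte
counts even when no translation happens.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for this sketch; none of these are APR symbols. */
#define XLATE_SUCCESS 0

typedef void xlate_t;

/* No-op open: there is no descriptor to create, so store NULL and
 * report success.  The code page and pool arguments are ignored. */
#define xlate_open(convset, topage, frompage, pool) \
    (*(convset) = NULL, XLATE_SUCCESS)

/* No-op translate: copy as many bytes as fit and update both counts
 * the same way a real translation would. */
static int xlate_buffer_noop(const char *inbuf, size_t *inbytes_left,
                             char *outbuf, size_t *outbytes_left)
{
    size_t n = *inbytes_left < *outbytes_left ? *inbytes_left
                                              : *outbytes_left;

    memmove(outbuf, inbuf, n);   /* memmove: in-place calls are allowed */
    *inbytes_left -= n;
    *outbytes_left -= n;
    return XLATE_SUCCESS;
}

#define xlate_buffer(convset, inbuf, inleft, outbuf, outleft) \
    xlate_buffer_noop((inbuf), (inleft), (outbuf), (outleft))

/* No-op close: nothing was opened, so nothing to release. */
#define xlate_close(convset) (XLATE_SUCCESS)
```

Having the macros expand to expressions with a value means calling code
that checks the returned status builds unchanged on platforms that use
the no-ops.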

Does that make sense?

d.
----- Original Message -----
From: "Jeff Trawick" <trawickj@bellsouth.net>
To: <new-httpd@apache.org>
Sent: Tuesday, April 18, 2000 3:54 AM
Subject: [PATCH] APR wrapper for iconv


> This is definitely a work in progress, but fortunately what
> is presented here actually works.  I don't think these interfaces
> will be too firm until Apache switches to this code on at least
> two of the supported EBCDIC platforms.  My immediate goal with it
> is to get it in the library and ready to use (if not in an
> optimized form) so that everybody has something MBCS-capable as
> the EBCDIC support is hopefully cleaned up in 2.0.  Some of the
> big todos are caching of SBCS translation tables and, for MBCS,
> caching of open iconv descriptors.
>
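
The SBCS table caching listed among the todos above could be sketched
roughly as follows.  This is only an illustration under assumptions: the
names (sbcs_entry, sbcs_cache_find, sbcs_cache_add), the fixed slot
count, and the name-length limit are all hypothetical, and the sketch is
not thread-safe and does not use APR pools.

```c
#include <stddef.h>
#include <string.h>

#define SBCS_CACHE_SLOTS 8    /* arbitrary size chosen for the sketch */
#define SBCS_NAME_MAX    32   /* arbitrary limit on code page names */

/* One cached 256-byte translation table, keyed by the code page pair. */
struct sbcs_entry {
    char topage[SBCS_NAME_MAX];
    char frompage[SBCS_NAME_MAX];
    unsigned char table[256];
    int in_use;
};

static struct sbcs_entry sbcs_cache[SBCS_CACHE_SLOTS];

/* Return the cached table for this pair, or NULL on a cache miss. */
static const unsigned char *sbcs_cache_find(const char *topage,
                                            const char *frompage)
{
    size_t i;

    for (i = 0; i < SBCS_CACHE_SLOTS; i++) {
        if (sbcs_cache[i].in_use &&
            strcmp(sbcs_cache[i].topage, topage) == 0 &&
            strcmp(sbcs_cache[i].frompage, frompage) == 0) {
            return sbcs_cache[i].table;
        }
    }
    return NULL;
}

/* Copy a table into the first free slot.  Returns 0 on success, or -1
 * if the cache is full or a code page name is too long to store. */
static int sbcs_cache_add(const char *topage, const char *frompage,
                          const unsigned char *table)
{
    size_t i;

    if (strlen(topage) >= SBCS_NAME_MAX ||
        strlen(frompage) >= SBCS_NAME_MAX) {
        return -1;
    }
    for (i = 0; i < SBCS_CACHE_SLOTS; i++) {
        if (!sbcs_cache[i].in_use) {
            strcpy(sbcs_cache[i].topage, topage);
            strcpy(sbcs_cache[i].frompage, frompage);
            memcpy(sbcs_cache[i].table, table, 256);
            sbcs_cache[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}
```

On a hit, the open routine could skip iconv_open() entirely and point
the handle at the cached table; on a miss, it would probe with iconv()
and call the add routine if the pair turns out to be a simple
byte-for-byte mapping.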
> Currently, ap_translate_buffer() is approx. 11% slower on OS/390
> than 1.3's ebcdic2ascii().
>
> Changes from Ryan's sketch of apr_iconv.h, beyond function
> signatures:
>
> 1) ap_translate_codepage() is renamed to ap_translate_buffer() for
>    no good reason
> 2) no special preprocessor symbol is required to build this into
>    APR; if iconv() isn't available, nothing will be supported
>    since there is no fall-back mechanism; ap_codepage_open() will
>    fail at run-time;
>
> Should the file be apr/lib/apr_iconv.c or apr/misc/unix/apr_iconv.c?
> I know of a Unix system/RTL with no iconv() and I think I know of a
> non-Unix system/RTL with iconv(), so I don't really consider this
> Unix-specific.  The possible future addition of non-iconv()
> translation support would help Unix (some) and non-Unix alike.
>
> Currently the set of routines is
>
>   ap_codepage_open()
>   ap_translate_buffer()
>   ap_translate_char()
>   ap_codepage_close()
>
> and the "handle" is ap_iconv_t.
>
> I prefer to change these names at some point (either to be more
> consistent with iconv() or just more consistent among themselves), but
> there is no need to do that immediately unless somebody feels like
> thinking about it now.
>
> APR_DEFAULT_CODEPAGE is somewhat experimental.  It is for when code
> has literal strings which must be translated.  We don't know what
> code page the strings are in when we write the code.  Presumably the
> builder took a tarball of ISO-8859-1 but then unpacked+translated them
> to some arbitrary code page supported by her C compiler.
> APR_DEFAULT_CODEPAGE is supposed to tell APR to use that code page
> for the translation.  The IBM compiler for OS/390 has a way for code
> to determine the code page that it was compiled from.
>
> ? src/lib/apr/lib/apr_iconv.c
>
> /* ====================================================================
>  * The Apache Software License, Version 1.1
>  *
>  * Copyright (c) 2000 The Apache Software Foundation.  All rights
>  * reserved.
>  *
>  * Redistribution and use in source and binary forms, with or without
>  * modification, are permitted provided that the following conditions
>  * are met:
>  *
>  * 1. Redistributions of source code must retain the above copyright
>  *    notice, this list of conditions and the following disclaimer.
>  *
>  * 2. Redistributions in binary form must reproduce the above copyright
>  *    notice, this list of conditions and the following disclaimer in
>  *    the documentation and/or other materials provided with the
>  *    distribution.
>  *
>  * 3. The end-user documentation included with the redistribution,
>  *    if any, must include the following acknowledgment:
>  *       "This product includes software developed by the
>  *        Apache Software Foundation (http://www.apache.org/)."
>  *    Alternately, this acknowledgment may appear in the software itself,
>  *    if and wherever such third-party acknowledgments normally appear.
>  *
>  * 4. The names "Apache" and "Apache Software Foundation" must
>  *    not be used to endorse or promote products derived from this
>  *    software without prior written permission. For written
>  *    permission, please contact apache@apache.org.
>  *
>  * 5. Products derived from this software may not be called "Apache",
>  *    nor may "Apache" appear in their name, without prior written
>  *    permission of the Apache Software Foundation.
>  *
>  * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
>  * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
>  * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
>  * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
>  * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>  * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>  * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
>  * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
>  * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
>  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
>  * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>  * SUCH DAMAGE.
>  * ====================================================================
>  *
>  * This software consists of voluntary contributions made by many
>  * individuals on behalf of the Apache Software Foundation.  For more
>  * information on the Apache Software Foundation, please see
>  * <http://www.apache.org/>.
>  */
>
> #include "apr_config.h"
>
> #include "apr_lib.h"
> #include "apr_iconv.h"
>
> #ifdef HAVE_ICONV_H
> #include <iconv.h>
> #endif
>
> #ifndef min
> #define min(x,y) ((x) <= (y) ? (x) : (y))
> #endif
>
> struct ap_iconv_t {
>     ap_pool_t *pool;
>     char *frompage;
>     char *topage;
>     char *sbcs_table;
> #ifdef HAVE_ICONV
>     iconv_t ich;
> #endif
> };
>
> /* get_default_codepage()
>  *
>  * simple heuristic to determine codepage of source code so that
>  * literal strings (e.g., "GET /\r\n") in source code can be translated
>  * properly
>  *
>  * If appropriate, a symbol can be set at configure time to determine
>  * this.  On EBCDIC platforms, it will be important how the code was
>  * unpacked.
>  */
>
> static const char *get_default_codepage(void)
> {
> #ifdef __MVS__
>     #ifdef __CODESET__
>         return __CODESET__;
>     #else
>         return "IBM-1047";
>     #endif
> #endif
>
>     if ('}' == 0xD0) {
>         return "IBM-1047";
>     }
>
>     if ('{' == 0xFB) {
>         return "EDF04";
>     }
>
>     if ('A' == 0xC1) {
>         return "EBCDIC"; /* not useful */
>     }
>
>     if ('A' == 0x41) {
>         return "ASCII"; /* not useful */
>     }
>
>     return "unknown";
> }
>
> static ap_status_t ap_iconv_cleanup(void *convset)
> {
> #ifdef HAVE_ICONV
>     ap_iconv_t *old = convset;
>
>     if (old->ich != (iconv_t)-1) {
>         if (iconv_close(old->ich)) {
>             return errno;
>         }
>     }
> #endif
>     return APR_SUCCESS;
> }
>
> #ifdef HAVE_ICONV
> static void check_sbcs(ap_iconv_t *convset)
> {
>     char inbuf[256], outbuf[256];
>     char *inbufptr = inbuf, *outbufptr = outbuf;
>     size_t inbytes_left, outbytes_left;
>     int i;
>     size_t translated;
>
>     for (i = 0; i < sizeof(inbuf); i++) {
>         inbuf[i] = i;
>     }
>
>     inbytes_left = outbytes_left = sizeof(inbuf);
>     translated = iconv(convset->ich, (const char **)&inbufptr,
>                        &inbytes_left, &outbufptr, &outbytes_left);
>     if (translated != (size_t) -1 &&
>         inbytes_left == 0 &&
>         outbytes_left == 0) {
>         /* hurray... this is simple translation; save the table,
>          * close the iconv descriptor
>          */
>
>         convset->sbcs_table = ap_palloc(convset->pool, sizeof(outbuf));
>         memcpy(convset->sbcs_table, outbuf, sizeof(outbuf));
>         iconv_close(convset->ich);
>         convset->ich = (iconv_t)-1;
>
>         /* TODO: add the table to the cache */
>     }
> }
> #endif
>
> ap_status_t ap_codepage_open(ap_iconv_t **convset, const char *topage,
>                              const char *frompage, ap_pool_t *pool)
> {
>     ap_status_t status;
>     ap_iconv_t *new;
>     int found = 0;
>
>     *convset = NULL;
>
>     if (!topage) {
>         topage = get_default_codepage();
>     }
>
>     if (!frompage) {
>         frompage = get_default_codepage();
>     }
>
>     new = (ap_iconv_t *)ap_palloc(pool, sizeof(ap_iconv_t));
>     if (!new) {
>         return APR_ENOMEM;
>     }
>
>     new->pool = pool;
>     new->topage = ap_pstrdup(pool, topage);
>     new->frompage = ap_pstrdup(pool, frompage);
>     if (!new->topage || !new->frompage) {
>         return APR_ENOMEM;
>     }
>
> #ifdef NYET
>     /* search cache of codepage pairs; we may be able to avoid the
>      * expensive iconv_open()
>      */
>
>     /* TODO: set found to non-zero if the pair is found in the cache */
> #endif
>
> #ifdef HAVE_ICONV
>     if (!found) {
>         new->ich = iconv_open(topage, frompage);
>         if (new->ich == (iconv_t)-1) {
>             return errno;
>         }
>         found = 1;
>         check_sbcs(new);
>         /* TODO: if this is simple SBCS, add table to cache, call
>          * iconv_close(), note in ap_iconv_t that we'll be using our
>          * own table
>          */
>     }
> #endif
>
>     if (found) {
>         *convset = new;
>         ap_register_cleanup(pool, (void *)new, ap_iconv_cleanup,
>                             ap_null_cleanup);
>         status = APR_SUCCESS;
>     }
>     else {
>         status = EINVAL; /* same as what iconv_open() would return if
>                             we couldn't handle the pair */
>     }
>
>     return status;
> }
>
> ap_status_t ap_translate_buffer(ap_iconv_t *convset, const char *inbuf,
>                                 ap_size_t *inbytes_left, char *outbuf,
>                                 ap_size_t *outbytes_left)
> {
>     ap_status_t status = APR_SUCCESS;
> #ifdef HAVE_ICONV
>     size_t translated;
>
>     if (convset->ich != (iconv_t)-1) {
>         char *inbufptr = (char *)inbuf;
>         char *outbufptr = outbuf;
>
>         translated = iconv(convset->ich, (const char **)&inbufptr,
>                            inbytes_left, &outbufptr, outbytes_left);
>         if (translated == (size_t)-1) {
>             return errno;
>         }
>     }
>     else
> #endif
>     {
>         int to_convert = min(*inbytes_left, *outbytes_left);
>         int converted = to_convert;
>         char *table = convset->sbcs_table;
>
>         while (to_convert) {
>             *outbuf = table[(unsigned char)*inbuf];
>             ++outbuf;
>             ++inbuf;
>             --to_convert;
>         }
>         *inbytes_left -= converted;
>         *outbytes_left -= converted;
>     }
>
>     return status;
> }
>
> ap_status_t ap_codepage_close(ap_iconv_t *convset)
> {
>     ap_status_t status;
>
>     if ((status = ap_iconv_cleanup(convset)) == APR_SUCCESS) {
>         ap_kill_cleanup(convset->pool, convset, ap_iconv_cleanup);
>     }
>
>     return status;
> }
>
> Index: src/lib/apr/configure.in
> ===================================================================
> RCS file: /home/cvs/apache-2.0/src/lib/apr/configure.in,v
> retrieving revision 1.71
> diff -u -r1.71 configure.in
> --- src/lib/apr/configure.in 2000/04/15 19:05:12 1.71
> +++ src/lib/apr/configure.in 2000/04/18 02:35:39
> @@ -124,6 +124,7 @@
>  AC_CHECK_FUNC(inet_network, [ inet_network="1" ], [ inet_network="0" ])
>  AC_CHECK_FUNC(_getch)
>  AC_CHECK_FUNCS(gmtime_r localtime_r)
> +AC_CHECK_FUNCS(iconv)
>  AC_SUBST(sendfile)
>  AC_SUBST(fork)
>  AC_SUBST(inet_addr)
> @@ -176,6 +177,7 @@
>  AC_CHECK_HEADERS(arpa/inet.h)
>  AC_CHECK_HEADERS(netinet/in.h, netinet_inh="1", netinet_inh="0")
>  AC_CHECK_HEADERS(netinet/tcp.h)
> +AC_CHECK_HEADERS(iconv.h)
>
>  AC_CHECK_HEADERS(sys/file.h)
>  AC_CHECK_HEADERS(sys/ioctl.h)
> Index: src/lib/apr/include/apr_iconv.h
> ===================================================================
> RCS file: /home/cvs/apache-2.0/src/lib/apr/include/apr_iconv.h,v
> retrieving revision 1.5
> diff -u -r1.5 apr_iconv.h
> --- src/lib/apr/include/apr_iconv.h 2000/04/16 04:46:54 1.5
> +++ src/lib/apr/include/apr_iconv.h 2000/04/18 02:35:40
> @@ -62,16 +62,21 @@
>  #ifdef __cplusplus
>  extern "C" {
>  #endif /* __cplusplus */
> +
> +/* TODO: determine whether or not we always have these routines
> + * in APR and perhaps what to do if they aren't supported on
> + * some platforms (fail at compile time?  fail at link time?
> + * fail at run time?) */
> +
> +#if defined(ICONV_IMPLEMENT_NYET)
>
> -#if !defined(ICONV_IMPLEMENT)
> -
>  typedef void                         ap_iconv_t;
>
>  /* For platforms where we don't bother with translating between codepages
>   */
>
>  #define ap_codepage_open(convset, topage, frompage, pool)
> -#define ap_translate_codepage(convset, inbuf, inbytes_left, outbuf, \
> +#define ap_translate_buffer(convset, inbuf, inbytes_left, outbuf, \
>                                outbytes_left) outbuf=inbuf;
>  /* The purpose of ap_translate char is to translate one character
>   * at a time.  This needs to be written carefully so that it works
> @@ -81,20 +86,26 @@
>  #define ap_codepage_close(convset)
>
>  #else
> +
> +typedef struct ap_iconv_t ap_iconv_t;
>
> -typedef struct ap_iconv_t            ap_iconv_t;
> +ap_status_t ap_codepage_open(ap_iconv_t **convset, const char *topage,
> +                             const char *frompage, ap_pool_t *pool);
> +
> +ap_status_t ap_translate_buffer(ap_iconv_t *convset, const char *inbuf,
> +                                ap_size_t *inbytes_left, char *outbuf,
> +                                ap_size_t *outbytes_left);
>
> -void ap_codepage_open(ap_iconv_t **convset, const char *topage,
> -                         const char *frompage, ap_pool_t *pool);
> -void ap_translate_codepage(ap_iconv_t *convset, const char *inbuf,
> -                              ap_size_t inbytes_left, const char *outbuf,
> -                              ap_size_t outbytes_left);
> +#define APR_DEFAULT_CODEPAGE NULL
> +
>  /* The purpose of ap_translate char is to translate one character
>   * at a time.  This needs to be written carefully so that it works
>   * with double-byte character sets.
>   */
>  void ap_translate_char(ap_iconv_t *convset, char inchar, char outchar);
> -void ap_codepage_close(ap_iconv_t *convset)
> +
> +ap_status_t ap_codepage_close(ap_iconv_t *convset);
> +
>  #endif
>
>  #ifdef __cplusplus
> @@ -102,5 +113,3 @@
>  #endif
>
>  #endif  /* ! APR_ICONV_H */
> -
> -
> Index: src/lib/apr/lib/Makefile.in
> ===================================================================
> RCS file: /home/cvs/apache-2.0/src/lib/apr/lib/Makefile.in,v
> retrieving revision 1.11
> diff -u -r1.11 Makefile.in
> --- src/lib/apr/lib/Makefile.in 2000/04/06 22:23:50 1.11
> +++ src/lib/apr/lib/Makefile.in 2000/04/18 02:35:41
> @@ -24,7 +24,8 @@
>   apr_signal.o \
>   apr_snprintf.o \
>   apr_tables.o \
> - apr_getpass.o
> + apr_getpass.o \
> + apr_iconv.o
>
>  .c.o:
>   $(CC) $(CFLAGS) -c $(INCLUDES) $<
> @@ -95,3 +96,4 @@
>   $(INCDIR)/apr_pools.h $(INCDIR)/apr_lib.h $(INCDIR)/apr_file_io.h \
>   $(INCDIR)/apr_time.h $(INCDIR)/apr_thread_proc.h \
>   ../misc/unix/misc.h $(INCDIR)/apr_getopt.h
> +apr_iconv.o: apr_iconv.c $(INCDIR)/apr_iconv.h
> Index: src/lib/apr/test/ab_apr.c
> ===================================================================
> RCS file: /home/cvs/apache-2.0/src/lib/apr/test/ab_apr.c,v
> retrieving revision 1.24
> diff -u -r1.24 ab_apr.c
> --- src/lib/apr/test/ab_apr.c 2000/04/17 03:39:06 1.24
> +++ src/lib/apr/test/ab_apr.c 2000/04/18 02:35:44
> @@ -97,6 +97,14 @@
>
> /*  -------------------------------------------------------------------- */
>
> +#if 'A' != 0x41
> +/* Hmmm... This source code isn't being compiled in ASCII.
> + * In order for data that flows over the network to make
> + * sense, we need to translate to/from ASCII.
> + */
> +#define NOT_ASCII
> +#endif
> +
>  /* affects include files on Solaris */
>  #define BSD_COMP
>
> @@ -104,6 +112,9 @@
>  #include "apr_file_io.h"
>  #include "apr_time.h"
>  #include "apr_getopt.h"
> +#ifdef NOT_ASCII
> +#include "apr_iconv.h"
> +#endif
>  #include <string.h>
>  #include <stdio.h>
>  #include <stdlib.h>
> @@ -193,6 +204,9 @@
>  ap_pool_t *cntxt;
>
>  ap_pollfd_t *readbits;
> +#ifdef NOT_ASCII
> +ap_iconv_t *fromascii, *toascii;
> +#endif
>
>  /* --------------------------------------------------------- */
>
> @@ -538,11 +552,19 @@
>          int l = 4;
>          int space = CBUFFSIZE - c->cbx - 1; /* -1 to allow for 0 terminator */
>          int tocopy = (space < r) ? space : r;
> -#ifndef CHARSET_EBCDIC
> +#ifdef NOT_ASCII
> +        ap_size_t inbytes_left = space, outbytes_left = space;
> +
> +        status = ap_translate_buffer(fromascii, buffer, &inbytes_left,
> +                                     c->cbuff + c->cbx, &outbytes_left);
> +        if (status || inbytes_left || outbytes_left) {
> +            fprintf(stderr, "only simple translation is supported (%d/%u/%u)\n",
> +                    status, inbytes_left, outbytes_left);
> +            exit(1);
> +        }
> +#else
>          memcpy(c->cbuff + c->cbx, buffer, space);
> -#else /*CHARSET_EBCDIC */
> -        ascii2ebcdic(c->cbuff + c->cbx, buffer, space);
> -#endif /*CHARSET_EBCDIC */
> +#endif /*NOT_ASCII */
>          c->cbx += tocopy;
>          space -= tocopy;
>          c->cbuff[c->cbx] = 0; /* terminate for benefit of strstr */
> @@ -671,6 +693,10 @@
>      ap_interval_time_t timeout;
>      ap_int16_t rv;
>      int i;
> +#ifdef NOT_ASCII
> +    ap_status_t status;
> +    ap_size_t inbytes_left, outbytes_left;
> +#endif
>
>      if (!use_html) {
>          printf("Benchmarking %s (be patient)...", hostname);
> @@ -719,9 +745,16 @@
>
>      reqlen = strlen(request);
>
> -#ifdef CHARSET_EBCDIC
> -    ebcdic2ascii(request, request, reqlen);
> -#endif /*CHARSET_EBCDIC */
> +#ifdef NOT_ASCII
> +    inbytes_left = outbytes_left = reqlen;
> +    status = ap_translate_buffer(toascii, request, &inbytes_left,
> +                                 request, &outbytes_left);
> +    if (status || inbytes_left || outbytes_left) {
> +        fprintf(stderr, "only simple translation is supported (%d/%u/%u)\n",
> +                status, inbytes_left, outbytes_left);
> +        exit(1);
> +    }
> +#endif /*NOT_ASCII */
>
>      /* ok - lets start */
>      start = ap_now();
> @@ -886,6 +919,9 @@
>  int main(int argc, char **argv)
>  {
>      int c, r;
> +#ifdef NOT_ASCII
> +    ap_status_t status;
> +#endif
>
>      /* ap_table_t defaults  */
>      tablestring = "";
> @@ -896,6 +932,19 @@
>      atexit(ap_terminate);
>      ap_create_pool(&cntxt, NULL);
>
> +#ifdef NOT_ASCII
> +    status = ap_codepage_open(&toascii, "ISO8859-1", APR_DEFAULT_CODEPAGE, cntxt);
> +    if (status) {
> +        fprintf(stderr, "ap_codepage_open(to ASCII)->%d\n", status);
> +        exit(1);
> +    }
> +    status = ap_codepage_open(&fromascii, APR_DEFAULT_CODEPAGE, "ISO8859-1", cntxt);
> +    if (status) {
> +        fprintf(stderr, "ap_codepage_open(from ASCII)->%d\n", status);
> +        exit(1);
> +    }
> +#endif
> +
>      ap_optind = 1;
>      while (ap_getopt(argc, argv, "n:c:t:T:p:v:kVhwx:y:z:", &c, cntxt) == APR_SUCCESS) {
>          switch (c) {
>
> You must be really bored.
>
> --
> Jeff Trawick | trawick@ibm.net | PGP public key at web site:
>      http://www.geocities.com/SiliconValley/Park/9289/
>           Born in Roswell... married an alien...
>

