apr-dev mailing list archives

From Luke Kenneth Casson Leighton <l...@samba-tng.org>
Subject Re: apr unicode-16 lib.
Date Wed, 13 Jun 2001 12:17:03 GMT
On Tue, Jun 12, 2001 at 11:46:30AM -0500, William A. Rowe, Jr. wrote:
> From: "Luke Kenneth Casson Leighton" <lkcl@samba-tng.org>
> Sent: Tuesday, June 12, 2001 10:22 AM
> > for various reasons i am prompted to ask,
> > 
> > how would the idea of having an apr_ucs16 set of routines,
> > apr_wstrcat, apr_wstrcpy, apr_wtolower, apr_wtoupper etc.,
> > be received?
> Well, since apr_isfoo apr_tofoo was 'reinvented', I don't see a
> huge problem.

> > on nt, it's easy: straightforward usage of the NT 
> > wstrcat, wstrcpy etc. lines.
> These are the folks who never read the "Security Implications" of utf-8 
> leaving 40% of all IIS webservers still vulnerable, so I'm dubious :-)

btw, samba #defines strcpy to ERROR_USE_SAFE_STRCPY_INSTEAD

sorry, forgot about this.  okay, rewrite that: how
about an equivalent apr_pwstrcat, apr_pwstrcpy with all
the safety / security / paranoia therein?
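
to make the idea concrete, here is a minimal sketch of what a bounded
16-bit string copy could look like.  everything in it is an assumption
for illustration: the ucs2_t typedef, the name sketch_pwstrcpy and its
signature are hypothetical and are not an existing apr or samba api.
the point is only that the destination size is always passed in, the
result is always terminated, and truncation is reported to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical 16-bit code unit type, not an existing apr typedef */
typedef uint16_t ucs2_t;

/*
 * hypothetical bounded copy: writes at most dst_len code units into dst,
 * including the terminating zero.  returns dst if all of src fitted,
 * NULL if dst is unusable or src had to be truncated.  whenever dst is
 * usable it is left zero-terminated, even on truncation.
 */
ucs2_t *sketch_pwstrcpy(ucs2_t *dst, size_t dst_len, const ucs2_t *src)
{
    size_t i;

    if (dst == NULL || dst_len == 0)
        return NULL;

    if (src == NULL) {
        dst[0] = 0;
        return dst;
    }

    /* stop one short of dst_len so there is always room for the terminator */
    for (i = 0; i + 1 < dst_len && src[i] != 0; i++)
        dst[i] = src[i];
    dst[i] = 0;

    /* src[i] is zero only if the whole source string was copied */
    return (src[i] == 0) ? dst : NULL;
}
```

a caller would check the return value and treat NULL as "did not fit"
rather than carrying on with a silently truncated string.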

> Well, how about a simple question.  Why restrain ourselves to ucs2?

because it's what NT has: NT doesn't have 32-bit (ucs4?) unicode, afaik, 
only 16-bit (ucs2?)

writing your own ucs4 library, forget it, might as well adopt the
glib one.  but iirc, the glib one _only_ does ucs4, not ucs2.

> (No such thing as ucs16/32, it's ucs2/4).


> Can iconv/apr_iconv provide this in a charset-opaque manner?  That is, if
> I want three 'characters' in shift-jis, can it give me the right number
> of bytes?  The reason is simple, Unicode is already splintered into a
> multi-word character set anyways.  I suspect it's easier to just get it
> right, knowing the apr_xlate that's been opened, and asking for the char
> len v.s. the byte len (sizeof) and providing the strcpy/cmp, etc.

you need to be able to wtoupper, wtolower etc.  that requires
a lookup table.  samba has an optimised lookup table of the
standard ucs2 upper/lower conversion tables that is small enough
to fit into the 2nd-level cache of an intel processor.
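
as an illustration of the lookup-table approach (not samba's actual
table, whose layout is not shown here), the sketch below uses the
simplest possible form: one flat array per direction, indexed by the
16-bit code unit.  the names ucs2_t, sketch_init_case_tables,
sketch_wtoupper and sketch_wtolower are hypothetical, and only the
ascii and latin-1 letter ranges are filled in; a real table would be
generated from the full unicode case-mapping data.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical 16-bit code unit type, not an existing apr typedef */
typedef uint16_t ucs2_t;

/* one entry per code unit: 65536 * 2 bytes = 128 KB per table */
static ucs2_t upper_tab[65536];
static ucs2_t lower_tab[65536];

/* fill both tables; must be called once before the lookups below */
void sketch_init_case_tables(void)
{
    uint32_t c;

    /* default: every code unit maps to itself */
    for (c = 0; c < 65536; c++) {
        upper_tab[c] = (ucs2_t)c;
        lower_tab[c] = (ucs2_t)c;
    }

    /* ascii: U+0061..U+007A (a-z) <-> U+0041..U+005A (A-Z) */
    for (c = 0x61; c <= 0x7A; c++) {
        upper_tab[c] = (ucs2_t)(c - 0x20);
        lower_tab[c - 0x20] = (ucs2_t)c;
    }

    /* latin-1: U+00E0..U+00FE <-> U+00C0..U+00DE, except the
     * division sign U+00F7 and multiplication sign U+00D7 */
    for (c = 0xE0; c <= 0xFE; c++) {
        if (c == 0xF7)
            continue;
        upper_tab[c] = (ucs2_t)(c - 0x20);
        lower_tab[c - 0x20] = (ucs2_t)c;
    }
}

/* each conversion is a single array index, no branching */
ucs2_t sketch_wtoupper(ucs2_t c) { return upper_tab[c]; }
ucs2_t sketch_wtolower(ucs2_t c) { return lower_tab[c]; }
```

a flat table like this trades memory for speed.  a smaller table, for
example one that stores only the ranges that actually have case
mappings, would need a range check before the lookup.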

