apr-dev mailing list archives

From Роман Донченко <DXDra...@yandex.ru>
Subject Re: Misbehaviour of apr_os_locale_encoding on Windows
Date Tue, 13 Apr 2010 14:29:04 GMT
William A. Rowe Jr. <wrowe@rowe-clan.net> wrote on Tue, 13 Apr 2010
19:18:57 +0500:

> And what is the encoding of that file?  Certainly no assurance that data
> is unicode, or one of the local code pages.  APR can't and doesn't try to
> deal with the representation of data passed around using APR.  In general
> windows environment is very good about handling utf-8 data, although it's
> irritating in the insistence on polluting streams with BOM's.

I agree that you can't reliably predict what encoding a file is in, but I  
assert the system ANSI code page (which apr_os_locale_encoding should IMO  
return) is a reasonable default. It's certainly not the user locale's code  
page (which it currently returns) — because nothing uses that. 8=]

> Something APR should address, is that -printing- that to a console stream,
> a utf-8 stream can easily be handled with unicode.  That's a problem apr
> could reasonably solve for command line apps.

Perhaps, but printing to the console is not what's broken here.

>> or when I'm printing the username that I got from apr_uid_name_get.
>
> ... will always be utf-8, back to my point about external representations.
> Internally, APR always pulls from the Win32 Unicode functions.
>

Um, that's just not true.

APR_DECLARE(apr_status_t) apr_uid_name_get(char **username, apr_uid_t userid,
                                            apr_pool_t *p)
{
     /* WinCE code snipped */
     SID_NAME_USE type;
     char name[MAX_PATH], domain[MAX_PATH];
     DWORD cbname = sizeof(name), cbdomain = sizeof(domain);
     if (!userid)
         return APR_EINVAL;
     if (!LookupAccountSid(NULL, userid, name, &cbname, domain, &cbdomain,
                           &type))
         return apr_get_os_error();
     if (type != SidTypeUser && type != SidTypeAlias
         && type != SidTypeWellKnownGroup)
         return APR_EINVAL;
     *username = apr_pstrdup(p, name);
     return APR_SUCCESS;
}

It fills a char buffer, so it calls the ANSI variant of LookupAccountSid
(LookupAccountSidA), and the result is therefore in the system ANSI code
page. Same in the Unix version:

APR_DECLARE(apr_status_t) apr_uid_name_get(char **username, apr_uid_t userid,
                                            apr_pool_t *p)
{
     struct passwd *pw;
     struct passwd pwd;
     char pwbuf[PWBUF_SIZE];
     apr_status_t rv;

     rv = getpwuid_r(userid, &pwd, pwbuf, sizeof(pwbuf), &pw);
     if (rv) {
         return rv;
     }

     if (pw == NULL) {
         return APR_ENOENT;
     }

     /* thread-unsafe code snipped */
     *username = apr_pstrdup(p, pw->pw_name);
     return APR_SUCCESS;
}

getpwuid_r returns the raw byte representation of the username, which is  
in the locale encoding (well, Unix being byte-oriented, it can be an  
arbitrary binary string, but presumably the sysadmin uses the same  
encoding everywhere).
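
To make the "raw bytes" point concrete, here is a rough POSIX-only sketch, not APR code. Both helper names (lookup_user_name, format_bytes_hex) are made up for illustration. It fetches the name with getpwuid_r and renders the bytes as hex, so you can see that nothing along the way decodes or re-encodes them:

```c
#include <assert.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the login name for `uid` into `out`.
 * Returns 0 on success, -1 if there is no entry or `out` is too small,
 * or the error number reported by getpwuid_r. */
static int lookup_user_name(uid_t uid, char *out, size_t outlen)
{
    struct passwd pwd;
    struct passwd *result = NULL;
    char buf[4096];              /* assumed large enough for a sketch */
    int rv;

    assert(out != NULL);
    rv = getpwuid_r(uid, &pwd, buf, sizeof(buf), &result);
    if (rv != 0)
        return rv;
    if (result == NULL)
        return -1;
    if (strlen(result->pw_name) >= outlen)
        return -1;
    strcpy(out, result->pw_name);   /* bytes are copied untouched */
    return 0;
}

/* Hypothetical helper: write the bytes of `s` into `out` as
 * space-separated lowercase hex, e.g. "ab" becomes "61 62".
 * Returns 0 on success, -1 if `out` is too small. */
static int format_bytes_hex(const char *s, char *out, size_t outlen)
{
    size_t len, i;

    assert(s != NULL && out != NULL);
    if (outlen == 0)
        return -1;
    out[0] = '\0';
    len = strlen(s);
    if (len * 3 > outlen)           /* 3 chars per byte, NUL included */
        return -1;
    for (i = 0; i < len; i++)
        sprintf(out + i * 3, (i + 1 < len) ? "%02x " : "%02x",
                (unsigned char)s[i]);
    return 0;
}
```

Calling lookup_user_name(getuid(), name, sizeof(name)) and then format_bytes_hex(name, hex, sizeof(hex)) shows the stored bytes; whether they read as Cyrillic, Latin-1 or garbage depends entirely on which encoding the reader assumes.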

The difference is, on Unix, the result of apr_uid_name_get (and many other  
functions, I'm sure) is in the encoding detected by  
apr_os_locale_encoding, while on Windows this may not be the case.
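
For illustration only, this is roughly the default I am arguing for, written as a standalone helper rather than a patch to APR. The name narrow_string_encoding and the "CP<number>" spelling are my assumptions here. On Windows it reports the system ANSI code page via GetACP(), which is what the *A API variants use; elsewhere it reports the locale's codeset via nl_langinfo(CODESET):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <langinfo.h>
#include <locale.h>
#endif

/* Hypothetical helper, not APR code: name the encoding that narrow (char)
 * strings handed back by the OS are expected to be in.
 * Writes the name into `buf` and returns `buf`. */
static const char *narrow_string_encoding(char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
#ifdef _WIN32
    /* System ANSI code page, the one the ANSI (*A) functions convert to. */
    snprintf(buf, buflen, "CP%u", GetACP());
#else
    /* Note: this changes process-wide locale state. It adopts the
     * environment's LC_CTYPE, then asks for its character set name. */
    setlocale(LC_CTYPE, "");
    snprintf(buf, buflen, "%s", nl_langinfo(CODESET));
#endif
    return buf;
}
```

With a helper like this, the encoding name and the bytes returned by apr_uid_name_get would agree on both platforms, at least as far as the code quoted above goes.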

Roman.

