apr-dev mailing list archives

From William A Rowe Jr <wr...@rowe-clan.net>
Subject Re: apr_token_* conclusions (was: Better casecmpstr[n]?)
Date Thu, 26 Nov 2015 04:00:54 GMT
On Wed, Nov 25, 2015 at 9:44 PM, William A Rowe Jr <wrowe@rowe-clan.net>
wrote:

> LANG="ku_TR.iso88599";
>    64 = @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
>       ^ @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`ABCDEFGHİJKLMNOPQRSTUVWXYZ{|}~
>       v @abcdefghıjklmnopqrstuvwxyz[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
>       ?  ........*.................      ''''''''*'''''''''''''''''
>   192 = ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏĞÑÒÓÔÕÖ×ØÙÚÛÜİŞßàáâãäåæçèéêëìíîïğñòóôõö÷øùúûüışÿ
>       ^ ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏĞÑÒÓÔÕÖ×ØÙÚÛÜİŞßÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏĞÑÒÓÔÕÖ÷ØÙÚÛÜIŞÿ
>       v àáâãäåæçèéêëìíîïğñòóôõö×øùúûüişßàáâãäåæçèéêëìíîïğñòóôõö÷øùúûüışÿ
>       ? ....................... .....*. ''''''''''''''''''''''' '''''*'
>

The translation here is pretty simple.  We display the ^ toupper() and the
v tolower() value of every character.  On the summary line '?', in normal
or -v verbose mode, ' ' means no translation at all, '.' means the ch
has a lower case translation, and ''' means the ch has an upper case
translation.  I strip most of these lines out while searching for the
exceptional cases...
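
As a rough illustration of the summary markers described above (not the
actual tool; case_marker() is a hypothetical name), a helper along these
lines could pick the marker for one byte value.  It assumes the caller has
already selected the locale with setlocale(LC_CTYPE, ...) and passes a value
in the 0-255 range, and it does not cover the '*' case:

```c
#include <ctype.h>

/* Hypothetical helper: summary marker for one byte value (0..255)
 * under whatever LC_CTYPE locale is currently in effect. */
static char case_marker(int ch)
{
    int up = toupper(ch);
    int lo = tolower(ch);

    if (lo != ch)
        return '.';   /* ch has a lower case translation */
    if (up != ch)
        return '\'';  /* ch has an upper case translation */
    return ' ';       /* no translation at all */
}
```

In the default "C" locale this gives '.' for 'A', ''' for 'a' and ' ' for '@'.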

'*' is the surprising case: a high bit character translates into the
ancient 0-127 code plane, a ch in 0-127 translates into the high bit plane,
or a ch within the traditional 0-127 code plane translates to an
unexpected position.
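
One way the '*' test could be written, as a sketch only: is_surprising() is
a hypothetical name, and "unexpected position" is assumed here to mean
anything other than the usual ASCII A-Z <-> a-z pairing.  It also assumes an
ASCII-based single-byte charset and a ch in the 0-255 range:

```c
#include <ctype.h>

/* Hypothetical check for the '*' marker under the current LC_CTYPE locale.
 * Returns 1 when a case translation is "surprising", otherwise 0. */
static int is_surprising(int ch)
{
    int up = toupper(ch);
    int lo = tolower(ch);

    /* A translation that crosses between 0-127 and the high bit plane. */
    if (((up ^ ch) & 0x80) || ((lo ^ ch) & 0x80))
        return 1;

    /* Within 0-127, only the ASCII letter pairs are expected (assumption). */
    if (ch < 128) {
        int exp_up = (ch >= 'a' && ch <= 'z') ? ch - 0x20 : ch;
        int exp_lo = (ch >= 'A' && ch <= 'Z') ? ch + 0x20 : ch;
        if (up != exp_up || lo != exp_lo)
            return 1;
    }
    return 0;
}
```

In the default "C" locale nothing is flagged.  Under the ku_TR.iso88599
table quoted above, 'I' and 'i' would be flagged because their translations
(ı and İ) sit in the high bit plane.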

LANG="mt_MT.iso88593";
  128 =                                  Ħ˘£¤ Ĥ§¨İŞĞĴ­ Ż°ħ²³´µĥ·¸ışğĵ½ż
      ^                                  Ħ˘£¤ Ĥ§¨İŞĞĴ­ Ż°Ħ²³´µĤ·¸IŞĞĴ½Ż
      v                                  ħ˘£¤ ĥ§¨işğĵ­ ż°ħ²³´µĥ·¸ışğĵ½ż
      ?                                  .    .  *...  . '    '  *'''  '

The last example above seems to indicate an isprint() validation error
or utf-8 mis-assignment in iconv, somewhere in the last 16 characters
of this code table, apparently between Ĵ­ and Ż :)