tcl-websh-dev mailing list archives

From tagu...@iij.ad.jp
Subject Re: i18n problems in Websh (multibyte charsets)
Date Tue, 30 Aug 2005 05:23:40 GMT
Hi,

Tcl has an encoding mechanism.
Tcl assumes that all strings read from an input channel are written in
the channel's encoding. By default this is the system encoding, which
can be queried with the "encoding system" command.
Tcl converts these strings from the channel's encoding to its internal
UTF-8 representation.
For output, Tcl converts the internal UTF-8 string back to the channel's
encoding.

The only exception to this conversion is a channel whose encoding is
"binary": then no conversion occurs.

Many "multibyte problems" occur at this point.
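The conversion step can be seen directly with "encoding convertto" and
"encoding convertfrom", which perform the same conversions an output and an
input channel do. This is a small sketch of my own, not websh code; it
assumes the euc-jp encoding that ships with Tcl is installed.

```tcl
# U+3042 (HIRAGANA LETTER A), written with \u notation so this file is pure ASCII
set s "\u3042"

# Bytes an output channel configured with -encoding euc-jp would write
set eucBytes [encoding convertto euc-jp $s]
# Bytes an output channel configured with -encoding utf-8 would write
set utf8Bytes [encoding convertto utf-8 $s]

puts [string length $eucBytes]    ;# 2 (bytes 0xA4 0xA2)
puts [string length $utf8Bytes]   ;# 3 (bytes 0xE3 0x81 0x82)

# Decoding with the matching encoding restores the original character
puts [expr {[encoding convertfrom euc-jp $eucBytes] eq $s}]   ;# 1

# A channel with -encoding binary performs neither step:
# the bytes pass through unchanged.
```

The same character is a different byte sequence in each encoding, so the
bytes only mean the right thing when they are decoded with the encoding
they were written in.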

Imagine a system whose system encoding is a multibyte encoding such as
euc-jp, and suppose the websh test/ scripts contain raw (UTF-8?)
multibyte strings.

If websh running under such a multibyte system encoding reads these test
scripts, it converts them from the system encoding to internal UTF-8. But
the input is already UTF-8, so the strings are corrupted.
Therefore all raw 8-bit strings must be written in "\uXXXX" notation.
And each string must also be representable in the system encoding.
For example, Tcl can read any Chinese string written in "\uXXXX"
notation. But if Tcl's system encoding is "euc-jp", Tcl cannot output it
correctly: the output channel's encoding must be a Chinese encoding for a
Chinese string.
So I think the test scripts must be evaluated under the correct encoding.
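A minimal sketch of both points: first the corruption itself (iso8859-1 is
used as the wrong system encoding only because every byte is valid in it),
then a helper that rewrites non-ASCII characters as \uXXXX escapes so a test
script stays pure ASCII. "toUnicodeEscapes" is a hypothetical name, not part
of websh, and it assumes all characters are in the BMP (code point <= 0xFFFF).

```tcl
# Corruption: UTF-8 bytes decoded as if they were in another encoding
set utf8Bytes [encoding convertto utf-8 "\u3042"]
set misread [encoding convertfrom iso8859-1 $utf8Bytes]
puts [string length $misread]   ;# 3 characters instead of 1

# Hypothetical helper: replace every non-ASCII character with a \uXXXX escape.
# Assumes BMP characters only (4 hex digits are enough).
proc toUnicodeEscapes {str} {
    set out ""
    foreach ch [split $str {}] {
        scan $ch %c code          ;# Unicode code point of the character
        if {$code < 0x80} {
            append out $ch        ;# plain ASCII is kept as-is
        } else {
            append out [format {\u%04X} $code]
        }
    }
    return $out
}

puts [toUnicodeEscapes "a\u3042b"]   ;# a\u3042b
```

The escaped text reads the same under any ASCII-compatible system encoding,
because Tcl's parser turns "\u3042" into the character itself. This only fixes
the input side; as noted above, the output channel still needs an encoding
that can represent the character.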

On the other hand, someone may think that the "binary" encoding is a good
solution. But it is not a good idea.

Tcl can read a string from a channel with binary encoding, and can write
such a string out again. But Tcl cannot manipulate such a string as text.
For example,
% fconfigure stdin -encoding binary
% set rawStr [gets stdin]
% set splitStr [split $rawStr {}]; # splitStr is broken: each byte of a multibyte character becomes a separate element
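The same effect can be reproduced without stdin. This sketch assumes the
input was UTF-8 and simulates what a "-encoding binary" channel would hand
back by converting a string to its UTF-8 bytes.

```tcl
# Two characters: U+3042 and U+3044 (HIRAGANA LETTERS A and I)
set text "\u3042\u3044"

# What a binary channel would deliver: the raw UTF-8 bytes,
# each byte becoming one Tcl character in the range U+0000..U+00FF
set rawStr [encoding convertto utf-8 $text]
puts [llength [split $rawStr {}]]      ;# 6 -- one element per byte

# Decoding the bytes first gives real characters again
set decoded [encoding convertfrom utf-8 $rawStr]
puts [llength [split $decoded {}]]     ;# 2
```

Every character-level command (split, string length, string range, regexp)
sees bytes in the first case and characters in the second.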

I think websh tries to treat multibyte strings as single-byte strings.
And additionally, I think websh also has the encoding-related problems
described above.

> If you talk about scripts that are sourced from mod_websh, you have to
> look at src/generic/webinterp.c: in readWebInterpCode() we basically
src/generic/interpool.c ?
> do the following:
> 
> Tcl_Obj *objPtr = Tcl_NewObj();
> chan = Tcl_OpenFileChannel(interp, filename, "r", 0644);
> Tcl_ReadChars(chan, objPtr, -1, 0);
> Tcl_Close(interp, chan);
> -> objPtr is the code object that is later eval'ed using Tcl_EvalObjEx
> 
> Hope that helps

Thanks, Ronnie! I wanted to find this, but I could not...

Note: the encoding of this channel "chan" is the default system encoding.
So websh can correctly read ws3 scripts written in its system encoding.
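If scripts were kept in one fixed encoding (UTF-8 is assumed here), the read
could name that encoding instead of relying on the system default. The sketch
below is a Tcl-level equivalent of the quoted C code with one extra
"fconfigure" step; at the C level the analogous change would presumably be a
Tcl_SetChannelOption(interp, chan, "-encoding", "utf-8") call before
Tcl_ReadChars. "readScriptCode" is a hypothetical name, not part of websh.

```tcl
# Hypothetical helper: read a script file using an explicit encoding
# instead of the system encoding that [open] applies by default.
proc readScriptCode {filename {enc utf-8}} {
    set chan [open $filename r]
    try {
        # Without this, the channel would use [encoding system]
        fconfigure $chan -encoding $enc
        return [read $chan]
    } finally {
        close $chan    ;# runs even if fconfigure or read fails
    }
}

# Usage: evaluate the code as readWebInterpCode() does, e.g.
#   eval [readScriptCode myscript.ws3 utf-8]
```

For plain Tcl scripts, Tcl 8.5 and later also accept
"source -encoding utf-8 fileName". The try/finally form needs Tcl 8.6.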

But I think the channel for form data has "binary" encoding, so websh
cannot handle multibyte form data.
Of course,
  web::put [encoding convertfrom [encoding system] [web::formvar varName]]
works fine.
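The workaround above can be wrapped so every form field is decoded the same
way. This is a minimal sketch: "formvarDecoded" is a hypothetical name, and
defaulting to the system encoding simply follows the one-liner above. Browsers
normally send form data in the charset of the page that contained the form,
so passing that charset explicitly may be the safer choice.

```tcl
# Hypothetical helper around websh's web::formvar: decode the raw bytes of a
# form field (delivered through a binary channel) into a proper Tcl string.
proc formvarDecoded {name {enc ""}} {
    if {$enc eq ""} {
        # Same choice as the one-liner: assume the system encoding
        set enc [encoding system]
    }
    return [encoding convertfrom $enc [web::formvar $name]]
}

# Usage inside a websh script:
#   web::put [formvarDecoded varName]
#   web::put [formvarDecoded varName utf-8]
```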

Thanks.
Taguchi,T.

---------------------------------------------------------------------
To unsubscribe, e-mail: websh-dev-unsubscribe@tcl.apache.org
For additional commands, e-mail: websh-dev-help@tcl.apache.org

