hc-httpclient-users mailing list archives

From Roland Weber <ROLWE...@de.ibm.com>
Subject Re: Japanese charset?
Date Thu, 16 Jun 2005 06:02:40 GMT
Hello Andrew,

sorry that my mail yesterday took 9 hours to get to the list.
I hope this one appears in a timely manner :-)


"Andrew A. Sabitov" <sabitov@catalysis.nsk.su> wrote on 16.06.2005 03:04:05:

> Server sends Shift_JIS as page charset. 
> 
> it's my code now:
> 
> ............
> result = new HttpResponse ( method.getResponseBodyAsStream (), method.getResponseCharSet() );
> .........
> 
> //in HttpResponse constructor:
> HttpResponse ( InputStream responseBodyAsStream, String charset ) throws IOException {
>     BufferedReader reader = new BufferedReader (
>         new InputStreamReader ( responseBodyAsStream, charset ) );
>     String line = null;
>     while ( ( line = reader.readLine() ) != null ) {
>         this.add( line );
>         out.write( line );
>         out.write( "\n" );
>     }
> }
> 
> It works. :)
> 
> It's funny, but
> http://jakarta.apache.org/commons/httpclient/3.0/charencodings.html
> says: "If the response is known to be a String, you can use the 
> getResponseBodyAsString method which will automatically use the encoding 
> specified in the Content-Type header or ISO-8859-1 if no charset is 
> specified."
> 
> Content-Type for this page is "text/html; charset=Shift_JIS", I really 
> thought that httpclient auto-converts the body... :( 
> 

I've checked the code for 3.0. Here are the relevant fragments:

http://svn.apache.org/repos/asf/jakarta/commons/proper/httpclient/trunk/src/java/org/apache/commons/httpclient/HttpMethodBase.java
method getResponseBodyAsString:
        byte[] rawdata...
            ... = getResponseBody()
        ...
            return EncodingUtil.getString(rawdata, getResponseCharSet())
        ...


http://svn.apache.org/repos/asf/jakarta/commons/proper/httpclient/trunk/src/java/org/apache/commons/httpclient/util/EncodingUtil.java
method getString(byte[],int,int,String):

       ...  return new String(data, offset, length, charset)
       ...  LOG.warn("Unsupported encoding: " + charset + ". System encoding used");
            return new String(data, offset, length);
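
Put together, the pattern in those fragments looks roughly like the sketch
below. This is not the actual EncodingUtil source: the try/catch around the
elided parts is my assumption about the control flow, the class name
FallbackDecodeSketch is made up, and System.err stands in for the
commons-logging LOG.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical stand-in for the getString(byte[],int,int,String) fragments
// quoted above; not the real EncodingUtil code.
class FallbackDecodeSketch {

    static String decode(byte[] data, int offset, int length, String charset) {
        try {
            // Normal path: decode with the charset named in Content-Type.
            return new String(data, offset, length, charset);
        } catch (UnsupportedEncodingException e) {
            // Fallback path (assumed): warn, then use the platform default.
            System.err.println("Unsupported encoding: " + charset
                    + ". System encoding used");
            return new String(data, offset, length);
        }
    }

    public static void main(String[] args) {
        byte[] data = { 'a', 'b', 'c' };
        System.out.println(decode(data, 0, data.length, "US-ASCII"));
        // Takes the fallback path and writes the warning to stderr.
        System.out.println(decode(data, 0, data.length, "no-such-charset"));
    }
}
```

If the fallback path were taken for Shift_JIS, the warning would have to
show up in the log, which is why its absence is puzzling.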

I wonder whether the InputStreamReader recognizes charsets that the String
constructor doesn't? But why should it? And why wouldn't you get the 
warning?
Something is fishy here.
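
One way to test that hunch is to decode the same bytes both ways and compare.
A minimal sketch, assuming the JRE in use ships a Shift_JIS charset; the class
and method names are hypothetical, and the sample bytes are generated locally
rather than taken from the real page.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical check: do the String constructor and InputStreamReader
// agree for a given charset name?
class DecodePathsSketch {

    // The path used by getResponseBodyAsString (via EncodingUtil).
    static String viaString(byte[] raw, String charset)
            throws UnsupportedEncodingException {
        return new String(raw, charset);
    }

    // The path used in Andrew's HttpResponse constructor.
    static String viaReader(byte[] raw, String charset) throws IOException {
        Reader reader =
                new InputStreamReader(new ByteArrayInputStream(raw), charset);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String charset = args.length > 0 ? args[0] : "Shift_JIS";
        System.out.println(charset + " supported: "
                + Charset.isSupported(charset));

        String sample = "\u65e5\u672c\u8a9e"; // three kanji as test input
        byte[] raw = sample.getBytes(charset);

        System.out.println("String constructor ok: "
                + sample.equals(viaString(raw, charset)));
        System.out.println("InputStreamReader ok:  "
                + sample.equals(viaReader(raw, charset)));
    }
}
```

If both lines report the same result, the difference must come from somewhere
else, for example from what getResponseCharSet() returns in each case.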

cheers,
  Roland

