hc-httpclient-users mailing list archives

From "Ingo Meyer" <dj-sl...@gmx.net>
Subject charset-handling << AW: AW: german umlauts öüä
Date Wed, 25 Jan 2006 15:32:48 GMT
> -----Original Message-----
> From: Roland Weber [mailto:http-async@dubioso.net]
> Sent: Wednesday, 25 January 2006 16:25
> To: HttpClient User Discussion
> Subject: Re: AW: german umlauts öüä
> 
> Hi Ingo,
> 
> > To anybody: Does this mean that "iso-8859-1" is the default
> > charset for HTML?
> 
> No, it doesn't. Guessing the default character set is up to 
> the user agent (browser). But if you only want to access a 
> single web site, and you are reasonably sure that they won't 
> change the character encoding, you can still work with a default.
> 
Thanks Roland,

My program will access many different pages.

Anyway, I will summarize what I have learned about handling the charset:

1. Look at the "Content-Type" response header; if it specifies a charset,
   use that one.
2. If the page has text content and no charset is found in the header, fall
   back to a default (maybe "iso-8859-1").
3. If the content is "text/html" and no charset has been found so far, search
   for a <meta http-equiv="content-type"> tag and check whether a charset is
   given there.
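
The three steps above could be sketched in Java roughly like this. It is only a minimal sketch under some assumptions, not HttpClient API: the class CharsetSketch and its method names are made up for illustration, the regular expressions are a simplification and no real HTML parser, and I assume the meta tag (step 3) is consulted before falling back to the default (step 2).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of HttpClient.
class CharsetSketch {
    // Finds "charset=..." in a Content-Type value, with or without quotes.
    private static final Pattern CHARSET_PARAM = Pattern.compile(
            "charset\\s*=\\s*[\"']?([^\\s;\"'>]+)", Pattern.CASE_INSENSITIVE);
    // Finds a <meta http-equiv="content-type" ...> tag (simplified).
    private static final Pattern META_TAG = Pattern.compile(
            "<meta[^>]*http-equiv\\s*=\\s*[\"']?content-type[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Step 1: charset named in a Content-Type value, or null if absent.
    static String charsetFromContentType(String value) {
        if (value == null) {
            return null;
        }
        Matcher m = CHARSET_PARAM.matcher(value);
        return m.find() ? m.group(1) : null;
    }

    // Step 3: charset named in a meta tag near the start of the page.
    static String charsetFromMeta(byte[] body) {
        // ISO-8859-1 maps every byte to one char, so the ASCII markup can
        // be scanned before the real charset is known.
        int len = Math.min(body.length, 4096);
        String head = new String(body, 0, len, StandardCharsets.ISO_8859_1);
        Matcher tag = META_TAG.matcher(head);
        return tag.find() ? charsetFromContentType(tag.group()) : null;
    }

    // Decodes the body: header first, then meta tag for HTML, then default.
    static String decode(byte[] body, String contentType) {
        String name = charsetFromContentType(contentType);
        boolean html = contentType != null
                && contentType.toLowerCase(Locale.ROOT).startsWith("text/html");
        if (name == null && html) {
            name = charsetFromMeta(body);
        }
        Charset charset = StandardCharsets.ISO_8859_1; // step 2: assumed default
        if (name != null) {
            try {
                charset = Charset.forName(name);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: keep the default.
            }
        }
        return new String(body, charset);
    }

    public static void main(String[] args) {
        System.out.println(charsetFromContentType("text/html; charset=UTF-8"));
    }
}
```

With the response body as a byte array and the value of the "Content-Type" header as a String, the call would then be CharsetSketch.decode(bytes, contentType) instead of always calling new String(bytes, "iso-8859-1").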

cheers,
Ingo

> > then i will always call: new String (bytes, "iso-8859-1");
> 
> cheers,
>   Roland
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: 
> httpclient-user-help@jakarta.apache.org
> 



