hc-httpclient-users mailing list archives

From Jon Moore <j...@apache.org>
Subject Re: Obtaining charset of page from HttpResponse.
Date Tue, 16 Aug 2011 11:52:55 GMT
Hi,

This is because the resource at www.annahar.com that you link to
returns a Content-Type header that just reads "text/html":

$ curl -v "http://www.annahar.com/content.php?priority=1&table=main&type=main&day=Mon" >/dev/null
* About to connect() to www.annahar.com port 80 (#0)
*   Trying 66.242.155.235... connected
* Connected to www.annahar.com (66.242.155.235) port 80 (#0)
> GET /content.php?priority=1&table=main&type=main&day=Mon HTTP/1.1
> User-Agent: curl/7.16.4 (i386-apple-darwin9.0) libcurl/7.16.4 OpenSSL/0.9.7l zlib/1.2.3
> Host: www.annahar.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Connection: close
< Date: Tue, 16 Aug 2011 11:50:50 GMT
< Server: Microsoft-IIS/6.0
< X-Powered-By: ASP.NET
< X-Powered-By: PHP/5.2.0
< Content-type: text/html
<
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0{ [data not shown]
100 91340    0 91340    0     0   187k      0 --:--:-- --:--:-- --:--:--  237k* Closing connection #0

So httpclient is doing the right thing -- it's giving you access to
exactly what's in the header that's returned.
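
If you need a charset even when the server omits it, one option is to
parse the header value yourself and fall back to a default. The
following is only a minimal sketch using the plain JDK, not part of the
HttpClient API: the class name ContentTypeCharset and the helper
charsetFromContentType are hypothetical, and the ISO-8859-1 fallback is
an assumption you should replace with whatever suits your content.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    // Hypothetical helper (not an HttpClient method): returns the charset
    // named in a Content-Type header value, or the given fallback when the
    // value has no usable charset parameter.
    public static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type and are separated by ';'.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional surrounding double quotes.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                return Charset.forName(value);
            } catch (IllegalArgumentException e) {
                // Malformed or unsupported charset name: treat it as absent.
                return fallback;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Assumed fallback for this sketch only.
        Charset fallback = StandardCharsets.ISO_8859_1;
        System.out.println(charsetFromContentType("text/html", fallback));
        System.out.println(charsetFromContentType("text/html; charset=UTF-8", fallback));
    }
}
```

In your code you would pass allHeaders[0].getValue() as the first
argument. For a page like the one at www.annahar.com this returns the
fallback, because the header carries no charset parameter at all.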

Jon


On Tue, Aug 16, 2011 at 7:42 AM, Khosro Asgharifard Sharabiani
<khosro_question@yahoo.com> wrote:
> Hello,
> I use the following code to find the charset of a page, but it does not work for the page "http://www.annahar.com/content.php?priority=1&table=main&type=main&day=Mon"
>
> Code :
> [code]
>
> try {
>     HttpClient httpclient = new DefaultHttpClient();
>     String url = "http://www.annahar.com/content.php?priority=1&table=main&type=main&day=Mon";
>     HttpGet httpget = new HttpGet(url);
>     HttpResponse response = httpclient.execute(httpget);
>     HttpEntity entity = response.getEntity();
>     if (entity != null) {
>         Header[] allHeaders = response.getHeaders("Content-Type");
>         System.out.println(allHeaders[0].getValue());
>     }
> } catch (ClientProtocolException e) {
>     e.printStackTrace();
> } catch (IOException e) {
>     e.printStackTrace();
> }
> [/code]
>
>
> And the output of the above code is: text/html.
> But I think the output should be "text/html; charset=windows-1256". Am I right?
>
> But when I use "http://bigbrowser.blog.lemonde.fr/2011/08/03/iran-le-mossad-derriere-le-meurtre-dun-scientifique-spiegel"
> as the URL in the code, it returns "text/html; charset=UTF-8", which I think is correct.
> It seems to work for some pages but not all of them. Why does this happen?
>
>
> Khosro.
