hc-dev mailing list archives

From "Tarik Yilmaz (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HTTPCLIENT-1590) Charset detection problem if Content-Type header is text/html
Date Tue, 23 Dec 2014 15:13:13 GMT
Tarik Yilmaz created HTTPCLIENT-1590:

             Summary: Charset detection problem if Content-Type header is text/html
                 Key: HTTPCLIENT-1590
                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1590
             Project: HttpComponents HttpClient
          Issue Type: Bug
    Affects Versions: 4.3.6
            Reporter: Tarik Yilmaz
            Priority: Critical

HttpClient client = HttpClients.createDefault();
HttpEntity entity = client.execute(new HttpGet(url)).getEntity();
String charset = ContentType.get(entity).getCharset().displayName();

The third line throws a NullPointerException.
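
A likely cause, assuming HttpClient 4.3 behaviour: ContentType.get(entity) returns null when the entity carries no Content-Type header, and getCharset() returns null when the header has no charset parameter, so the chained call fails in either case. Below is a minimal, dependency-free sketch of a null-safe lookup on the raw header value. HeaderCharset and parse are hypothetical names used for illustration only; they are not part of HttpClient.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not an HttpClient class): null-safe charset lookup. */
final class HeaderCharset {
    // Matches the charset parameter of a Content-Type value, quoted or not.
    private static final Pattern CHARSET_PARAM = Pattern.compile(
            ";\\s*charset\\s*=\\s*\"?([^\";\\s]+)\"?", Pattern.CASE_INSENSITIVE);

    private HeaderCharset() {}

    /** Returns the declared charset, or empty if the header or parameter is missing. */
    static Optional<Charset> parse(String contentType) {
        if (contentType == null) {
            return Optional.empty();           // no Content-Type header at all
        }
        Matcher m = CHARSET_PARAM.matcher(contentType);
        if (!m.find()) {
            return Optional.empty();           // e.g. plain "text/html"
        }
        try {
            return Optional.of(Charset.forName(m.group(1)));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return Optional.empty();           // declared but unusable on this JVM
        }
    }
}
```

With HttpClient itself, the equivalent guard would be to check both the ContentType and its charset for null before calling displayName().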

Response headers:
Date: Tue, 23 Dec 2014 14:06:13 GMT

Response meta tags:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

<html xmlns:fb="http://ogp.me/ns/fb#">

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-9" />
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1254" />
<link rel="SHORTCUT ICON" href="/favicon.ico" />

How can I obtain the real charset from the DOM object? I am using Jsoup to parse the document with the
Jsoup.parse(InputStream, String, String) method.
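
One possible approach, sketched under assumptions: buffer the response body, scan its first bytes for a charset declaration in a meta tag, and pass the name found as the second argument of Jsoup.parse. MetaCharsetSniffer, sniff, and the 4096-byte scan limit are hypothetical choices for illustration, not part of Jsoup or HttpClient. The prefix is decoded as ISO-8859-1 only because that mapping preserves every byte, which is enough to read an ASCII declaration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds the first charset declared in a meta tag. */
final class MetaCharsetSniffer {
    // Assumption: the declaration appears near the top of the document.
    private static final int MAX_SCAN_BYTES = 4096;

    // Covers <meta charset="..."> and <meta http-equiv=... content="...; charset=...">.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    private MetaCharsetSniffer() {}

    /** Returns the declared charset name exactly as written, or empty if none is found. */
    static Optional<String> sniff(byte[] body) {
        if (body == null) {
            return Optional.empty();
        }
        int len = Math.min(body.length, MAX_SCAN_BYTES);
        String head = new String(body, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

When a page declares two different charsets, as the one above does, this sketch returns the first. As far as I know, passing null as the charset name to Jsoup.parse(InputStream, String, String) asks Jsoup to detect the charset from an http-equiv meta tag itself and to fall back to UTF-8, which may make manual sniffing unnecessary.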

This message was sent by Atlassian JIRA

