hc-httpclient-users mailing list archives

From "droidin.net" <dr...@droidin.net>
Subject Weird characters in the stream
Date Tue, 18 Aug 2009 18:55:15 GMT

I'm trying to read part of the data from an HTML page. I have this code,
which returns an InputSource for my SAX parser:
        InputSource is = null;
        HttpEntity entity = response.getEntity();
        if (entity == null) {
            final String body = new BasicResponseHandler().handleResponse(response);
            is = new InputSource(new StringReader(body));
        } else {
            is = new InputSource(new InputStreamReader(entity.getContent(), "utf-8"));
        }

And now comes the problem:
1. is = new InputSource(new StringReader(body)); // this always works
2. If I save the HTML to a file and then create the InputSource from it using
is = new InputSource(new InputStreamReader(ParserUtils.class.getResourceAsStream(testFile), "utf-8"));
this also works.
3. However, if I do
is = new InputSource(new InputStreamReader(entity.getContent(), "utf-8"));
then my SAX parser chokes with an ArrayIndexOutOfBoundsException (Attempt to
access illegal array index), and when I look at the buffer it is full of
garbage chars that show up as little blank squares with a char numeric value
of -1. Wrapping the InputStreamReader in a BufferedReader does not help.

The original HTML doc specifies
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
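
To narrow down case 3, it may help to look at the raw bytes before any charset decoding happens. The following is only a rough diagnostic sketch, not a fix: StreamProbe, peek, hex and looksGzipped are hypothetical names that are not part of HttpClient, and the ByteArrayInputStream in main stands in for entity.getContent(). It assumes (without knowing the actual cause) that the garbage could come from a body that is not plain UTF-8 text, for example a gzip-compressed one, or from a stream that is already at its end.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StreamProbe {

    /**
     * Reads up to n bytes, then pushes them back so the stream can still be
     * handed to the parser. The PushbackInputStream must have been created
     * with a pushback buffer of at least n bytes.
     */
    static byte[] peek(PushbackInputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int total = 0;
        while (total < n) {
            int read = in.read(buf, total, n - total);
            if (read < 0) {
                break; // end of stream reached before n bytes were available
            }
            total += read;
        }
        byte[] prefix = Arrays.copyOf(buf, total);
        in.unread(prefix);
        return prefix;
    }

    /** Formats bytes as space-separated lowercase hex, e.g. "3c 68 74". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** True if the prefix starts with the gzip magic bytes 0x1f 0x8b. */
    static boolean looksGzipped(byte[] prefix) {
        return prefix.length >= 2
                && (prefix[0] & 0xff) == 0x1f
                && (prefix[1] & 0xff) == 0x8b;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for entity.getContent(); replace with the real stream.
        InputStream raw = new ByteArrayInputStream(
                "<html>".getBytes(StandardCharsets.UTF_8));

        PushbackInputStream in = new PushbackInputStream(raw, 16);
        byte[] prefix = peek(in, 16);

        System.out.println("bytes available: " + prefix.length);
        System.out.println("first bytes:     " + hex(prefix));
        System.out.println("gzip magic:      " + looksGzipped(prefix));
        // 'in' is still positioned at the start and can be wrapped in an
        // InputStreamReader afterwards.
    }
}
```

If the prefix comes back empty, the stream was already at its end when the reader got it; if it starts with 1f 8b, the body is compressed and decoding it as UTF-8 would produce garbage. Readable HTML bytes would point elsewhere.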
-- 
View this message in context: http://www.nabble.com/Weird-characters-in-the-stream-tp25031327p25031327.html
Sent from the HttpClient-User mailing list archive at Nabble.com.


