hc-httpclient-users mailing list archives

From Chad La Joie <laj...@itumi.biz>
Subject Re: Bytes Missing from HTTP Response
Date Mon, 04 Apr 2011 11:34:35 GMT
Yeah, unfortunately that didn't work.

Is there any way to get the old v3 behavior that gives you access to the
raw bytes of the entity before any sort of character decoding is done?

I strongly suspect that very few web servers out there are properly
configured to return the correct character encoding so this could
definitely be an ongoing problem.
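
To make concrete what I mean by raw bytes, here is a minimal sketch that
uses only JDK classes, not HttpClient's API. The class name, method names,
and sample data are hypothetical. It drains a stream into a byte array and
decodes it only once the caller has picked a charset, so a wrong charset
declaration cannot mangle the body before we see it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of HttpClient.
final class RawBytesDemo {

    // Drain the stream into a byte array without any character decoding.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Decode only after the caller has decided which charset to trust.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a response body sent as ISO-8859-1 (made-up data).
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] raw = readAll(new ByteArrayInputStream(body));

        // The raw bytes survive intact: one byte per character here.
        System.out.println(raw.length);
        // Decoding with the charset the bytes were written in round-trips.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1).equals("caf\u00e9"));
        // Decoding with the wrong charset does not recover the text.
        System.out.println(decode(raw, StandardCharsets.UTF_8).equals("caf\u00e9"));
    }
}
```

With the bytes in hand, the charset decision could come from a header, a
detection library, or a fallback default, whichever turns out to be right.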

On 4/2/11 6:29 AM, Oleg Kalnichevski wrote:
> On Sat, 2011-04-02 at 06:10 -0400, Chad La Joie wrote:
>> Okay, that makes sense.
>>
>> To test this, is there a way I can force the content type on the client
>> side, prior to requesting the response entity, via the response object?
>>
> 
> You can try adding Accept and / or Accept-Charset header to the request
> message and see if the origin server responds appropriately.
> 
> However, you would generally be better off using some sort of content
> detection algorithm, such as the one provided by the Apache Tika
> toolkit. I suspect wget does exactly that.
> 
> http://tika.apache.org/0.9/detection.html
> http://tika.apache.org/
> 
> Oleg
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
> For additional commands, e-mail: httpclient-users-help@hc.apache.org
> 
> 

-- 
Chad La Joie
http://itumi.biz
trusted identities, delivered


