hc-httpclient-users mailing list archives

From Oleg Kalnichevski <ol...@apache.org>
Subject RE: HTTPClient 3.0-rc2 returning corrupt data through popular Proxy
Date Wed, 01 Jun 2005 18:23:02 GMT
Chris,

As far as HttpClient is concerned there's virtually no difference
whether a request goes via a proxy or hits the target server directly.
Feel free to look at the source code. The only difference, rather minor
in my opinion, is a slightly different request line for non-transparent
requests, which can hardly result in data corruption.
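
To illustrate the request-line difference, here is a minimal sketch. It is
not HttpClient's own code: the class and method names (RequestLineSketch,
direct, viaProxy) and the example URL are made up for this illustration. It
only assumes the standard HTTP/1.1 convention that a request sent to a
plain HTTP proxy carries the absolute URI, while a direct request carries
just the path and query.

```java
import java.net.URI;

final class RequestLineSketch {

    /** Request line as sent straight to the origin server: path and query only. */
    static String direct(String method, URI uri) {
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        String query = uri.getRawQuery();
        return method + " " + path + (query != null ? "?" + query : "") + " HTTP/1.1";
    }

    /** Request line as sent to a plain HTTP proxy: the absolute URI. */
    static String viaProxy(String method, URI uri) {
        return method + " " + uri.toASCIIString() + " HTTP/1.1";
    }

    public static void main(String[] args) {
        // Hypothetical URL, used only for illustration.
        URI uri = URI.create("http://www.example.com/search?q=test");
        System.out.println(direct("GET", uri));
        System.out.println(viaProxy("GET", uri));
    }
}
```

Everything after the request line (headers, body handling, chunk decoding)
is the same in both cases, which is why the proxy path alone is an unlikely
source of corrupted content.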

Since the data corruption appears random my _guess_ is that there might
be a synchronization problem / race condition in your code, but take it
for what it is worth.
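
As a rough illustration of the kind of check that could confirm or rule
out that guess, the sketch below reads each response body into purely
local state and then counts mismatches across many concurrent reads. It is
not Chris's code and does not use HttpClient: the names BodyReaderSketch,
readBody and countCorrupted are hypothetical, and an in-memory stream
stands in for the network response. If the caller's real code keeps a
buffer, StringBuffer or method object in a field shared by several
threads, interleaved writes could produce exactly the sort of random
corruption described.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BodyReaderSketch {

    // UNSAFE pattern, shown for contrast only: one accumulator shared by all threads.
    // private static final ByteArrayOutputStream SHARED = new ByteArrayOutputStream();

    /** Reads a stream fully using only local state, so concurrent calls cannot interleave. */
    static String readBody(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Runs many concurrent reads and counts results that differ from what was sent. */
    static int countCorrupted(int tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                // Stand-in for a response body; a real test would fetch a known page.
                final String expected = "<html>response " + i + "</html>";
                results.add(pool.submit(() -> {
                    InputStream in = new ByteArrayInputStream(
                            expected.getBytes(StandardCharsets.UTF_8));
                    return readBody(in).equals(expected);
                }));
            }
            int corrupted = 0;
            for (Future<Boolean> result : results) {
                if (!result.get()) {
                    corrupted++;
                }
            }
            return corrupted;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Running the same harness single-threaded and multi-threaded against the
real fetching code would show whether the corruption rate depends on
concurrency.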

Oleg


On Wed, 2005-06-01 at 11:02 -0700, Chris Fellows wrote:
> Thanks for the reply.
> 
> > (1) Is the problem reliably reproducible (hitting the same URL always
> > produces corrupted HTML) or does it appear random?
> 
> No, it occurs at random.
> 
> > (2) Does this problem occur if you hit the URLs directly bypassing
> > Anonymizer.com proxy?
> 
> I did some more testing and yes, I have been able to reproduce the
> problem bypassing Anonymizer, though the frequency of corrupted
> responses is much higher using the proxy, about 1 in 100 using
> Anonymizer and 1 in 400 without Anonymizer. Using just java.net and
> Sockets and a non-proxy connection we never ran into any data
> corruption. I doubt it's HTTPClient, as 3.0-rc2 from what I've seen has
> an excellent track record, but we do have to return 100% of responses
> without any corruption. As a note, these requests only hit major search
> engines, which actively try to prevent automated searches. I'll work on
> getting a compilable version of the HTTPClient code against one of the
> search engines with and without proxy to see if there's an
> implementation mistake.
> 
> 
> -----Original Message-----
> From: Oleg Kalnichevski [mailto:olegk@apache.org] 
> Sent: Tuesday, May 31, 2005 11:02 AM
> To: HttpClient User Discussion
> Subject: Re: HTTPClient 3.0-rc2 returning corrupt data through popular
> Proxy
> 
> 
> > 1)       Should HTTPClient 3.0 return data as well as any web browser?
> > 
> 
> Absolutely
> 
> > 2)       Has anyone run into similar problems with Proxy Services?
> > 
> 
> Not that we know of. I _suppose_ if there were such a fundamental
> problem with HttpClient we would have known.
> 
> 
> > 3)       Are there any fine tuning tips anyone has for using Proxies? 
> > 
> 
> There are not many. None of them is in any way related to data
> corruption.
> 
> > 4)       Or tips for reading chunked data?
> > 
> 
> Chunk-encoding is FULLY transparent to the end user.
> 
> I looked at the attached source code and, on cursory inspection,
> nothing struck me as obviously wrong, but since this code is not
> compilable there is no way of telling for sure.
> 
> (1) Is the problem reliably reproducible (hitting the same URL always
> produces corrupted HTML) or does it appear random?
> 
> (2) Does this problem occur if you hit the URLs directly bypassing
> Anonymizer.com proxy?
> 
> Oleg
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: httpclient-user-help@jakarta.apache.org

