Subject: RE: HTTPClient 3.0-rc2 returning corrupt data through popular Proxy
From: Oleg Kalnichevski
To: HttpClient User Discussion
Date: Wed, 01 Jun 2005 20:23:02 +0200
Message-Id: <1117650182.4546.7.camel@localhost.localdomain>

Chris,

As far as HttpClient is concerned there's virtually no difference
whether a request goes via a proxy or hits the target server directly.
Feel free to look at the source code.
The only difference, rather minor in my opinion, is a slightly different
request line for non-transparent requests, which can hardly result in
data corruption.

Since the data corruption appears random, my _guess_ is that there might
be a synchronization problem / race condition in your code, but take it
for what it is worth.

Oleg

On Wed, 2005-06-01 at 11:02 -0700, Chris Fellows wrote:
> Thanks for the reply.
>
> > (1) Is the problem reliably reproducible (hitting the same URL always
> > produces corrupted HTML) or does it appear random?
>
> No, it occurs at random.
>
> > (2) Does this problem occur if you hit the URLs directly bypassing
> > Anonymizer.com proxy?
>
> I did some more testing and yes, I have been able to reproduce the
> problem bypassing Anonymizer, though the frequency of corrupted
> responses is much higher using the proxy: about 1 in 100 using
> Anonymizer and 1 in 400 without Anonymizer. Using just java.net and
> Sockets and a non-proxy connection we never ran into any data
> corruption. I doubt it's HTTPClient, as 3.0-rc2 from what I've seen has
> an excellent track record, but we do have to return 100% of responses
> without any corruption. As a note, these requests only hit major search
> engines, which actively try to prevent automated searches. I'll work on
> getting a compilable version of the HTTPClient code against one of the
> search engines with and without proxy to see if there's an
> implementation mistake.
>
>
> -----Original Message-----
> From: Oleg Kalnichevski [mailto:olegk@apache.org]
> Sent: Tuesday, May 31, 2005 11:02 AM
> To: HttpClient User Discussion
> Subject: Re: HTTPClient 3.0-rc2 returning corrupt data through popular
> Proxy
>
>
> > 1) Should HTTPClient 3.0 return data as well as any web browser?
>
> Absolutely.
>
> > 2) Has anyone run into similar problems with Proxy Services?
>
> Not that we know of. I _suppose_ if there were such a fundamental
> problem with HttpClient we would have known.
>
> > 3) Are there any fine tuning tips anyone has for using Proxies?
>
> There are not many. None of them is in any way related to data
> corruption.
>
> > 4) Or tips for reading chunked data?
>
> Chunk-encoding is FULLY transparent to the end user.
>
> I looked at the attached source code, and on cursory observation
> nothing struck me as obviously wrong, but since this code is not
> compilable there is no way of telling for sure.
>
> (1) Is the problem reliably reproducible (hitting the same URL always
> produces corrupted HTML) or does it appear random?
>
> (2) Does this problem occur if you hit the URLs directly bypassing
> Anonymizer.com proxy?
>
> Oleg

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org
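
A note on the request-line difference mentioned in the reply: under
HTTP/1.1 a client talking to the origin server sends only the path and
query in the request line, while a client talking to an explicitly
configured proxy sends the absolute URI. The sketch below only builds the
two strings to show how small that difference is. It is not HttpClient
code; the class name, method names, and example.com URL are made up for
illustration.

```java
import java.net.URI;

// Illustration only: hypothetical helper, not part of HttpClient.
public class RequestLineSketch {

    /** Path plus optional query, defaulting to "/" when the URI has no path. */
    static String pathAndQuery(URI target) {
        String path = target.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        String query = target.getRawQuery();
        return query != null ? path + "?" + query : path;
    }

    /** Request line for a direct connection to the origin server. */
    static String directRequestLine(String method, URI target) {
        return method + " " + pathAndQuery(target) + " HTTP/1.1";
    }

    /** Request line for a request sent through an explicitly configured proxy. */
    static String proxyRequestLine(String method, URI target) {
        String absolute = target.getScheme() + "://" + target.getRawAuthority()
                + pathAndQuery(target);
        return method + " " + absolute + " HTTP/1.1";
    }

    public static void main(String[] args) {
        URI target = URI.create("http://www.example.com/search?q=httpclient");
        // prints: GET /search?q=httpclient HTTP/1.1
        System.out.println(directRequestLine("GET", target));
        // prints: GET http://www.example.com/search?q=httpclient HTTP/1.1
        System.out.println(proxyRequestLine("GET", target));
    }
}
```

Headers and body are the same in both cases, which fits the point made
in the reply that this difference can hardly explain corrupted response
data.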