From: Mike Moran
To: Commons HttpClient Project
Date: Wed, 29 Jan 2003 20:10:40 +0000
Subject: Relative URIs strike again
Message-ID: <3E383540.60005@mac.com>

I've been looking into an issue in some other, non-HttpClient code regarding relative URIs, and I was wondering how HttpClient handles it. Specifically, it is the following case:

  Base:     http://www.foo.com/
  Relative: #

Now, I've mostly seen this as a `fake' URL for JavaScript popups, but nevertheless it is not uncommon in the wild. I've attached an example, hash.html, to elucidate. It contains three relative links: "" (nothing), "#" and "#anchor".

Now, assuming a base ref of "file:///hash.html", this is what I find for the three links, in order:

  IE 5.0:
    "file:///"
    "file:///hash.html#"
    "file:///hash.html#anchor"

  Phoenix 0.5:
    "file:///hash.html"
    "file:///hash.html#"
    "file:///hash.html#anchor"

  My code:
    "file:///hash.html"
    "file:///hash.html"
    "file:///hash.html#anchor"

I *suspect* from reading the HttpClient code that it does the same as the last one, but I haven't got a working build here to test it.

I don't find the relevant RFC, RFC 2396 section 5.2, totally clear on what to do when the fragment identifier is just "#". The regexp given in Appendix B seems to allow for it, i.e. the part (#(.*))? will match both "#" and "#anchor". I think the tricky bit that trips things up is the suggested way to reassemble the URI from its parts, which ignores the fragment entirely if it consists only of "#".

So, how does HttpClient's URI class deal with this? Are IE and Phoenix/Mozilla both wrong?

Answers on a postcard to ...

-- 
Mike
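
To make the ambiguity concrete, here is a minimal Python sketch (not HttpClient's code, and not the code referred to in the message). It uses the Appendix B regexp to split off the fragment and then rebuilds the result in two ways: one treats a fragment as present whenever "#" was matched, the other only when the text after "#" is non-empty. The names split_fragment, resolve_same_document and keep_empty_fragment are hypothetical, and the helper deliberately handles only empty or fragment-only references.

```python
import re

# Regular expression from RFC 2396, Appendix B.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_fragment(ref):
    """Return (rest, fragment); fragment is None when no '#' is present."""
    m = URI_RE.match(ref)
    # Group 8 is "#..." including the hash; group 9 is the text after it.
    if m.group(8) is None:
        return ref, None
    return ref[: m.start(8)], m.group(9)


def resolve_same_document(base, ref, keep_empty_fragment):
    """Resolve an empty or fragment-only reference against base.

    Hypothetical helper: it only covers the three links discussed
    in the message ("", "#" and "#anchor").
    """
    rest, fragment = split_fragment(ref)
    if rest:
        raise ValueError("only empty or fragment-only references are handled")
    base_rest, _ = split_fragment(base)
    if fragment is None:
        # No "#" at all: the reference is the base document itself.
        return base_rest
    if fragment == "" and not keep_empty_fragment:
        # Treat a bare "#" as if no fragment had been given.
        return base_rest
    return base_rest + "#" + fragment


if __name__ == "__main__":
    base = "file:///hash.html"
    for keep in (True, False):
        print(keep, [resolve_same_document(base, r, keep)
                     for r in ("", "#", "#anchor")])
```

With keep_empty_fragment=True the three results match the Phoenix 0.5 column; with False they match the "My code" column. The sketch does not attempt to model the IE 5.0 result for the empty reference.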