From: "Ben Cooper"
Reply-To: users@httpd.apache.org
Date: Fri, 8 Oct 2010 18:12:03 +0100
Message-ID: <28EDD663298F2047B82ED5F6694AC22B559F7F@uksrpblkexb01.detica.com>
Subject: [users@httpd] Weak-etag not being handled correctly - bug found in source code

I’ve been investigating why weak ETags aren’t being handled properly: specifically, why we always get a status 200 when we include a weak ETag, and only get a status 304 when we delete the weak ETag and use the last-modified date instead, or make the ETag strong. I’ve found the bug: the W prefix is upper case, and Apache seems to assume that all header values are lower case. However, I don’t know where the best place to fix the bug is, as discussed below.

The bug is in the source code as follows:

The response is decided in http_protocol.c, in a function called ap_meets_conditions(), as follows:

http_protocol.c: v2.2.16, line 355

    not_modified = ap_find_list_item(r->pool, if_nonematch, etag);

This line of code is executed for both strong and weak ETags for full GET requests (i.e. not Range requests, which are caught by a previous if-statement). The call to ap_find_list_item() compares the ETag of the resource with the one in the request (held in the request headers table and pointed to by the char *if_nonematch).

ap_find_list_item() exists in the util.c file.

util.c: v2.2.16, line 1372

    good = good && (*pos++ == apr_tolower(*ptr));

The for-loop surrounding this line of code walks through the two strings, comparing them character by character. It maintains some state variables, such as in_qstr, which tracks whether or not the pointer is currently within a quoted string. Within a quoted string it compares the two characters directly; outside quotes it executes line 1372, which lower-cases the *ptr value before comparing. This assumes that the *pos value is already lower case. However, the weak ETag is defined by the protocol as beginning with the uppercase letter W, and it appears that an uppercase W is indeed what is stored in the table entry pointed to by if_nonematch and pos.
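As a rough illustration of the loop described above, here is a minimal standalone sketch. It is not the httpd source: the function name model_item_matches() and its parameter names are hypothetical, it uses tolower() from the C library instead of apr_tolower(), and it leaves out everything the real loop does with commas, comments, whitespace and quoted pairs. It only models the one rule at issue: the string walked by ptr is lower-cased outside quotes, while the string walked by pos is compared as-is.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the comparison (not the httpd code).
 * ptr_str is the string walked by ptr: its bytes are lower-cased unless
 * they sit inside a quoted string.
 * pos_str is the string walked by pos: its bytes are never folded.
 */
static bool model_item_matches(const char *ptr_str, const char *pos_str)
{
    const unsigned char *ptr = (const unsigned char *)ptr_str;
    const unsigned char *pos = (const unsigned char *)pos_str;
    bool in_qstr = false;

    for (; *ptr != '\0'; ++ptr, ++pos) {
        unsigned char c = *ptr;

        if (c == '"') {
            in_qstr = !in_qstr;            /* the quote itself is compared raw */
        } else if (!in_qstr) {
            c = (unsigned char)tolower(c); /* fold only outside quotes */
        }

        if (*pos != c) {
            return false;                  /* also covers pos_str ending early */
        }
    }
    return *pos == '\0';                   /* both strings must end together */
}
```

Under these assumptions, passing the same weak tag W/"1234-123123" as both arguments yields false, because the W outside the quotes is folded to w on one side only. The same call succeeds for w/"1234-123123" and for "W1234-123123", where the W is either already lower case or protected by the quotes.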

 

Testing this, my colleagues have created ETags with values such as:

"W1234-123123"

W/"1234-123123"

w/"1234-123123"

We have found that, provided the ETag is fully quoted or preceded by a lowercase w, httpd works as we would expect and returns a 304 to a conditional GET. However, if the ETag is preceded by an uppercase W, as per the protocol, then httpd always returns 200. It appears as if the cache is not working, but that is not the problem; it is the comparison of the W that fails and causes the behaviour.

A targeted fix is to lowercase if_nonematch just prior to calling ap_find_list_item(). Because if_nonematch is a pointer, modifying the string in place may have undesirable side effects elsewhere in the code, so a “safer” approach is first to copy the string into a local variable, then lowercase the W, and then call ap_find_list_item(). However, I wonder whether a more natural home for the fix would be to make the W lowercase when the hash table is loaded?
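The "copy first, then lowercase the W" idea can be sketched as below. This is a standalone illustration under stated assumptions, not a patch: copy_with_lowercase_weak_prefix() is a hypothetical helper, and it uses malloc() where code inside httpd would more likely allocate from the request pool (for example with apr_pstrdup()). It only rewrites a single leading W/ prefix, so a header carrying a comma-separated list of tags would need more work. Which of the two strings passed to ap_find_list_item() actually needs this treatment depends on which side the comparison leaves unfolded; the helper makes no assumption about that.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of httpd.
 * Returns a heap-allocated copy of tag in which a leading weak indicator
 * "W/" is rewritten as "w/". The input string is never modified.
 * Returns NULL if tag is NULL or the allocation fails.
 * The caller owns the result and must free() it.
 */
static char *copy_with_lowercase_weak_prefix(const char *tag)
{
    size_t len;
    char *copy;

    if (tag == NULL) {
        return NULL;
    }

    len = strlen(tag);
    copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, tag, len + 1);   /* len + 1 also copies the terminating NUL */

    if (len >= 2 && copy[0] == 'W' && copy[1] == '/') {
        copy[0] = 'w';            /* touch only the weak indicator */
    }
    return copy;
}
```

Working on a copy keeps the original header value intact for any later code that reads it, which is the side effect the paragraph above is worried about.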

 

Could someone who knows the code much better than me take up this issue, decide the best location for a fix, and implement it?

Many thanks,

Ben Cooper
