tomcat-users mailing list archives

From André Warnier ...@ice-sa.com>
Subject Re: CrawlerSessionManagerValve question
Date Tue, 24 May 2011 12:51:16 GMT
Mark Thomas wrote:
> On 24/05/2011 12:50, Martin Kouba wrote:
>> What is the reason NOT to assume that request with more than one
>> User-Agent header originates from a bot?
>> See lines 133, 134 in Tomcat 7.0.14.
> 
> Simply that none of the samples I looked at had multiple UA headers and
> a suggestion from another committer that skipping those requests might
> be a way to save a few cycles.
> 
> If you have traces that show multiple headers, I'd be interested in
> seeing them.
> 

From the RFC police:

RFC 2616, 4.2 Message Headers :

Multiple message-header fields with the same field-name MAY be present in a message if and
only if the entire field-value for that header field is defined as a comma-separated list
[i.e., #(values)].

(note the "if and only if")

RFC 2616, 14.43 User-Agent

User-Agent     = "User-Agent" ":" 1*( product | comment )

(so *not* defined as '#(values)')


==> (my interpretation) : multiple User-Agent headers are invalid.

Discussion :

14.43 also says:

The field can contain multiple product tokens (section 3.8) and comments identifying the
agent and any subproducts which form a significant part of the user agent. By convention,
the product tokens are listed in order of their significance for identifying the application.

and 4.2 also says:

It MUST be possible to combine the multiple header fields into one "field-name:
field-value" pair, without changing the semantics of the message, by appending each
subsequent field-value to the first, each separated by a comma. The order in which header
fields with the same field-name are received is therefore significant to the
interpretation of the combined field value, and thus a proxy MUST NOT change the order of
these field values when a message is forwarded.

Thus, if one were to accept multiple User-Agent headers, and combine them as a 
comma-separated list, one would then have trouble respecting the "order of their 
significance" as expressed in 14.43.
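
To make the combining rule from 4.2 concrete, here is a minimal sketch in plain Java. The class name HeaderCombiner, the method combine, and the two sample User-Agent values are made up for illustration; none of this is Tomcat code.

```java
import java.util.List;

public class HeaderCombiner {

    // Hypothetical helper: joins repeated field-values the way RFC 2616,
    // section 4.2 describes, appending each subsequent value to the first,
    // separated by a comma (the space after the comma is optional whitespace).
    static String combine(List<String> fieldValues) {
        return String.join(", ", fieldValues);
    }

    public static void main(String[] args) {
        // Two made-up User-Agent field-values, in the order received.
        List<String> received = List.of("ExampleBot/1.0", "libwww/2.17");

        // Prints: User-Agent: ExampleBot/1.0, libwww/2.17
        System.out.println("User-Agent: " + combine(received));
    }
}
```

The combined value now contains a comma, which the 1*( product | comment ) grammar in 14.43 does not provide for outside a comment, so the result is no longer a plain list of product tokens in order of significance.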

So it makes sense to allow only one User-Agent header.

And maybe the "lines 133, 134 in Tomcat 7.0.14" should be modified to reject the request
if it has more than one such header?
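
As a rough sketch of what such a check could look like, here is a self-contained Java example. It does not reproduce the valve's lines 133, 134 and does not use the servlet API; the class UserAgentCheck, its methods, and the sample headers are assumptions made for illustration only.

```java
import java.util.List;
import java.util.Map;

public class UserAgentCheck {

    // Hypothetical helper: counts how many header lines carry the given
    // field-name. HTTP field-names are case-insensitive, hence equalsIgnoreCase.
    static long countHeader(List<Map.Entry<String, String>> headers, String name) {
        return headers.stream()
                .filter(h -> h.getKey().equalsIgnoreCase(name))
                .count();
    }

    // Hypothetical policy: treat a request with more than one User-Agent
    // header as invalid, following the reading of RFC 2616 given above.
    static boolean shouldReject(List<Map.Entry<String, String>> headers) {
        return countHeader(headers, "User-Agent") > 1;
    }

    public static void main(String[] args) {
        // A made-up request with two User-Agent header lines.
        List<Map.Entry<String, String>> request = List.of(
                Map.entry("Host", "example.org"),
                Map.entry("User-Agent", "ExampleBot/1.0"),
                Map.entry("user-agent", "libwww/2.17"));

        // Prints: true
        System.out.println(shouldReject(request));
    }
}
```

What "reject" should mean in practice (for example an error response, or simply not treating the request as a candidate for session sharing) is a separate decision that this sketch leaves open.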


