tomcat-dev mailing list archives

From "Carlos Pita" <>
Subject SimpleMapper2
Date Thu, 29 Jun 2000 15:58:33 GMT
Here are some thoughts about a mapping algorithm:

I will consider 3 algorithms:

1) The first is the one SimpleMapper1 uses. (*)
2) The second uses 2 hashes: the first for contexts and the second for servlets
inside the selected context. In other words, a 2-level tree with hashes at the nodes: the
root is a hash of contexts, and its leaves are hashes of servlets.
3) The third is a tree with element n of the path at level n. The nodes could be hashes
or simply lists, because there will be only a few elements in each.
(*) About Craig McClanahan's reply: I had no time to look at the SimpleMapper1 code after I read
your email, but I think it isn't matching that way now. I agree about matching the largest
context path, although I'm not sure the servlet spec says anything about that. See my analysis
of 1).

Besides matching a context and a servlet path, I will need that:

a) If there is another possible match besides the one selected, its context path should be
shorter (in number of path elements) than the selected one's.
b) If there is another possible match with the same context path as the one selected, its
servlet path should be shorter (in the same sense as in a).
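To make rules a) and b) concrete, here is a small sketch of longest-prefix selection (the context paths and class name are invented for illustration): with contexts /a and /a/b deployed, a request for /a/b/c/d must select /a/b, the longer context.

```java
import java.util.*;

public class LongestMatchDemo {
    // Pick the candidate whose path is the longest prefix of the URI.
    // This models rules a) and b): any other possible match is shorter.
    static String longestPrefix(String uri, Collection<String> candidates) {
        String best = null;
        for (String c : candidates) {
            boolean matches = uri.equals(c) || uri.startsWith(c + "/");
            if (matches && (best == null || c.length() > best.length())) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> contexts = Arrays.asList("/a", "/a/b");
        System.out.println(longestPrefix("/a/b/c/d", contexts)); // prints /a/b
    }
}
```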

Another thing to consider is that the hash function for String is:

 s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]  where n is the length of the string.
This doesn't seem cheap.
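To see why: the polynomial above touches every character, so its cost grows linearly with the length of the string being hashed. A minimal sketch recomputing it by hand (the class and method names are just for illustration):

```java
public class StringHashDemo {
    // Recompute Java's String hash by hand:
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);  // one multiply-add per character
        }
        return h;
    }

    public static void main(String[] args) {
        String uri = "/examples/servlet/HelloWorld";
        // Agrees with the built-in String.hashCode() implementation.
        System.out.println(polyHash(uri) == uri.hashCode()); // prints true
    }
}
```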

I will assume the following simplifications:

i) There is no special /servlet path.
ii) We are not doing extension mapping.

Suppose that the requestURI has n path elements.
I will start with an analysis for 1):
    SimpleMapper1 will try to match the requestURI against the big hash. Because the big hash
isn't really big enough to trouble that hash function, I think each attempt is essentially
one hash function evaluation plus one string comparison. This could result in a match (the
path info is probably short or nonexistent, so we will have a match in a few steps), but we don't
know if we already have the largest context path. A situation in which there are a number of
contexts with only 1 path element, and one or two contexts with two or three path elements,
doesn't seem very uncommon. So we will probably have a one-path-element context at this point,
and we will have to continue the search just to be sure we wouldn't have had a match
with a larger context. Then the last path element of the requestURI is truncated and we restart
the process. We have to continue the search until the remaining requestURI is no longer than
the largest context path we have already found. In most cases this will be something like n-1
hash function evaluations plus string comparisons. The average length of the strings will
be something like length(requestURI)/2, which affects both the hash function calculation and the
string comparison.
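The truncate-and-retry loop for 1) might look like this sketch (I'm guessing at the structure; the real SimpleMapper1 code differs in its details). Each iteration costs one hash evaluation over the remaining prefix plus a string comparison. Note that this naive version stops at the first match; the analysis above argues the search really has to continue until the remaining prefix is no longer than the largest known context path.

```java
import java.util.*;

public class SingleHashMapper {
    // One big hash of full paths (context path + servlet path), as in 1).
    private final Map<String, String> mappings = new HashMap<>();

    void addMapping(String path, String servlet) {
        mappings.put(path, servlet);
    }

    // Truncate the last path element of the requestURI until a mapping matches.
    String map(String requestURI) {
        String path = requestURI;
        while (path.length() > 0) {
            String servlet = mappings.get(path);   // hash eval + string compare
            if (servlet != null) return servlet;
            int slash = path.lastIndexOf('/');
            if (slash < 0) break;
            path = path.substring(0, slash);       // drop the last path element
        }
        return null;
    }

    public static void main(String[] args) {
        SingleHashMapper m = new SingleHashMapper();
        m.addMapping("/ctx/hello", "HelloServlet");
        // "/extra/info" plays the role of path info here.
        System.out.println(m.map("/ctx/hello/extra/info")); // prints HelloServlet
    }
}
```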
I will continue with 2):

    Suppose we know c, the largest context path length in this container (this is a cheap
number to keep around). We extract the first path element of the requestURI, then we match it
against the contexts hash. Consider the same situation as before: there are a number of contexts
with only 1 path element, and one or two contexts with two or three path elements. If we have a
first match, we extract the second path element of the requestURI and concatenate it
to the first one (or something equivalent), and we try to match this new path. If we fail,
the first match was the right context. If we succeed, it is likely that we have a context path as
large as c (c is 2 or 3, and I think I'm being prudent here), so we can stop the search.
Then we try to match the remaining requestURI against the found context's servlet mappings
hash, proceeding the same way as in 1). Again, the path info is probably short or nonexistent,
so we will have a match in a few steps, and we can stop there because we won't have the context
path problem. From this I would say that we are doing fewer comparisons with shorter strings
(which means cheaper hash function evaluations and string comparisons). I think that the hash
sizes aren't relevant, because in both 1) and 2) they aren't big enough to trouble the
String hash function, and it's very likely that we will have a one-string bucket after its
application (this is only the "idea"; I'm not depending on the hashes being open or closed) in both
cases.
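For concreteness, the two-level lookup in 2) might look like this sketch (all names invented): a root hash from context paths to per-context servlet hashes, with the context prefix grown element by element up to c, then the servlet path truncated inside the chosen context.

```java
import java.util.*;

public class TwoLevelMapper {
    // Root hash: context path -> (servlet path -> servlet name), as in 2).
    private final Map<String, Map<String, String>> contexts = new HashMap<>();
    private int maxContextElements = 0;  // "c" in the analysis above

    void addMapping(String contextPath, String servletPath, String servlet) {
        contexts.computeIfAbsent(contextPath, k -> new HashMap<>())
                .put(servletPath, servlet);
        maxContextElements = Math.max(maxContextElements,
                                      contextPath.split("/").length - 1);
    }

    String map(String requestURI) {
        String[] parts = requestURI.substring(1).split("/");
        // Grow the candidate context prefix one element at a time, stopping at c.
        Map<String, String> servlets = null;
        int used = 0;
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < parts.length && i < maxContextElements; i++) {
            prefix.append('/').append(parts[i]);
            Map<String, String> m = contexts.get(prefix.toString());
            if (m == null) break;         // the previous prefix was the context
            servlets = m;
            used = i + 1;
        }
        if (servlets == null) return null;
        // Truncate the remaining path inside the chosen context, as in 1).
        StringBuilder rest = new StringBuilder();
        for (int i = used; i < parts.length; i++) rest.append('/').append(parts[i]);
        String path = rest.toString();
        while (path.length() > 0) {
            String s = servlets.get(path);
            if (s != null) return s;
            path = path.substring(0, path.lastIndexOf('/'));
        }
        return null;
    }

    public static void main(String[] args) {
        TwoLevelMapper m = new TwoLevelMapper();
        m.addMapping("/ctx", "/hello", "HelloServlet");
        System.out.println(m.map("/ctx/hello/extra")); // prints HelloServlet
    }
}
```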

.....well, I think I should start working if I don't want to be fired, so I will leave 3) for
another time. Sorry.

