cocoon-dev mailing list archives

From Mark Lundquist
Subject [RT] Improved matching & selecting
Date Sun, 03 Dec 2006 21:27:32 GMT
Hi gang,

OK, here is a proposal for a way to reformulate matching and selecting 
so that they can be expressed in a more concise and powerful way.  It 
would (a) make sitemaps cleaner and easier to understand and write, (b) 
lower the learning curve for new users, and (c) be implementable with 
far fewer classes than we have today, allowing us to lose a bunch of 
source code.

ISSUES w/ current matching & selecting

1) The semantics of matching are just a special case of selecting, and 
<match> is just syntactic sugar for a <select> with a single 
alternative.  So we have a whole bunch of classes — those in 
o.a.c.matching — that exist only to support this syntactic sugar.

2) There are 3 styles of match used in the matchers and selectors: 
literal, wildcard, and RE.  So, now we have a combinatoric explosion of

	(match targets) x (3 match styles) x (matching vs. selecting).

with a class required to implement each combination.

3) Coverage of that matrix is incomplete.  For example, there is no 
selector for the request URI.  You could use a SimpleSelector to match 
{request:URI}, but that only supports the "literal" style of matching; 
if you need the wildcard or RE style, you're out of luck — there are no 
wildcard selectors at all, and RE selection is provided for some 
targets (e.g., request parameter) but not all.

4) Against this proliferation of Matcher and Selector classes, 
documentation is incomplete.  The Clever User knows to look in the 
source tree to find out what all components are actually available.  
That's not a good situation.

5) We have the o.a.c.matching.modular.* matchers.  In concept, they 
represent a better way, but (a) they're undocumented; (b) as it is, 
they just add to the big pile, and (c) their sitemap configuration 
syntax is cumbersome.

6) The bulk of the core Cocoon docs date from before the introduction 
of input modules.  Input modules should actually be more prominent in 
the documentation than they are.  Too many users still don't understand 
them, only finding out about them when looking for a recipe to solve 
some specific problem.  This is partly due to input modules' lack of 
primacy in the docs, and it's also perpetuated by the glut of special 
matchers and selectors for (almost) everything.

7) Using different elements (<match> and <select>) and different 
components for the two forms obscures to the new user the fact that it 
really is only syntactic sugar!  That's confusing.  Matching vs. 
selecting is one more thing for us to have to explain: "Selecting is 
like matching, except blah blah blah".  I know it was confusing to me 
when I was first learning.  It's confusing because the newbie has the 
(correct) intuition that it's just syntactic sugar, but then the 
existence of a whole 'nuther tree of components seems to belie that.  
And then it turns out... nope, it really is just syntactic sugar!

8) The only way to express a logical "or" directly is to use RE 
matching, but (a) per [3] above, RE matching is not available for all 
targets; (b) users have asked for a way to do this for wildcard 
matching (and I have wanted it too).  It's a reasonable request (REs 
are harder to read and write) and a common use case, but we've denied 
them this based on the (valid) argument that we don't want to 
complicate the wildcard grammar, so "we keep wildcard simple, if you 
want 'or' branches then just use REs".  However, there is fortunately 
an easy way to provide "or" branches in matching/selecting that does 
not involve any change to the wildcard grammar!


GOALS

1) Fix all of the above! :-)

2) ...with no loss of functionality (e.g., setting "Vary:" response 
header when necessary)...

3) ...and no negative performance impact.

PROPOSAL (the details...)

1) The functionality of matching would be subsumed into selection and 
be provided by the same class, since the difference is just syntactic 
sugar.  That gets rid of roughly half the classes right there, so we 
only have to refactor once instead of twice, and it takes care of one 
axis of the "combinatoric explosion", see below for the other...

2) This function, which is like today's "selecting", would be called 
"matching" and be provided by a Matcher interface unchanged from that 
of today (but I think it would only require one implementing class).  
It would be configured and invoked in the sitemap using the <match> 
element.

Example time, explanations follow :-).  Here is the equivalent of what 
today would have to be specially written in Java as a custom selector:

	<match value="{request:URI}">
		<when regexp="foo(bar|baz)"> ... </when>
		<when regexp="[^.]*\.blah"> ... </when>
	</match>

3) The match target is given using sitemap variable syntax in the 
@value attribute.  This configures both the input module and the 
property key together without requiring separate attributes or 
component types, using a syntax that is unified with the rest of 
sitemap stuff.

4) The match method is specified by the name of the attribute that 
provides the match pattern.  Four methods would be supported:

	• "equals": literal match from first to last position, no magic 
	• "contains": match w/ leading or trailing characters
	• "path": a better name for today's "wildcard", i.e. the "*" and "**" 
we love to use!
	• "regexp": regular expression match.

Note that the match method is always manifest, it can never be left 
unstated, because it's not hard-coded into the implementation of some 
Matcher or Selector class.  E.g., if you are getting literal matching, 
you are getting it because you asked for it explicitly via "equals".
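
To make the four methods concrete, here is a minimal Java sketch of how a single class could cover all of them.  Everything in it is an assumption for illustration, not existing Cocoon code: the MatchMethod name, the pathToRegex helper, the reading of "contains" as a plain substring test, and the choice to leave regexp patterns unanchored.  For "path" it assumes today's wildcard semantics, where "*" stops at "/" and "**" does not.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: one type covering the four proposed match methods. */
enum MatchMethod {
    EQUALS {
        @Override
        boolean matches(String pattern, String value) {
            return value.equals(pattern);      // literal, first to last position
        }
    },
    CONTAINS {
        @Override
        boolean matches(String pattern, String value) {
            return value.contains(pattern);    // leading/trailing characters allowed
        }
    },
    PATH {
        @Override
        boolean matches(String pattern, String value) {
            return Pattern.matches(pathToRegex(pattern), value);
        }
    },
    REGEXP {
        @Override
        boolean matches(String pattern, String value) {
            // Assumption: unanchored search; use ^ and $ in the pattern to anchor.
            return Pattern.compile(pattern).matcher(value).find();
        }
    };

    abstract boolean matches(String pattern, String value);

    /** Translates a "*" / "**" path pattern into an equivalent regular expression. */
    static String pathToRegex(String path) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < path.length()) {
            char c = path.charAt(i);
            if (c == '*' && i + 1 < path.length() && path.charAt(i + 1) == '*') {
                regex.append(".*");            // "**" may cross "/" boundaries
                i += 2;
            } else if (c == '*') {
                regex.append("[^/]*");         // "*" stays within one path segment
                i += 1;
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
                i += 1;
            }
        }
        return regex.toString();
    }
}
```

With something like this, the attribute name in the sitemap ("equals", "contains", "path", "regexp") would simply pick the constant, and the match target never enters into it.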

5) Naturally, you can mix match methods within a matcher, since there 
is no reason for all the <when> clauses to have to use the same match 
method.

6) The shortcut form for single-alternative selection would not require 
a distinctive element in the sitemap language as it does today; we 
simply elide the <when> clause and hoist the match specifier attribute 
into the <match> element itself.  So, the example below is equivalent 
to an invocation of today's WildcardURIMatcher:

	<match value="{request:URI}" path="foobar/**/*">

7) There would be a globally configurable property to take the place of 
the local @value attribute.  To invoke a (non-default) configured 
instance, you use <match type="..."> just like today, but that is not 
any lighter syntactically than just using @value.  The real reason for 
this is to be able to configure a more specific default, e.g.

	<matchers default="uri">

	  <matcher name="general" class="...">  <!-- Note: the concrete class 
is always the same! -->

	  <matcher name="uri" class="...">
	  <matcher class="...">


This allows you to just write

	<match path="foobar/**/*">

which is nearly identical to today's <match pattern="foobar/**/*"> 
where <matchers default="wildcard">.  The tradeoff is that you then 
have to use <match type="general"> for anything else... but you have 
the choice if you want to do it that way.

8) Logical "or" branches are expressed outside of any pattern 
expression, by means of an element-driven form of the match specifier 
which has been shown until now only as an attribute.  So:

	<match value="{request:URI}">
		<when> <!-- matches any case below -->
			<call resource="X"/>	<!-- (e.g.) -->
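
As a rough illustration of the evaluation this implies, here is a small Java sketch in which each <when> clause holds several alternatives that are OR-ed together, and the first clause that accepts the target value wins (assumed here to mirror today's <select>).  The names MatchSketch, When, equalsTo, regexp and select are hypothetical, and the "outcome" string just stands in for whatever the clause would run, e.g. a <call resource="X"/>.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Hypothetical sketch of evaluating <when> clauses with "or" alternatives. */
final class MatchSketch {

    /** One <when> clause: it accepts a value if any of its alternatives does. */
    static final class When {
        final String outcome;
        final List<Predicate<String>> alternatives;

        When(String outcome, List<Predicate<String>> alternatives) {
            this.outcome = outcome;
            this.alternatives = alternatives;
        }

        boolean accepts(String value) {
            for (Predicate<String> alternative : alternatives) {
                if (alternative.test(value)) {
                    return true;               // logical "or" across the specifiers
                }
            }
            return false;
        }
    }

    /** "equals" specifier: literal comparison. */
    static Predicate<String> equalsTo(String literal) {
        return literal::equals;
    }

    /** "regexp" specifier: unanchored search (an assumption of this sketch). */
    static Predicate<String> regexp(String expression) {
        Pattern compiled = Pattern.compile(expression);
        return value -> compiled.matcher(value).find();
    }

    /** Returns the outcome of the first clause that accepts the value, if any. */
    static Optional<String> select(String value, List<When> clauses) {
        for (When clause : clauses) {
            if (clause.accepts(value)) {
                return Optional.of(clause.outcome);
            }
        }
        return Optional.empty();
    }
}
```

Note that the single-alternative shortcut falls out for free: a bare <match> with a hoisted specifier is just a list containing one clause with one alternative, and mixing match methods within a clause needs nothing special.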


ADVANTAGES

• Less to learn.  You learn how to invoke an input module, and there is 
one place to go to see all the available input modules.  You learn how 
to use an IM with <match>, and that's it.
• Less to document
• Easier to explain
• More natural syntax
• More flexible and expressive
• More concise
• All combinations of match target and match method are automatically 
available.
• Eliminates a bunch of source code, only a few classes required to 
implement it.
• Eliminates all of the <selectors> and most of <matchers> sections 
from the sitemap.

DISADVANTAGES

• I see no way to make it back-compatible so that current sitemaps run 
w/o modification.  However, all of today's functionality is preserved, 
you just invoke it with different syntax.

So... WDYAT?

