incubator-sling-commits mailing list archives

Subject [CONF] Apache Sling > Flexible Resource Resolution
Date Mon, 30 Nov 2009 11:28:00 GMT
<body style="background-color: white" bgcolor="white">
<div id="pageContent">
<div id="notificationFormat">
<div class="wiki-content">
<div class="email">
     <h2><a href="">Flexible
Resource Resolution</a></h2>
     <h4>Page <b>edited</b> by             <a href="">Felix
     <div class="notificationGreySide">
         <h1><a name="FlexibleResourceResolution-FlexibleResourceResolution"></a>Flexible
Resource Resolution</h1>

<p>Status: IMPLEMENTED<br/>
Created: 25. November 2008<br/>
Author: fmeschbe<br/>
JIRA: <a href="" rel="nofollow">SLING-249</a><br/>
References: <a href="" rel="nofollow"></a>,
<a href="" rel="nofollow">Mappings
for Resource Resolution</a><br/>
Update: 28. November 2008, fmeschbe, Addition to node types and add section on backwards compatibility</p>

    <li><a href='#FlexibleResourceResolution-Introduction'>Introduction</a></li>
    <li><a href='#FlexibleResourceResolution-Goals'>Goals</a></li>
    <li><a href='#FlexibleResourceResolution-JCREnvironment'>JCR Environment</a></li>
    <li><a href='#FlexibleResourceResolution-Properties'>Properties</a></li>
    <li><a href='#FlexibleResourceResolution-NodeTypes'>Node Types</a></li>
    <li><a href='#FlexibleResourceResolution-NamespaceMangling'>Namespace Mangling</a></li>
    <li><a href='#FlexibleResourceResolution-RootLevelMappings'>Root Level Mappings</a></li>
    <li><a href='#FlexibleResourceResolution-MappingEntrySpecification'>Mapping
Entry Specification</a></li>
    <li><a href='#FlexibleResourceResolution-RegularExpressionmatching'>Regular
Expression matching</a></li>
    <li><a href='#FlexibleResourceResolution-RedirectionValues'>Redirection Values</a></li>
    <li><a href='#FlexibleResourceResolution-ResourceTreeAccess'>Resource Tree Access</a></li>
    <li><a href='#FlexibleResourceResolution-DrillingDowntheResourceTree'>Drilling
Down the Resource Tree</a></li>
    <li><a href='#FlexibleResourceResolution-CurrentStatus'>Current Status</a></li>

<h2><a name="FlexibleResourceResolution-Introduction"></a>Introduction</h2>

<p>Currently the resource resolver has a limited set of options to resolve incoming
requests to resources and to map resource paths to paths to be used in links in the response:</p>

	<li><b>Vanity URLs</b> &#8211; replace requested path with different
	<li><b>URL Mappings</b> &#8211; replace path prefixes with different
	<li><b>Regular Expressions</b> &#8211; regular expressions to resolve
and map paths</li>
	<li><b>VanityPath</b> &#8211; property set to an absolute path matched
against URL</li>

<p>To implement SLING-249 <a href=""
rel="nofollow">I initially proposed</a> to add another resolution and mapping option:</p>
	<li><b>VirtualHost</b> &#8211; add path prefix based on Host: header
(my proposal)</li>

<p>After an internal discussion <a href=""
rel="nofollow">I proposed a modified implementation</a>, which drops the existing
configuration in favor of full flexibility on all resources by allowing a <tt>sling:vanityPath</tt>
to specify a relative path, an absolute path, or even a URL. Absolute paths and URLs would be used
to resolve incoming requests to different locations in the resource tree.</p>

<p><a href="" rel="nofollow">Roy Fielding
noted</a> that allowing any content node to define an absolute entry point for resource
resolution would be an open door to bypass delegated security and would also become unmanageable
over time. In his message he proposed a slightly different approach: specifying a location
in the repository where such root entries would be configured. On the content level, only
local aliases are allowed to be defined.</p>

<p>This page writes up the fine print of this proposal and also documents
the actual implementation.</p>

<h2><a name="FlexibleResourceResolution-Goals"></a>Goals</h2>

	<li>Separate concerns for root level mappings and local aliases. This allows administrators
to define root level mappings preventing regular authors from tampering and allows page authors
to define aliases for their pages.</li>
	<li>Allow providing different content trees for different virtual hosts while at the
same time allowing resources to be shared amongst all virtual hosts.</li>
	<li>Provide functionality to redirect externally and internally. External redirects
are implemented by sending a 302/FOUND response with a different <tt>Location</tt>
to the client. Internal redirects are handled by just resolving a different actual resource
	<li>Allow authors to define alias names for their resources which may be used in URLs.</li>

<h2><a name="FlexibleResourceResolution-JCREnvironment"></a>JCR Environment</h2>

<h3><a name="FlexibleResourceResolution-Properties"></a>Properties</h3>

<p>When dealing with the new resource resolution we have a number of properties influencing
the process:</p>

	<li><tt>sling:match</tt> &#8211; This property, when set on a node in
the <tt>/etc/map</tt> tree (see below), defines a partial regular expression which
is used instead of the node's name to match the incoming request. This property is only needed
if the regular expression includes characters which are not valid JCR name characters. The
list of invalid characters for JCR names is: /, :, [, ], *, ', ", | and any whitespace except blank space. In addition,
a name without a namespace may not be <tt>.</tt> or <tt>..</tt>, and
a blank space is only allowed inside the name.</li>
	<li><tt>sling:redirect</tt> &#8211; This property when set on a node
in the <tt>/etc/map</tt> tree (see below) causes a redirect response to be sent
to the client, which causes the client to send in a new request with the modified location.
The value of this property is applied to the actual request and sent back as the value of
<tt>Location</tt> response header.</li>
	<li><tt>sling:status</tt> &#8211; This property defines the HTTP status
code sent to the client with the <tt>sling:redirect</tt> response. If this property
is not set, it defaults to 302 (Found). Other status codes supported are 300 (Multiple Choices),
301 (Moved Permanently), 303 (See Other), and 307 (Temporary Redirect).</li>
	<li><tt>sling:internalRedirect</tt> &#8211; This property, when set
on a node in the <tt>/etc/map</tt> tree (see below), causes the current path to
be modified internally to continue with resource resolution.</li>
	<li><tt>sling:alias</tt> &#8211; The property may be set on any resource
to indicate an alias name for the resource. For example the resource <tt>/content/visitors</tt>
may have the <tt>sling:alias</tt> property set to <tt>besucher</tt>,
allowing the resource to be addressed in a URL as <tt>/content/besucher</tt>.</li>

<h3><a name="FlexibleResourceResolution-NodeTypes"></a>Node Types</h3>

<p>To ease the definition of redirects and aliases, the following node types are defined:</p>

	<li><tt>sling:ResourceAlias</tt> &#8211; This mixin node type defines
the <tt>sling:alias</tt> property and may be attached to any node, which does
not otherwise allow setting a property named <tt>sling:alias</tt></li>
	<li><tt>sling:MappingSpec</tt> &#8211; This mixin node type defines
the <tt>sling:match</tt>, <tt>sling:redirect</tt>, <tt>sling:status</tt>,
and <tt>sling:internalRedirect</tt> properties to define a matching and redirection
inside the <tt>/etc/map</tt> hierarchy.</li>
	<li><tt>sling:Mapping</tt> &#8211; Primary node type which may be used
to easily construct entries in the <tt>/etc/map</tt> tree. The node type extends
the <tt>sling:MappingSpec</tt> mixin node type to allow setting the required matching
and redirection. In addition the <tt>sling:Resource</tt> mixin node type is extended
to allow setting a resource type and the <tt>nt:hierarchyNode</tt> node type is
extended to allow locating nodes of this node type below <tt>nt:folder</tt> nodes.</li>

<p>Note that these node types only help with setting the properties. The implementation
itself only considers the properties and their values, not any of these node types.</p>

<h2><a name="FlexibleResourceResolution-NamespaceMangling"></a>Namespace Mangling</h2>

<p>There are systems accessing Sling which have a hard time correctly handling URLs containing
colons &#8211; <tt>:</tt> &#8211; in the path part. Since URLs
produced and supported by Sling may contain colons because JCR Item based resources may be namespaced
(e.g. <tt>jcr:content</tt>), a special namespace mangling feature is built into
the <tt>ResourceResolver.resolve</tt> and <tt>ResourceResolver.map</tt> methods.</p>

<p>Namespace mangling operates such that any namespace prefix identified in a resource
path to be mapped as a URL in the <tt>map</tt> methods is modified such that the
prefix is enclosed in underscores and the colon is removed.</p>

<p><em>Example</em>: The path <tt>/content/_a_sample/jcr:content/jcr:data.png</tt>
is modified by namespace mangling in the <tt>map</tt> method to get at <tt>/content/_a_sample/_jcr_content/_jcr_data.png</tt>.</p>

<p>Conversely the <tt>resolve</tt> methods must undo such namespace mangling
to get back at the resource path. This is simply done by modifying any path such that segments
starting with an underscore-enclosed prefix are changed by removing the underscores and adding
a colon after the prefix. There is one catch, though: due to the way the <tt>SlingPostServlet</tt>
automatically generates names, there may be cases where an actual name would match
this mechanism. Therefore only prefixes which are actually registered namespace prefixes are modified.</p>

<p><em>Example</em>: The path <tt>/content/_a_sample/_jcr_content/_jcr_data.png</tt>
is modified by namespace mangling in the <tt>resolve</tt> method to get <tt>/content/_a_sample/jcr:content/jcr:data.png</tt>.
The prefix <tt>_a_</tt> is not modified because there is no registered
namespace with prefix <tt>a</tt>. On the other hand the prefix <tt>_jcr_</tt>
is modified because there is of course a registered namespace with prefix <tt>jcr</tt>.</p>
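<p>The two directions of namespace mangling can be sketched as follows. This is a minimal illustration, not Sling's actual implementation: the class <tt>NamespaceMangler</tt>, its method names, and the idea of passing the registered prefixes as a plain set are assumptions made for this example.</p>

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of the Sling API. */
final class NamespaceMangler {

    // A namespace prefix at the start of a path segment, e.g. "/jcr:" in "/jcr:content".
    private static final Pattern NAMESPACED = Pattern.compile("/([^:/]+):");

    // A mangled prefix at the start of a path segment, e.g. "/_jcr_" in "/_jcr_content".
    private static final Pattern MANGLED = Pattern.compile("/_([^_/]+)_");

    /** map direction: "/jcr:content" becomes "/_jcr_content". */
    static String mangle(String path) {
        return NAMESPACED.matcher(path).replaceAll("/_$1_");
    }

    /** resolve direction: only prefixes contained in knownPrefixes are unmangled. */
    static String unmangle(String path, Set<String> knownPrefixes) {
        Matcher m = MANGLED.matcher(path);
        StringBuffer buf = new StringBuffer();
        while (m.find()) {
            String prefix = m.group(1);
            // Leave the segment untouched unless the prefix is a registered namespace.
            String replacement = knownPrefixes.contains(prefix)
                    ? "/" + prefix + ":"
                    : m.group();
            m.appendReplacement(buf, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(buf);
        return buf.toString();
    }
}
```

<p>With <tt>jcr</tt> as the only known prefix, unmangling the example path above leaves <tt>_a_sample</tt> alone and restores both <tt>jcr:</tt> segments.</p>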

<h2><a name="FlexibleResourceResolution-RootLevelMappings"></a>Root Level Mappings</h2>

<p>Root Level Mappings apply to the request at large, including the scheme, host.port
and URI path. To accomplish this, a path is constructed from the request as <tt>{scheme}/{host.port}/{uri_path}</tt>.
This string is then matched against mapping entries below <tt>/etc/map</tt>, which
are structured analogously in the content. The longest matching entry string is used and the
replacement, that is the redirection property, is applied.</p>

<h3><a name="FlexibleResourceResolution-MappingEntrySpecification"></a>Mapping
Entry Specification</h3>

<p>Each entry in the mapping table is a regular expression, which is constructed from
the resource path below <tt>/etc/map</tt>. If any resource along the path has
a <tt>sling:match</tt> property, the respective value is used in the corresponding
segment instead of the resource name. Only resources either having a <tt>sling:redirect</tt>
or <tt>sling:internalRedirect</tt> property are used as table entries. Other resources
in the tree are just used to build the mapping structure.</p>
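<p>As a rough sketch of how an entry's regular expression could be assembled from the nodes along a path below <tt>/etc/map</tt>: each segment contributes its <tt>sling:match</tt> value if present, otherwise its node name. The <tt>MapEntryPattern</tt> and <tt>Node</tt> types are hypothetical stand-ins invented for this example, and joining the segments with <tt>/</tt> is an assumption based on the path layout described above.</p>

```java
import java.util.List;

/** Hypothetical helper for illustration only; not part of the Sling API. */
final class MapEntryPattern {

    /** One node on the path below /etc/map: its name and its sling:match value (null if unset). */
    static final class Node {
        final String name;
        final String match;

        Node(String name, String match) {
            this.name = name;
            this.match = match;
        }
    }

    /** Joins the segments with "/", preferring sling:match over the node name. */
    static String toRegex(List<Node> nodesBelowEtcMap) {
        StringBuilder regex = new StringBuilder();
        for (Node node : nodesBelowEtcMap) {
            if (regex.length() > 0) {
                regex.append('/');
            }
            regex.append(node.match != null ? node.match : node.name);
        }
        return regex.toString();
    }
}
```

<p>For a node chain <tt>http</tt>, <tt>localhost_any</tt> (with <tt>sling:match</tt> set to <tt>localhost\.\d*</tt>) and <tt>cgi-bin</tt>, this yields <tt>http/localhost\.\d*/cgi-bin</tt>.</p>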


<p>Consider the following content</p>
<div class="preformatted panel" style="border-width: 1px;"><div class="preformattedContent panelContent">
<pre>
      +-- http
                +-- sling:redirect = ""
                +-- sling:internalRedirect = "/example"
                +-- sling:match = ".+\.example\.com\.80"
                +-- sling:redirect = ""
           +-- localhost_any
                +-- sling:match = "localhost\.\d*"
                +-- sling:internalRedirect = "/content"
                +-- cgi-bin
                     +-- sling:internalRedirect = "/scripts"
                +-- gateway
                     +-- sling:internalRedirect = ""
                +-- (stories)
                     +-- sling:internalRedirect = "/anecdotes/$1"
</pre>
</div></div>

<p>This would define the following mapping entries:</p>

<table class='confluenceTable'><tbody>
<tr>
<th class='confluenceTh'> Regular Expression </th>
<th class='confluenceTh'> Redirect </th>
<th class='confluenceTh'> Internal </th>
<th class='confluenceTh'> Description </th>
</tr>
<tr>
<td class='confluenceTd'> http/ </td>
<td class='confluenceTd'> <a href="" rel="nofollow"></a> </td>
<td class='confluenceTd'> no </td>
<td class='confluenceTd'> Redirect all requests to the Second Level Domain to www </td>
</tr>
<tr>
<td class='confluenceTd'> http/ </td>
<td class='confluenceTd'> /example </td>
<td class='confluenceTd'> yes </td>
<td class='confluenceTd'> Prefix the URI paths of the requests sent to this domain with
the string <tt>/example</tt> </td>
</tr>
<tr>
<td class='confluenceTd'> http/.+\.example\.com\.80 </td>
<td class='confluenceTd'> <a href="" rel="nofollow"></a> </td>
<td class='confluenceTd'> no </td>
<td class='confluenceTd'> Redirect all requests to sub domains to www. The actual regular
expression for the host.port segment is taken from the <tt>sling:match</tt> property. </td>
</tr>
<tr>
<td class='confluenceTd'> http/localhost\.\d* </td>
<td class='confluenceTd'> /content </td>
<td class='confluenceTd'> yes </td>
<td class='confluenceTd'> Prefix the URI paths with <tt>/content</tt> for
requests to localhost, regardless of actual port the request was received on. This entry only
applies if the URI path does not start with <tt>/cgi-bin</tt>, <tt>gateway</tt>
or <tt>stories</tt> because there are longer match entries. The actual regular
expression for the host.port segment is taken from the <tt>sling:match</tt> property. </td>
</tr>
<tr>
<td class='confluenceTd'> http/localhost\.\d*/cgi-bin </td>
<td class='confluenceTd'> /scripts </td>
<td class='confluenceTd'> yes </td>
<td class='confluenceTd'> Replace the <tt>/cgi-bin</tt> prefix in the URI
path with <tt>/scripts</tt> for requests to localhost, regardless of actual port
the request was received on. </td>
</tr>
<tr>
<td class='confluenceTd'> http/localhost\.\d*/gateway </td>
<td class='confluenceTd'> <a href="" rel="nofollow"></a> </td>
<td class='confluenceTd'> yes </td>
<td class='confluenceTd'> Replace the <tt>/gateway</tt> prefix in the URI
path with <tt><a href="" rel="nofollow"></a></tt>
for requests to localhost, regardless of actual port the request was received on. </td>
</tr>
<tr>
<td class='confluenceTd'> http/localhost\.\d*/(stories) </td>
<td class='confluenceTd'> /anecdotes/stories </td>
<td class='confluenceTd'> yes </td>
<td class='confluenceTd'> Prepend the URI paths starting with <tt>/stories</tt>
with <tt>/anecdotes</tt> for requests to localhost, regardless of actual port
the request was received on. </td>
</tr>
</tbody></table>

<h3><a name="FlexibleResourceResolution-RegularExpressionmatching"></a>Regular
Expression matching</h3>

<p>As said above the mapping entries are regular expressions which are matched against
path. As such these regular expressions may also contain capturing groups as shown in the
example above: <tt>http/localhost\.\d*/(stories)</tt>. After matching the path
against the regular expression, the replacement pattern is applied which allows references
back to the capturing groups.</p>

<p>To illustrate the matching and replacement is applied according to the following
pseudo code:</p>

<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">
<span class="code-comment">// build the string to match as {scheme}/{host.port}/{uri_path}</span>
<span class="code-object">String</span> path = request.getScheme() + <span class="code-quote">"/"</span>
+ request.getServerName() + <span class="code-quote">"."</span> + request.getServerPort()
+ <span class="code-quote">"/"</span> + request.getPathInfo();
<span class="code-object">String</span> result = <span class="code-keyword">null</span>;
<span class="code-keyword">for</span> (MapEntry entry: mapEntries) {
    Matcher matcher = entry.pattern.matcher(path);
    <span class="code-keyword">if</span> (matcher.find()) {
        <span class="code-object">StringBuffer</span> buf = <span class="code-keyword">new</span> <span class="code-object">StringBuffer</span>();
        <span class="code-comment">// replace the matched part, resolving group references such as $1</span>
        matcher.appendReplacement(buf, entry.getRedirect());
        <span class="code-comment">// keep the remainder of the path after the match</span>
        matcher.appendTail(buf);
        result = buf.toString();
        <span class="code-keyword">break</span>;
    }
}
</pre>
</div></div>

<p>At the end of the loop, <tt>result</tt> contains the mapped path or <tt>null</tt>
if no entry matches the request <tt>path</tt>.</p>
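<p>A self-contained variant of this matching step, applied to the localhost entries from the table above, might look like the following. It is a sketch under assumptions: the <tt>MapEntries</tt> class and its nested <tt>MapEntry</tt> are hypothetical, patterns are anchored at the start of the path, and the caller is expected to supply the entries ordered longest first.</p>

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of the Sling API. */
final class MapEntries {

    static final class MapEntry {
        final Pattern pattern;
        final String redirect;

        MapEntry(String regex, String redirect) {
            // Anchoring at the start of the path is an assumption of this sketch.
            this.pattern = Pattern.compile("^" + regex);
            this.redirect = redirect;
        }
    }

    /** Returns the mapped path, or null if no entry matches. Entries must be ordered longest first. */
    static String apply(List<MapEntry> entries, String path) {
        for (MapEntry entry : entries) {
            Matcher matcher = entry.pattern.matcher(path);
            if (matcher.find()) {
                // replaceFirst resolves $1-style group references and keeps the unmatched tail.
                return matcher.replaceFirst(entry.redirect);
            }
        }
        return null;
    }
}
```

<p>Given the entries for <tt>(stories)</tt>, <tt>cgi-bin</tt> and the bare localhost host in that order, the path <tt>http/localhost.8080/stories/a.html</tt> maps to <tt>/anecdotes/stories/a.html</tt>, while <tt>http/localhost.8080/index.html</tt> falls through to the shortest entry and maps to <tt>/content/index.html</tt>.</p>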

<p><b>NOTE:</b> Since the entries in <tt>/etc/map</tt> are
also used to reverse map any resource paths to URLs, using regular expressions in the Root
Level Mappings prevents the respective entries from being used for reverse mappings. Therefore,
it is strongly recommended not to use regular expression matching unless you have a strong
need.</p>
<h3><a name="FlexibleResourceResolution-RedirectionValues"></a>Redirection Values</h3>

<p>The result of matching the request path and getting the redirection is either a path
into the resource tree or another URL. If the result is a URL, it is converted into a path
again and matched against the mapping entries. This may take place repeatedly until an
absolute or relative path into the resource tree results.</p>

<p>The following pseudo code summarizes this behaviour:</p>

<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">
<span class="code-object">String</span> path = ....;
<span class="code-object">String</span> result = path;
<span class="code-keyword">do</span> {
    result = applyMapEntries(result);
} <span class="code-keyword">while</span> (isURL(result));
</pre>
</div></div>

<p>As soon as the result of applying the map entries is an absolute or relative path
(or no more map entries match), Root Level Mapping terminates and the next step in resource
resolution, resource tree access, takes place.</p>
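<p>The repeated application can be sketched in a runnable form as below. Several details are assumptions of this example rather than documented behaviour: the <tt>RedirectLoop</tt> class, the <tt>://</tt> test for URLs, the default ports used when converting a URL back into a <tt>{scheme}/{host.port}/{uri_path}</tt> string, and in particular the round limit, which is added here only to guard against mappings that redirect in a cycle.</p>

```java
import java.net.URI;
import java.util.function.Function;

/** Hypothetical helper for illustration only; not part of the Sling API. */
final class RedirectLoop {

    // Arbitrary guard against cyclic mappings; an assumption of this sketch.
    static final int MAX_ROUNDS = 10;

    static boolean isUrl(String value) {
        return value != null && value.contains("://");
    }

    /** Converts e.g. "http://www.example.com/a" into "http/www.example.com.80/a". */
    static String toMapPath(String url) {
        URI uri = URI.create(url);
        int port = uri.getPort();
        if (port < 0) {
            port = "https".equals(uri.getScheme()) ? 443 : 80;
        }
        String path = uri.getRawPath() == null ? "" : uri.getRawPath();
        return uri.getScheme() + "/" + uri.getHost() + "." + port + path;
    }

    /**
     * Applies the map entries until the result is no longer a URL.
     * Returns null if an application finds no matching entry.
     */
    static String resolve(String mapPath, Function<String, String> applyMapEntries) {
        String current = mapPath;
        for (int round = 0; round < MAX_ROUNDS; round++) {
            String result = applyMapEntries.apply(current);
            if (!isUrl(result)) {
                return result; // a resource path, or null when nothing matched
            }
            current = toMapPath(result);
        }
        throw new IllegalStateException("Too many redirects for " + mapPath);
    }
}
```

<p>The <tt>applyMapEntries</tt> argument stands for whatever function performs a single matching pass over the map entries.</p>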

<h2><a name="FlexibleResourceResolution-ResourceTreeAccess"></a>Resource
Tree Access</h2>

<p>The result of Root Level Mapping is an absolute or relative path to a resource. If
the path is relative &#8211; e.g. <tt>myproject/docroot/sample.gif</tt> &#8211;
the resource resolver search path (<tt>ResourceResolver.getSearchPath()</tt>) is
used to build absolute paths and resolve the resource. In this case the first resource found
is used. If the result of Root Level Mapping is an absolute path, the path is used as is.</p>
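<p>A minimal sketch of this lookup, assuming the search path entries end with a slash and using a plain set of existing paths in place of a real repository (the <tt>SearchPathLookup</tt> class and its parameters are hypothetical):</p>

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of the Sling API. */
final class SearchPathLookup {

    /**
     * Returns the absolute path of the first existing resource, or null if none exists.
     * searchPath entries are assumed to be absolute and to end with "/".
     */
    static String resolve(String path, List<String> searchPath, Set<String> existingPaths) {
        if (path.startsWith("/")) {
            // Absolute paths are used as is.
            return existingPaths.contains(path) ? path : null;
        }
        for (String prefix : searchPath) {
            String candidate = prefix + path;
            if (existingPaths.contains(candidate)) {
                return candidate; // first hit wins
            }
        }
        return null;
    }
}
```

<p>With a search path of <tt>/apps/</tt> followed by <tt>/libs/</tt>, the relative path <tt>myproject/docroot/sample.gif</tt> resolves to the copy below <tt>/apps/</tt> if one exists there, and otherwise to the one below <tt>/libs/</tt>.</p>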

<p>Accessing the resource tree after applying the Root Level Mappings has four options:</p>

	<li>Check whether the path addresses a so-called Star Resource. A Star Resource is
a resource whose path ends with or contains <tt>/*</tt>. Such resources are used
by the <tt>SlingPostServlet</tt> to create new content below an existing resource.
If the path after Root Level Mapping is relative, it is made absolute by prepending the first
search path entry.</li>
	<li>Check whether the path exists in the repository. If the path is absolute, it is
tried directly. Otherwise the search path entries are prepended to the path until a resource
is found or the search path is exhausted without finding a resource.</li>
	<li>Drill down the resource tree starting from the root, optionally using the search
path until a resource is found.</li>
	<li>If no resource can be resolved, a Missing Resource is returned.</li>

<h3><a name="FlexibleResourceResolution-DrillingDowntheResourceTree"></a>Drilling
Down the Resource Tree</h3>

<p>Drilling down the resource tree starts at the root and for each segment in the path
checks whether a child resource of the given name exists or not. If not, a child resource
is looked up which has a <tt>sling:alias</tt> property whose value matches the
given name. If neither exists, the search is terminated and the resource cannot be resolved.</p>

<p>The following pseudo code shows this algorithm assuming the path is absolute:</p>

<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">
<span class="code-object">String</span> path = ...; <span class="code-comment">// the absolute path</span>
Resource current = getResource(<span class="code-quote">"/"</span>);
<span class="code-object">String</span>[] segments = path.split(<span class="code-quote">"/"</span>);
<span class="code-keyword">for</span> (<span class="code-object">String</span> segment: segments) {
    <span class="code-keyword">if</span> (segment.length() == 0) {
        <span class="code-comment">// skip the empty segment before the leading slash</span>
        <span class="code-keyword">continue</span>;
    }
    Resource child = getResource(current, segment);
    <span class="code-keyword">if</span> (child == <span class="code-keyword">null</span>) {
        <span class="code-comment">// no child of that name: look for a child with a matching sling:alias</span>
        Iterator&lt;Resource&gt; children = listChildren(current);
        current = <span class="code-keyword">null</span>;
        <span class="code-keyword">while</span> (children.hasNext()) {
            child = children.next();
            <span class="code-keyword">if</span> (segment.equals(getSlingAlias(child))) {
                current = child;
                <span class="code-keyword">break</span>;
            }
        }
        <span class="code-keyword">if</span> (current == <span class="code-keyword">null</span>) {
            <span class="code-comment">// fail</span>
            <span class="code-keyword">break</span>;
        }
    } <span class="code-keyword">else</span> {
        current = child;
    }
}
</pre>
</div></div>

<h2><a name="FlexibleResourceResolution-CurrentStatus"></a>Current Status</h2>

<p>In <a href=";revision=720647" rel="nofollow">Revision
720647</a> of Sling trunk I have implemented a first shot at a new JCR resource resolver.
This resource resolver currently lives side-by-side with the old one and the JcrResourceResolverFactoryImpl
decides based on configuration (Configuration Admin configuration or framework property of
the name <tt></tt>) whether the new or the old resource resolver
is to be used. The default is to use the new resource resolver.</p>

<p>The new resource resolver currently has the following setup:</p>

	<li><b><tt>/etc/map</tt></b> &#8211; Mappings are read
from <tt>/etc/map</tt> as described above</li>
	<li><b>Vanity URLs</b> &#8211; Existing vanity URL configuration is
added as map entries, with the regular expression <tt>.*/.*</tt> as the prefix
to indicate that scheme and host.port are ignored.</li>
	<li><b>URL Mappings</b> &#8211; Existing URL mapping configuration is
added as map entries, where multiple internal path mappings for the same external path are
collected into a single entry with multiple internal redirects. Again the scheme and host.port
are ignored, that is, the entries are prefixed with <tt>.*/.*</tt>.</li>
	<li><b>Regular Expressions</b> &#8211; Regular expression mappings
are not supported like this anymore. These must be migrated manually to respective entries
in <tt>/etc/map</tt>. The main reason to have regular expression mapping was support
for namespace mangling (see above), which has been implemented differently (see below).</li>
	<li><b>VanityPath</b> &#8211; Existing <tt>sling:vanityPath</tt>
settings are loaded as map entries where the <tt>sling:vanityPath</tt> property
defines the regular expression (using the fixed string <tt>.*/.*</tt>
to indicate that the entry applies for any scheme and any host.port). The path of the node
having the <tt>sling:vanityPath</tt> property is used as the redirect path. Finally
the <tt>sling:redirect</tt> property of the node is used to decide whether the
redirect is internal (property is <tt>false</tt> or missing) or external (property
is set to <tt>true</tt>).</li>
	<li><b>Namespace Mangling</b> &#8211; Namespace mangling as described
in the <em>Namespace Mangling</em> section above is implemented.</li>
