incubator-sling-commits mailing list archives

From conflue...@apache.org
Subject [CONF] Apache Sling Website > Output Rewriting Pipelines (org.apache.sling.rewriter)
Date Wed, 16 Sep 2009 06:06:00 GMT
<html>
<head>
    <base href="http://cwiki.apache.org/confluence">
            <link rel="stylesheet" href="/confluence/s/1519/1/1/_/styles/combined.css?spaceKey=SLINGxSITE&amp;forWysiwyg=true"
type="text/css">
    </head>
<body style="background-color: white" bgcolor="white">
<div id="pageContent">
<div id="notificationFormat">
<div class="wiki-content">
<div class="email">
     <h2><a href="http://cwiki.apache.org/confluence/display/SLINGxSITE/Output+Rewriting+Pipelines+%28org.apache.sling.rewriter%29">Output
Rewriting Pipelines (org.apache.sling.rewriter)</a></h2>
     <h4>Page <b>edited</b> by             <a href="http://cwiki.apache.org/confluence/display/~cziegeler@apache.org">Carsten
Ziegeler</a>
    </h4>
     
          <br/>
     <div class="notificationGreySide">
         <h1><a name="OutputRewritingPipelines%28org.apache.sling.rewriter%29-ApacheSlingRewriter"></a>Apache
Sling Rewriter</h1>

<p>The Apache Sling Rewriter is a module for rewriting the output generated by the regular
Sling rendering process. Possible use cases include rewriting or checking all links in
an HTML page, manipulating the HTML page, or using the generated output as the basis for further
transformation. An example of further transformation is using XSLT to transform rendered XML
into another output format such as HTML, or XSL-FO for generating PDF.</p>

<p>To support these use cases, the rewriter uses the concept of a processor. A
processor is a component that is injected into the response through a servlet filter. By implementing
the <em>Processor</em> interface, one is able to rewrite the whole response in
one go. A more convenient way of processing the output is a so-called pipeline: the
Apache Sling Rewriter uses essentially the same concept as Apache Cocoon, an XML-based
pipeline for post-processing the output. The pipeline is based on SAX events.</p>

<h2><a name="OutputRewritingPipelines%28org.apache.sling.rewriter%29-SAXPipelines"></a>SAX
Pipelines</h2>
<p>The rewriter allows you to configure a pipeline for post-processing the generated
response. Depending on how the pipeline is assembled, the rewriting process might buffer the
whole output in order to post-process it properly; this is required, for example, if an HTML
response is "transformed" into XHTML or if XSLT is used to process the response.</p>

<p>As the pipeline is based on SAX events, there needs to be a component that generates
these events and sends them through the pipeline. By default, the Sling rendering scripts write
to an output stream, so this output has to be parsed to generate the SAX events.</p>

<p>The first component in the pipeline, which generates the initial SAX events, is called the
generator. The generator receives the output from Sling, generates SAX events (XML), and streams
these events into the pipeline. The counterpart of the generator is the serializer, which forms
the end of the pipeline. The serializer collects all incoming SAX events and transforms them
into the required response by writing them to the output stream of the response.</p>

<p>Between the generator and the serializer, so-called transformers can be placed in
a chain. A transformer receives SAX events from the previous component in the pipeline and
sends SAX events to the next component in the pipeline. A transformer can remove events, change
events, add events, or just pass the events on.</p>
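<p>The generator, transformer and serializer roles map directly onto the standard JDK SAX classes, so the event flow can be sketched outside of Sling. This is an assumption-based illustration, not the Sling implementation: a JDK XML parser acts as the generator (so the input must be well-formed XML, unlike Sling's HTML generator), an <em>XMLFilterImpl</em> subclass acts as the transformer, and the JDK identity transform acts as the serializer. <em>HrefPrefixFilter</em> and <em>SaxPipelineSketch</em> are hypothetical names.</p>

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical transformer stage: changes href values starting with a prefix, passes everything else on. */
class HrefPrefixFilter extends XMLFilterImpl {
    private final String from;
    private final String to;

    HrefPrefixFilter(XMLReader parent, String from, String to) {
        super(parent);
        this.from = from;
        this.to = to;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl copy = new AttributesImpl(atts);
        int i = copy.getIndex("href");
        if (i >= 0 && copy.getValue(i).startsWith(from)) {
            copy.setValue(i, to + copy.getValue(i).substring(from.length()));
        }
        super.startElement(uri, localName, qName, copy); // hand the (changed) event to the next component
    }
}

/** Hypothetical driver wiring generator -> transformer -> serializer. */
class SaxPipelineSketch {
    static String run(String xml) {
        try {
            // Generator: a JDK SAX parser turning the character stream into SAX events.
            SAXParserFactory parsers = SAXParserFactory.newInstance();
            parsers.setNamespaceAware(true);
            XMLReader generator = parsers.newSAXParser().getXMLReader();
            // Transformer: receives events from the generator and forwards (modified) events.
            XMLReader transformer = new HrefPrefixFilter(generator, "/content/", "/");
            // Serializer: the JDK identity transform writes the incoming events back out as text.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(
                    new SAXSource(transformer, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("pipeline failed", e);
        }
    }
}
```

<p>Further transformers could be chained by passing one filter as the parent of the next; each one only sees the events produced by its predecessor.</p>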

<p>Sling contains a default pipeline which is executed for all HTML responses: it starts
with an HTML generator, which parses the HTML output and sends events into the pipeline. An HTML
serializer collects all events and serializes the output.</p>

<p>The pipelines can be configured in the repository as child nodes of <em>/apps/APPNAME/config/rewriter</em>
(or <em>/libs/APPNAME/config/rewriter</em>). (More precisely, all search paths configured
for the resource resolver are observed.) Each node can have the following properties:</p>
<ul>
	<li>generatorType - the type of the generator (required)</li>
	<li>transformerTypes (multi-value string) - the types of the transformers (optional)</li>
	<li>serializerType - the type of the serializer (required)</li>
	<li>paths (multi-value string) - the content paths this pipeline should run on</li>
	<li>contentTypes (multi-value string) - the content types this pipeline should be used
for (optional)</li>
	<li>extensions (multi-value string) - the extensions this pipeline should be used for
(optional)</li>
	<li>resourceTypes (multi-value string) - the resource types this pipeline should be
used for (optional)</li>
	<li>order (long) - the configurations are sorted by this value, which must be greater
than or equal to 0. The configuration with the highest order is tried first.</li>
	<li>enabled (boolean) - whether this configuration is active (default: true)</li>
</ul>


<p>As you can see from the configuration, there are several ways to define when
a pipeline should be used for a response: paths, extensions, content types, or resource
types. Several of them can be specified at once; in this case, all conditions must
be met.</p>
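<p>The selection rules above (every configured condition must match, and configurations are tried from the highest order downwards) can be sketched in plain Java. <em>PipelineConfig</em> and <em>PipelineSelector</em> are hypothetical stand-ins, not part of the Sling Rewriter API. For brevity the sketch omits resourceTypes, and it assumes that an unset condition matches everything, that paths match by prefix, and that a multi-value property matches if any one of its values matches.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical in-memory view of one rewriter configuration node (resourceTypes omitted). */
final class PipelineConfig {
    final String name;
    final long order;
    final boolean enabled;
    final List<String> paths;         // empty list = condition not configured
    final List<String> contentTypes;  // empty list = condition not configured
    final List<String> extensions;    // empty list = condition not configured

    PipelineConfig(String name, long order, boolean enabled, List<String> paths,
                   List<String> contentTypes, List<String> extensions) {
        this.name = name;
        this.order = order;
        this.enabled = enabled;
        this.paths = paths;
        this.contentTypes = contentTypes;
        this.extensions = extensions;
    }

    /** Every configured condition must hold (assumption: path conditions match by prefix). */
    boolean matches(String path, String contentType, String extension) {
        return enabled
                && (paths.isEmpty() || paths.stream().anyMatch(path::startsWith))
                && (contentTypes.isEmpty() || contentTypes.contains(contentType))
                && (extensions.isEmpty() || extensions.contains(extension));
    }
}

final class PipelineSelector {
    /** Tries the configurations from the highest order down and returns the first match. */
    static Optional<PipelineConfig> select(List<PipelineConfig> configs,
                                           String path, String contentType, String extension) {
        List<PipelineConfig> sorted = new ArrayList<>(configs);
        sorted.sort(Comparator.comparingLong((PipelineConfig c) -> c.order).reversed());
        for (PipelineConfig c : sorted) {
            if (c.matches(path, contentType, extension)) {
                return Optional.of(c);
            }
        }
        return Optional.empty();
    }
}
```

<p>With a default configuration (order 0, text/html, html) and a more specific one (order 10, restricted to a path), requests below that path pick the specific pipeline, while all other HTML requests fall back to the default.</p>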

<p>If a component needs a configuration, the configuration is stored in a child node
whose name is <em>{componentType}-{name}</em>; e.g. to configure the
HTML generator (named <em>html-generator</em>), the node should have the name
<em>generator-html-generator</em>. If the pipeline contains the
same transformer several times, the configuration child node should have the format <em>{componentType}-{index}</em>,
where index is the 1-based position of the transformer in the pipeline. For example, if you have a pipeline
with the transformers xslt, html-cleaner, xslt, link-checker, then the configuration
nodes should be named <em>transformer-1</em> (for the first xslt), <em>transformer-html-cleaner</em>,
<em>transformer-3</em> (for the second xslt), and <em>transformer-link-checker</em>.</p>
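<p>The naming rule for configuration child nodes can be expressed as a small helper. <em>ConfigNodeNames</em> is a hypothetical illustration, not part of Sling; it only covers transformers and assumes that a transformer counts as "used several times" when its type occurs more than once in the pipeline.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper that derives configuration node names for transformers. */
final class ConfigNodeNames {
    /**
     * Returns "transformer-{name}" for a type that occurs once in the pipeline and
     * "transformer-{index}" (1-based position) for a type that occurs several times.
     */
    static List<String> forTransformers(List<String> types) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < types.size(); i++) {
            String type = types.get(i);
            boolean repeated = Collections.frequency(types, type) > 1;
            names.add("transformer-" + (repeated ? String.valueOf(i + 1) : type));
        }
        return names;
    }
}
```

<p>For the transformers xslt, html-cleaner, xslt, link-checker this yields transformer-1, transformer-html-cleaner, transformer-3 and transformer-link-checker, matching the example above.</p>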


<h3><a name="OutputRewritingPipelines%28org.apache.sling.rewriter%29-DefaultPipeline"></a>Default
Pipeline</h3>

<p>The default pipeline is configured for the <em>text/html</em> MIME type
and the <em>html</em> extension and consists of the <em>html-generator</em>
as the generator and the <em>html-serializer</em> for generating the final response.<br/>
As the HTML generated by Sling is not required to be valid XHTML, the HTML generator uses
an HTML parser to create valid SAX events. To do this, the generator needs to
buffer the whole response first.</p>

<h2><a name="OutputRewritingPipelines%28org.apache.sling.rewriter%29-ImplementingPipelineComponents"></a>Implementing
Pipeline Components</h2>

<p>Each pipeline component type has a corresponding Java interface (Generator, Transformer,
and Serializer) together with a factory interface (GeneratorFactory, TransformerFactory, and
SerializerFactory). When implementing such a component, both interfaces need to be implemented.
The factory has only one method, which creates a new instance of that type for the current
request. The factory has to be registered as a service. For example, if you are using the Maven
SCR plugin, the javadoc tags on the factory class look like this:</p>

<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">
/**
 * @scr.component metatype=<span class="code-quote">"no"</span>
 * @scr.service <span class="code-keyword">interface</span>=<span class="code-quote">"TransformerFactory"</span>
 * @scr.property name=<span class="code-quote">"pipeline.type"</span> value=<span class="code-quote">"validator"</span>
 */
</pre>
</div></div>

<p>The factory needs to implement the corresponding interface and should be registered as
a service for this factory interface (this is a plain service, not a factory service in
the OSGi sense). Each factory gets a unique name through the <em>pipeline.type</em>
property. The pipeline configuration in the repository just references this unique name (like
validator).</p>
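<p>The relationship between the configured name, the factory and the per-request component can be sketched without OSGi. <em>SketchTransformer</em>, <em>SketchTransformerFactory</em>, <em>CountingTransformer</em> and <em>FactoryRegistry</em> are hypothetical names: the map merely plays the role of the OSGi service registry, and the interfaces are deliberately much smaller than the real Sling Generator/Transformer/Serializer interfaces. The point illustrated is that the configuration stores only the <em>pipeline.type</em> name, and each request gets a fresh component from the factory.</p>

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a pipeline component (not the Sling Transformer interface). */
interface SketchTransformer {
    String transform(String text);
}

/** Hypothetical stand-in for a factory: a single method creating a new instance. */
interface SketchTransformerFactory {
    SketchTransformer createTransformer();
}

/** Example component with per-request state, which is why each request needs its own instance. */
final class CountingTransformer implements SketchTransformer {
    private int calls;

    @Override
    public String transform(String text) {
        calls++;
        return text.trim();
    }

    int calls() {
        return calls;
    }
}

/** Plays the role of the service registry, keyed by the pipeline.type property. */
final class FactoryRegistry {
    private final Map<String, SketchTransformerFactory> byType = new HashMap<>();

    void register(String pipelineType, SketchTransformerFactory factory) {
        byType.put(pipelineType, factory);
    }

    /** Called once per request: resolves the configured name and creates a new component. */
    SketchTransformer newTransformer(String pipelineType) {
        SketchTransformerFactory factory = byType.get(pipelineType);
        if (factory == null) {
            throw new IllegalArgumentException("No factory registered for " + pipelineType);
        }
        return factory.createTransformer();
    }
}
```

<p>Registering <em>CountingTransformer::new</em> under "validator" and resolving "validator" twice yields two independent instances, so state kept for one request never leaks into another.</p>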

<h2><a name="OutputRewritingPipelines%28org.apache.sling.rewriter%29-ExtendingthePipeline"></a>Extending
the Pipeline</h2>
<p>With the possibilities described above, it is possible to define new pipelines and add
custom components to them. However, in some cases it is only necessary to add a custom
transformer to the existing pipeline. For this purpose, the rewriter can be configured with pre- and
post-transformers that are simply added to each configured pipeline. This allows a more flexible
way of customizing the pipeline without changing or adding a configuration in the repository.</p>

<p>The approach here is nearly the same. A transformer factory needs to be implemented,
but instead of giving this factory a unique name, this factory is marked as a global factory:</p>
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">
/**
 * @scr.component metatype=<span class="code-quote">"no"</span>
 * @scr.service <span class="code-keyword">interface</span>=<span class="code-quote">"TransformerFactory"</span>
 * @scr.property name=<span class="code-quote">"pipeline.mode"</span> value=<span class="code-quote">"global"</span>
 * @scr.property name=<span class="code-quote">"service.ranking"</span> value=<span class="code-quote">"RANKING"</span> type=<span class="code-quote">"<span class="code-object">Integer</span>"</span>
 */
</pre>
</div></div>
<p><em>RANKING</em> is an integer value (do not forget the type attribute,
otherwise the ranking is interpreted as zero) specifying where the transformer is added in
the pipeline. If the value is less than zero, the transformer is added at the beginning of
the pipeline, right after the generator. If the ranking is greater than or equal to zero, the transformer
is added at the end of the pipeline, right before the serializer.</p>
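<p>The placement rule for global transformers can be sketched as a function that assembles the list of components. <em>GlobalTransformer</em> and <em>PipelineAssembler</em> are hypothetical names. How several global transformers on the same side are ordered relative to each other is not specified here, so this sketch simply keeps the order in which they are passed in (an assumption).</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical description of a global transformer and its service.ranking value. */
final class GlobalTransformer {
    final String name;
    final int ranking;

    GlobalTransformer(String name, int ranking) {
        this.name = name;
        this.ranking = ranking;
    }
}

final class PipelineAssembler {
    /**
     * Builds: generator, global transformers with ranking < 0, configured transformers,
     * global transformers with ranking >= 0, serializer.
     */
    static List<String> assemble(String generator, List<String> configured, String serializer,
                                 List<GlobalTransformer> globals) {
        List<String> pipeline = new ArrayList<>();
        pipeline.add(generator);
        for (GlobalTransformer g : globals) {
            if (g.ranking < 0) {
                pipeline.add(g.name); // right after the generator
            }
        }
        pipeline.addAll(configured);
        for (GlobalTransformer g : globals) {
            if (g.ranking >= 0) {
                pipeline.add(g.name); // right before the serializer
            }
        }
        pipeline.add(serializer);
        return pipeline;
    }
}
```

<p>A global transformer registered without the Integer type attribute would end up with ranking zero and therefore always land right before the serializer.</p>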

<p>The <em>TransformerFactory</em> interface has just one method, which returns
a new transformer instance. If you plan to use other services in your transformer, you can
declare the references on the factory and pass the instances into the newly created transformer.</p>
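<p>Passing a service from the factory into each transformer amounts to constructor injection. In this sketch, <em>LinkChecker</em> stands in for an OSGi service reference; <em>LinkChecker</em>, <em>LinkCheckingTransformer</em> and <em>LinkCheckingTransformerFactory</em> are hypothetical names, and the transformer works on plain href strings rather than SAX events to keep the example short.</p>

```java
/** Hypothetical service the transformer depends on (stands in for an OSGi service). */
interface LinkChecker {
    boolean isValid(String href);
}

/** Hypothetical transformer: receives the service through its constructor. */
final class LinkCheckingTransformer {
    private final LinkChecker checker;

    LinkCheckingTransformer(LinkChecker checker) {
        this.checker = checker;
    }

    String rewrite(String href) {
        return checker.isValid(href) ? href : "#invalid";
    }
}

/** The factory holds the reference and hands it to every new transformer. */
final class LinkCheckingTransformerFactory {
    // With the Maven SCR plugin, this would typically be a field declared via an
    // @scr.reference tag and bound by the container; here it is passed in directly.
    private final LinkChecker checker;

    LinkCheckingTransformerFactory(LinkChecker checker) {
        this.checker = checker;
    }

    LinkCheckingTransformer createTransformer() {
        return new LinkCheckingTransformer(checker);
    }
}
```

<p>The transformer never looks up services itself; it only uses what the factory gives it, which keeps the per-request object simple and easy to test.</p>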


<h2><a name="OutputRewritingPipelines%28org.apache.sling.rewriter%29-ImplementingaProcessor"></a>Implementing
a Processor</h2>
<p>A processor must implement the Java interface <em>org.apache.sling.rewriter.Processor</em>.
It is initialized (method <em>init</em>) with the <em>ProcessingContext</em>.
This context contains all necessary information for the current request (especially the output
writer to write the rewritten content to).<br/>
The <em>getWriter</em> method should return a writer to which the output is written.
When the output has been written, or an error occurred, <em>finished</em> is called.</p>
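<p>The processor lifecycle (init, getWriter, finished) can be illustrated with a simplified, hypothetical interface. <em>SimpleProcessor</em> and <em>HttpsLinkProcessor</em> are not the Sling API: the real <em>init</em> takes a <em>ProcessingContext</em>, whereas here a plain <em>Writer</em> stands in for the response writer, and the boolean parameter of <em>finished</em> is an assumption of this sketch. The processor buffers everything written to <em>getWriter</em> and rewrites the whole response in one go once <em>finished</em> is called.</p>

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical, simplified processor contract (not the real Sling signatures). */
interface SimpleProcessor {
    void init(Writer responseWriter);      // stands in for init(ProcessingContext)
    PrintWriter getWriter();               // the rendering output is written here
    void finished(boolean errorOccurred);  // called when output is complete or an error occurred
}

/** Buffers the whole response and rewrites it in one go in finished(). */
final class HttpsLinkProcessor implements SimpleProcessor {
    private final StringWriter buffer = new StringWriter();
    private final PrintWriter writer = new PrintWriter(buffer);
    private Writer target;

    @Override
    public void init(Writer responseWriter) {
        this.target = responseWriter;
    }

    @Override
    public PrintWriter getWriter() {
        return writer;
    }

    @Override
    public void finished(boolean errorOccurred) {
        writer.flush();
        String output = buffer.toString();
        try {
            // Sketch choice: on error, pass the buffered output through unchanged.
            target.write(errorOccurred ? output : output.replace("http://", "https://"));
            target.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

<p>Because the processor sees the complete output before writing anything, it can apply rewrites that need the whole response, at the cost of buffering it in memory.</p>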

<p>Like the pipeline components, a processor is created by a factory, which has to be
registered as a service, like this:</p>
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">
/**
 * @scr.component metatype=<span class="code-quote">"no"</span>
 * @scr.service <span class="code-keyword">interface</span>=<span class="code-quote">"ProcessorFactory"</span>
 * @scr.property name=<span class="code-quote">"pipeline.type"</span> value=<span class="code-quote">"uniqueName"</span>
 */
</pre>
</div></div>

<h2><a name="OutputRewritingPipelines%28org.apache.sling.rewriter%29-ConfiguringaProcessor"></a>Configuring
a Processor</h2>
<p>The processors can be configured in the repository as child nodes of <em>/apps/APPNAME/config/rewriter</em>
(or <em>/libs/APPNAME/config/rewriter</em>, or any other configured search path). Each node can have the following properties:</p>
<ul>
	<li>processorType - the type of the processor (required); this is the value of the
factory's <em>pipeline.type</em> property (in the example above this is <em>uniqueName</em>)</li>
	<li>paths (multi-value string) - the content paths this processor should run on</li>
	<li>contentTypes (multi-value string) - the content types this processor should be
used for (optional)</li>
	<li>extensions (multi-value string) - the extensions this processor should be used for
(optional)</li>
	<li>resourceTypes (multi-value string) - the resource types this processor should be
used for (optional)</li>
	<li>order (long) - the configurations are sorted by this value, which must be greater
than or equal to 0. The configuration with the highest order is tried first.</li>
	<li>enabled (boolean) - whether this configuration is active (default: true)</li>
</ul>


     </div>
</div>
</div>
</div>
</div>
</body>
</html>
