cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: [RT] Cocoon web applications
Date Sat, 06 Oct 2001 16:44:10 GMT
Berin Loritsch wrote:

> > > That is what I am referring to.  As of Servlet 2.3 and much debate, the
> > > official stance on where "/resource" maps you is to the web server root,
> > > not the context root.  Instead, the context root is much more difficult
> > > to reach.  Perhaps we can improve the HTML serializer to automagically
> > > correct context root resources.
> >
> > Yuck! I'd hate that. Serializers that mangle things behind your back are
> > the worst pain in the ass to track down, especially because you never
> > look at them, since you normally consider them brainless, pure adapters
> > from the XML world to the binary world.
> >
> > Let's find a more elegant way.
> 
> OK, any ideas?

As I wrote in my previous email, this is the only thing left to discuss
in the design of a component model for Cocoon webapps.

Let's analyze the needs for addressing:

1) it must allow strong contracts inside a component and between
different components
2) it must avoid name collisions or contract misunderstanding
3) it must be easy and immediate to use
4) it must be as concise and error-proof as possible
5) it must totally hide the hassle of looking up, discovering, or
otherwise accessing component instances.

I believe that the namespaced cocoon: protocol that I proposed in my
previous mail covers all points more or less decently:

 <element xmlns:webapp="http://apache.org/cocoon/webapp/2.3"
          src="cocoon://webapp/some/resource"/>

in fact

 1) the contract is rock solid as long as the internal URI structure of
the component remains stable. Versioning can be used to tell whether the
structure is back compatible (the minor version is bigger than the one
requested) or not (the major version is bigger than the one requested).

 2) since instance lookup is hidden and done by the container with
information given by the deployer at deployment time (or subsequently at
container reconfiguration time), the CWA doesn't need to know anything
about this and concerns do not overlap.

 3) it's not extremely easy to use, but it's good enough for those who
understand namespace matching in XSLT.

 4) it reduces verbosity since the same namespace declaration can be
used throughout the entire document.

 5) as for point 2)
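The versioning rule in point 1 can be sketched as follows (a
hypothetical helper, not part of any Cocoon API; version strings are
assumed to have the form "major.minor"):

```python
def is_back_compatible(requested: str, provided: str) -> bool:
    """Check whether a provided URI-structure version can serve a request.

    Rule from point 1: same major version and a provided minor version
    at least as big as the requested one means back compatible; a
    different major version signals a possibly incompatible structure.
    """
    req_major, req_minor = (int(part) for part in requested.split("."))
    prov_major, prov_minor = (int(part) for part in provided.split("."))
    return prov_major == req_major and prov_minor >= req_minor

# The namespace http://apache.org/cocoon/webapp/2.3 requests version 2.3:
# a webapp exposing structure 2.5 is back compatible; 3.0 and 2.1 are not.
```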

So much for addressing.

How is address resolution performed?

In this context, address resolution means translating an indirect
address of the above form into an absolute address. But there are two
different behaviors depending on whether the information is consumed
internally or externally.

If consumed internally (for example, inside the sitemap for
aggregation), resolution means to transform the address from the
indirect form

 cocoon://webapp/some/resource

into an absolute internal one

 cocoon:/mount/point/of/webapp/some/resource

while, if referenced externally (for example, in an HTML page that is
sent to the browser), resolution means to transform into an absolute
external one

 protocol://host/path/to/container/mount/point/of/webapp/some/resource

or a relative external one, relativized against the absolute external
form of the current resource that contains the link.

It's easy to discriminate between the two because the first behavior
actually invokes the cocoon: protocol handler, while the second does not
(it's just considered content flowing through the pipelines).
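Both resolutions can be sketched in a few lines (the function names and
the mount table are hypothetical; the real work happens inside the
cocoon: protocol handler, driven by deployment configuration):

```python
# Deployment-time knowledge: where each CWA is mounted (hypothetical table).
MOUNT_POINTS = {"webapp": "/mount/point/of/webapp"}

def resolve_internal(uri: str) -> str:
    """cocoon://webapp/some/resource -> cocoon:/mount/point/of/webapp/some/resource"""
    name, _, rest = uri[len("cocoon://"):].partition("/")
    return "cocoon:" + MOUNT_POINTS[name] + "/" + rest

def resolve_external(uri: str, container_base: str) -> str:
    """Resolve the same indirect address to an absolute external URL."""
    name, _, rest = uri[len("cocoon://"):].partition("/")
    return container_base + MOUNT_POINTS[name] + "/" + rest
```

For example, resolve_external("cocoon://webapp/some/resource",
"http://host/path/to/container") yields
"http://host/path/to/container/mount/point/of/webapp/some/resource".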

So, while the first behavior is actually implemented inside the cocoon:
protocol handler, the second is more tricky.

My preferred solution would be to associate this behavior with "extended
xlinks" (my own invention: an extended xlink is any element that carries
an xlink:href attribute OR a src or href attribute; look into
org.apache.cocoon.xml.xlink for more details) and perform a transparent
"extended xlink cocoon address resolution" just before serialization
(NOT inside the serializer itself).

This has some major advantages:

 1) behavior is transparent to the user
 2) users don't have to specify it in the sitemap as a transformation
component (they might forget to add it, and that mistake is hard to
track down)
 3) it works on every serialized format (even PDF and such).
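A sketch of that pre-serialization step on a single element's attributes
(the plain dict and the resolver callable are simplified stand-ins for
the actual SAX machinery):

```python
# Attributes treated as "extended xlinks".
XLINK_ATTRS = ("xlink:href", "src", "href")

def rewrite_extended_xlinks(attrs: dict, resolve) -> dict:
    """Rewrite cocoon:// references found in one element's attributes.

    A pre-serialization pass would apply this to every startElement in
    the SAX stream; `resolve` is any callable mapping a cocoon:// address
    to its absolute external form.  Because it runs on the XML events
    rather than on the output bytes, it works for every serialized format.
    """
    out = dict(attrs)
    for name in XLINK_ATTRS:
        value = out.get(name)
        if value and value.startswith("cocoon://"):
            out[name] = resolve(value)
    return out
```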

I can't think of anything more elegant than this that solves our
problems.

> > > Let me expound.  I like to use a directory structure like this:
> > >
> > > /xdocs
> > > /resources
> > >       /images
> > >       /scripts
> > >       /styles
> > > /stylesheets
> > >       /system
> > > /WEB-INF
> > >       /logicsheets
> > >       /cocoon.xconf
> > >       /logkit.xconf
> > > /sitemap.xmap
> > > /${sub.app}
> > >       /xdocs
> > >       /resources
> > >             /images
> > >             /scripts
> > >       /sitemap.xmap
> > >
> > > The problem is when I want a consistent look and feel in my ${sub.app}
> > > area.  I cannot access the /stylesheets that are accessible via the
> > > context--but not via the sitemap.  This requires me to copy the
> > > /stylesheets to the ${sub.app}.
> >
> > OK, in this case, an absolute URI would work and would not require
> > access to your parent, but rather to an absolute location (which, in
> > this case, accidentally happens to be your parent).
> >
> > This is a simple fix and we can schedule it for Cocoon 2.1 since it
> > might break back compatibility of sitemaps a little.
> 
> Sounds good.

Great.
 
> > > Because Cocoon is an XML framework, in order for this approach to work,
> > > you have to define the interfaces.  There are definite roles that I
> > > have already identified.  Some of the solutions come from concepts in
> > > SOAP, and some of the solutions come from concepts in JNDI, but here goes.
> > >
> > > For sub applications to work, you must have them work to a specific schema.
> > > (this concept is from SOAP).  For instance, your resource must return
> > > the results in DocBook format so that the parent knows how to apply views.
> > > This is the interface of your "component".
> >
> > I've already thought about this when I thought about a way to validate
> > sitemaps and it's a *LOT* more complex than this.
> >
> > Let's make an example: the "behavioral interfaces" of pipeline
> > components are the expected input namespaces and the resulting
> > namespaces. But listing them is not enough: you must know the exact
> > structure, thus the namespace-aware schemas.
> >
> > Even between components, schemas are the structure description that
> > identify the expected "shape" of the SAX pipe that connects two
> > components.
> >
> > Now, suppose you have a pipeline such as
> >
> >  <g] -> [t1] -> [t2] -> [s>
> >
> > and you have
> >
> >  g -> output schema of generator
> >  t1i -> input schema of first transformer
> >  t1o -> output schema of first transformer
> >  t2i -> input schema of second transformer
> >  t2o -> output schema of second transformer
> >  s -> input schema of serializer
> >
> > with all this information you can precisely estimate if the pipeline is
> > "valid", in a behavioral sense.
> >
> > This would allow you to perform some pretests on sitemaps (before
> > compilation and before uploading) that avoid those "impedance
> > mismatches" between connected components.
> 
> This is excellent--validation is vital!

Exactly. But even more: knowing the behavioral interfaces of pipeline
components allows for indirect creation of the pipeline.

For example, a sitemap indirect pseudocode might be:
 
 1) my generator creates stuff using schema G1
 2) have webapp "style" create a PDF out of it.

So, the sitemap looks up the webapp "style" and asks it for a pipeline
fragment (a transformer and a serializer in this case, but it might be
more complex than this) that implements the behavior "from G1 to PDF".

If all pipeline components indicate their in/out schemas, simple
inference rules might be used to come up with those pipeline fragments
in a semi-automatic way, so if we have one transformer that goes from G1
to FO and a serializer that goes from FO to PDF, asking for G1 to PDF
might semi-automatically provide ways to assemble the pipeline.
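Under the stated assumptions (every component declares one input and one
output schema), the semi-automatic assembly reduces to a graph search;
the component names below are invented purely for illustration:

```python
from collections import deque

# Hypothetical registry: component name -> (input schema, output schema).
COMPONENTS = {
    "docbook2fo.xsl": ("G1", "FO"),
    "fo2pdf-serializer": ("FO", "PDF"),
    "docbook2html.xsl": ("G1", "XHTML"),
}

def assemble_pipeline(source: str, target: str):
    """Breadth-first search for a component chain from source to target schema."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        schema, chain = queue.popleft()
        if schema == target:
            return chain
        for name, (s_in, s_out) in COMPONENTS.items():
            if s_in == schema and s_out not in seen:
                seen.add(s_out)
                queue.append((s_out, chain + [name]))
    return None  # no fragment implements the requested behavior

assemble_pipeline("G1", "PDF")
# -> ["docbook2fo.xsl", "fo2pdf-serializer"]
```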

I'm thinking about a graphical sitemap authoring tool: it might query a
CWA for particular pipeline fragments depending on in/out schema
behaviors and not only perform passive validation at the end.

> I know my practices, and I tend
> to use existing schemas, only inventing if necessary.  When I do invent
> a schema, I always have it generated by a logicsheet and provide a
> transformation to the main document schema.  This works for me, because
> it is a known environment.

I follow the same pattern (when possible), but still the XML world might
soon become so "babel-like" that it's hard to know the behavioral
interface of a stylesheet by simply looking at it or, even worse, by
reading its filename.
 
> What you are talking about is validating that not only I am doing my
> job right, but other people in my team don't make simple mistakes.
> The only thing is that the validation shouldn't be done in live serving.

Yes, such pipeline validation/assembly should be performed at sitemap
authoring time and maybe at deployment time, but it is surely too heavy
(and useless) to perform during real-time serving (just like XML
validation is useless on live sites but extremely useful when debugging
the development site).
 
> I think we do need to have schema validation on during development (esp.
> when designing new schemas) to ensure the app works, but have it off for
> deployment--something the deployment tool can ensure.

Agreed.
 
> > As more and more Cocoon components emerge and are made available even
> > outside the Cocoon distribution, the ability to estimate the "behavioral
> > match" between two components will very likely be vital, especially for
> > sitemap authoring tools.
> >
> > The algorithm that performs the validation is far from being trivial: a
> > sufficient condition (and the most simple one) requires the connecting
> > ends to be identified by the exact same schema.
> >
> > So, the above pipeline would be valid *if*
> >
> >  t1i == g
> >  t2i == t1o
> >  s == t2o
> >
> > but this is not a necessary condition since there exist cases where a
> > pipeline is behaviorally valid even if the two subsequent schemas don't
> > match completely, but only on parts.
> 
> Just to add a little more complexity to the system is now that we have
> namespaces, we have multiple schemas in one document.  Therefore, the
> transformation and serialization layers must be even more specific.

By "schemas", I meant XML Schema (or equivalent) documents that
completely identify the structure of the class of documents they
represent. This automatically includes the namespace nesting rules, etc.
 
[... omitted real-life scenario ...]

> > But in this case, in order for the validation to be able to continue,
> > the output schema must state what can be let pass through.
> 
> Not necessarily.  If you use my example above, the namespaces used are all
> declared in the generator.  To show how the validator would work with all
> three schemas in use, check this out:
> 
> Schematic ns: [schem]
> Location ns:  [loc]
> XHTML ns:     [xhtml]
> Any ns:       [*]
> 
> g[doc][schem][loc] ->
> t1i[*][schem]      ->
> t1o[doc][loc]      ->
> t2i[*][loc]        ->
> t2o[doc]           ->
> t3i[doc]           ->
> t3o[xhtml]         ->
> s
> 
> As you can see, the validator tracks the namespaces used at each OUTPUT point.
> That is: g, t1o, t2o, and t3o.  It is easy to track the document namespaces.  The
> big thing is that if a transformer or generator uses any intermediate namespaces
> during processing, it needs to clean up after itself.  For example, the esql
> logicsheet or SQLTransformer use a namespace to describe how to pull information
> from a database--however none of that information is transferred in the document
> markup.  Currently, the generator calls the start and end namespace for the
> logicsheet/transformer, but no elements are passed using the namespace.  This
> presents added complexity to the validator.  We might be able to use the
> SAXConnector approach to strip the unnecessary namespace arguments.  That
> would require caching the SAX calls until the namespace is closed or the first
> element using the namespace is found.

As we are touching a *deep core* problem, we must be very formal and
abstract and analyze all the implicit assumptions. This is why I have to
be picky about what you say above:

if you read my previous statements again, keeping in mind my terminology
where a "schema" contains all the necessary information on how the
different namespaces mix, you'll see that you make an implicit
assumption in your analysis: the topological space defined by input and
output namespaces is translated but never rotated.

This is far from trivial to understand, so let me expand on your
example, updating the notation into a form more readable (at least to
me):

            g[doc][loc][schem] ->
 [*][schem]t1[doc][loc] ->
   [*][loc]t2[doc] ->
      [doc]t3[xhtml] ->

which indicates that

 t1 transforms [schem] to [doc][loc] and everything else is copied over
 t2 transforms [loc] to [doc] and everything else is copied over
 t3 transforms [doc] to [xhtml]

but this is far from being a general enough notation, because it
presumes that stylesheets work on namespaces orthogonally and don't mix
them.

While I agree this is a "good way" to design modular stylesheets, it's
not general enough, in fact it fails to describe a stylesheet that does:

 <xsl:template match="loc:element/doc:element">
  ...
 </xsl:template>

which therefore assumes that the [doc] and [loc] namespaces are
intermixed in a well-defined order.

This is why I talked about general schemas and not lists of namespaces.
While DTDs didn't take namespaces into account, XML Schemas do: you can
define that a document is valid if and only if it contains something of
the form loc:element/doc:element and no other combination of the two.

So, in general, the true behavior of a stylesheet must be indicated by
its input schema and its output schema. That might reduce to something
as simple as placing n different namespaces side by side without mixing
them and adapting one of the n into the others, but a plain namespace
list is not general enough to allow behavioral validation of pipelines.
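To make this concrete with invented fragments: two documents can use
exactly the same namespaces yet differ in the structural constraints a
schema would check, so comparing namespace sets cannot tell them apart:

```python
import re

# Two invented fragments using the same two namespace prefixes:
valid = "<doc:section><loc:element><doc:element/></loc:element></doc:section>"
invalid = "<loc:element/><doc:section><doc:element/></doc:section>"

def namespaces_used(fragment: str) -> set:
    """The weak notion of interface: just the set of namespace prefixes used."""
    return set(re.findall(r"<(\w+):", fragment))

# Both expose the identical namespace set {"doc", "loc"}, yet only the
# first contains the loc:element/doc:element nesting a schema could
# require, so namespace lists alone cannot capture pipeline behavior.
```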

> > I don't want to get deeper into these details, but I just wanted to show
> > you that establishing behavioral composition on pipeline components is a
> > lot more complex than you described.
> >
> > But, yes, it can and needs to be done.
> >
> > > StreamResources: Take any source and goes completely through serialization.
> > >                  This is basically an alternate for Readers, although it
> > >                  can also be used for generated reports.
> > >
> > > FlowResources: A mounted flowmap that performs all the logic necessary for
> > >                a complex form.  It handles paging, etc.  It is a type of
> > >                compound resource in that it pools several simple resources
> > >                together, and returns the one we are concerned with at the
> > >                moment.
> > >
> > > URIMapResources: A compound resource that maps URIs to specific simple
> > >                  resources.
> > >
> > > SitemapResource: A compound resource that is a sub sitemap.  Sitemaps are
> > >                  completely self contained, so it is near impossible to
> > >                  override their results.
> >
> > I'm not sure about these, though. Could you give me some pseudo-example
> > of a pseudo-sitemap and how it would use the above?
> 
> My thinking on a StreamResource was that the sub cocoon app would completely
> handle that resource.  So whether that resource was a Reader or a full pipeline
> does not need to be known by the parent.
> 
> As to markup, I am not sure yet.  We need a conceptual model that works before
> we can express the markup.

I think I understand you, and I think we are thinking, as usual, along
the same lines (influenced by Avalon design patterns, I presume).
 
> > > A sub application can specify resource adaptors for its native XML generators;
> > > for instance you might have a document schema and a schema for an inbox.
> > > If the parent has a View that recognizes the inbox schema, then it will
> > > directly use that schema.  If not, the sub application will specify a default
> > > mapping.
> > >
> > > Hopefully this is enough to get us started.
> >
> > I understand very well the concept of schema-based adaptation, but I
> > think I lost you on the other resources, I think a couple of dirty
> > examples will get me closer to your point.
> 
> Hopefully, I can model it in ASCII....
> 
> +--------------------+ get(stocking-section) +---------------------------+
> | Root Cocoon App    |---------------------->| Stocking Section App      |
> | schema: [doc][loc] |<----------------------| schema: [doc][loc][schem] |
> +--------------------+    rcv([doc][loc])    +---------------------------+
> 
> In the above "diagram", the root Cocoon app is designed to accept the
> [doc] and [loc] schemas (to carry on the previous examples), but has no
> knowledge of the [schem] schema.  The Stocking Section App is registered
> to output [doc], [loc], and [schem] schemas.  If the whole app is engineered
> to the [doc] schema (that being the target), Stocking Section App would
> provide adaptors for the [loc] and [schem] schemas to convert to the end
> [doc] schema.  If the parent app and the child app register the expected
> schemas with each other, the sitemap will return any schemas that can
> be handled natively.

Yes, it's very similar to what I was thinking, but this should NOT, IMO,
happen at sitemap level, but at pipeline assembly level, thus when
authoring and creating the webapp through component composition.

Such an authoring tool might well be a GUI version of Cocoon itself, or
something else that connects to Cocoon for deployment, I don't know; but
all this lookup and discovery should not happen on a live site, IMO.
 
> IOW, Root registers [doc] and [loc] with Stocking.  Stocking configures
> itself so that it does not transform the [loc] schema--assuming that the
> parent knows how to handle it.  However, because the Root did not state
> that it could handle [schem] schema, Stocking applies the transformation
> for that.

I get the feeling that too much happens behind my back. I'd like to be
able to compose my pipelines in such an easy way, but I'd also like to
be able to modify my sitemaps by hand when required (WYSIWYG has
drawbacks, and we all know that very well, don't we?).

Also, how does this handle conflicts? What if there are two different
instances of a transformer that perform the same behavior (for example,
two stylesheets: one fancy, for nice graphics, and one simple, for
text-only browsers)? How should I choose between them?

Behavioral-only pipeline composition might lead to unwanted behavior,
and it would be very hard to understand what's wrong, especially as more
namespaces and components are mixed and aggregated.

So, the best solution would be, IMO, to allow behavioral-driven
discovery of components in a sitemap editor and behavioral validation at
CWA deployment, while leaving the sitemap (or whatever comes next) as
static and explicit as possible, even if its readability is reduced.

I believe the path toward easier Cocoon use also passes through creating
helper tools because, in fact, sometimes the system is simply too
powerful to let us simplify the semantics of our configurations without
sacrificing some of that power.

This doesn't mean the sitemap is perfect and will never be touched but,
IMO, it should not sacrifice explicitness for ease of use, at least at
this level.
 
> > > > In short, you are asking for more solid and wiser contracts between web
> > > > applications and I believe that an absolute URI space accessing is
> > > > already a solid contract, but the proposed role-based addressing is a
> > > > killer since it allows strong contracts and still complete flexibility
> > > > and scalability.
> > >
> > > Yep. Well defined contracts reduce cognitive dissonance.  Too many contracts
> > > increase cognitive dissonance.
> >
> > Careful about using that term: "cognitive dissonance" is a good thing in
> > many situations, since modern learning theories give it the role of the
> > difference-maker between short-term and long-term learning.
> >
> > In fact, they suggest that something gets learned only when there is
> > cognitive dissonance and your brain must work to overcome it, normally
> > by creating the abstractions that make the two cognitive concepts
> > resonate and overlap with your existing semantic environment.
> 
> See what they pollute your minds with at school?  Keep in mind you are talking
> to someone with an Associates in the Recording Arts.  Psyche was not part of
> the lesson plan (however, psychoacoustics was...).

Hey, don't worry. :) I just scratched the surface myself on those issues
(even if I plan to dive into them deeply in the next months).
 
> I get your point though.

Good.
 
> > I'd love to continue research on this topic by letting practical things
> > like real-life user experience, as well as more theoretical things like
> > cognitive science, influence our decisions on how to make this project
> > evolve.
> 
> I have practical knowledge and real-life user experience.  I'll have to rely
> on your expertise for the cognitive sciences.  I know _some_ of the concepts
> because I have mentored others--but not nearly the detail you do.

Oh, believe me: there is so much to discover here that I can't wait to
start diving in. In fact, now that my technical studies are complete and
I know what happens when I click my mouse to retrieve a web page,
ranging from world-scale network architectures to the quantum behavior
of light and matter, I'm moving my attention to what's left: humans.

I've already done extensive research on psychoacoustics myself, but more
general things such as cognitive science, visual semantics, color
theory, etc. are very likely to become my (and my girlfriend's) next
research field.

And you guys will get sick of me talking to you about how to apply to
Cocoon what I will learn :)

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<stefano@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------


