cocoon-dev mailing list archives

From Ovidiu Predescu <>
Subject Re: [C2] [2.1-dev] proposed changes to the Source interface
Date Tue, 14 Aug 2001 21:04:16 GMT
Hi Carsten,

On Tue, 14 Aug 2001 13:07:08 +0200, "Carsten Ziegeler" <> wrote:

> > Ovidiu Predescu wrote:
> > 
> > I was looking at how the current Source interface is defined, and I
> > believe we need to separate things a little bit more. I badly need
> > this separation in one of the extensions to Cocoon I'm working on
> > (which I hope to present sometime early next month).
> > 
> Sounds interesting. Tell us more about your extensions !

The team I work on at HP is using Cocoon2 as a framework to build and
access Web Services. We are actively supporting and promoting Cocoon2
within HP as the framework to be used for any XML processing. We just
had a beta release of our middleware product, which among other things
includes Cocoon2, based on the 2.1-dev code as of June 18, 2001. We
intend to integrate all the changes we make back into the Cocoon2 main
trunk.

As part of the changes we've done, we wrote SOAP and UDDI
logicsheets. The logicsheets were built on top of an abstraction we
called XStream, which is an object that holds XML content, very
similar to the Source abstraction. In fact XStream was built on top
of a re-factored Source interface; at the time I started the branch,
Source was still only a class.

XStream is a logicsheet, together with the supporting code, that
defines all sorts of operations: creation from an inline XML fragment,
transformation of an XStream through a stylesheet (which is itself
just another XStream), etc. The SOAP logicsheet is implemented
directly using XStream objects, without needing any client
library. It just collects the XML fragment specified in the XSP page,
creates an XStream object and invokes the SOAP server directly. The
XML response is packaged in an XStream object.

The XStream objects are accessible by name, just like variables in
programming languages. The scope of a variable can be global, session
or XSP page. You can refer to them anywhere in an XSP page and you can
obtain their representation either as a Java object, or as an XML
fragment which gets embedded in the generated document.
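To make the scoping rule above concrete, here is a minimal Java sketch of a registry that maps names to objects in page, session, and global scopes. The class name XScriptVariables, the Scope enum, and the innermost-first lookup order are assumptions for illustration, not the actual XScript code; a real implementation would also keep one session map per user session and one page map per request.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry of named objects (e.g. XStreams), one map per scope. */
class XScriptVariables {
    // Declaration order matters: lookup goes from the innermost scope outwards.
    enum Scope { PAGE, SESSION, GLOBAL }

    private final Map<Scope, Map<String, Object>> scopes = new EnumMap<>(Scope.class);

    XScriptVariables() {
        for (Scope s : Scope.values()) {
            scopes.put(s, new HashMap<>());
        }
    }

    /** Binds a name to an object in the given scope, replacing any earlier binding there. */
    void put(Scope scope, String name, Object value) {
        scopes.get(scope).put(name, value);
    }

    /** Looks a name up in PAGE, then SESSION, then GLOBAL; returns null if unbound. */
    Object get(String name) {
        for (Scope s : Scope.values()) {
            Object value = scopes.get(s).get(name);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

With this lookup order, a page-level "message" shadows a global one of the same name, much as a local variable shadows a global in a programming language.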

Here is how one uses this stuff, in a fictitious piece of SOAP code:

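<!-- Build the request message from two request parameters -->
<xstream:create name="message">
    <arg1><xsp-request:get-parameter name="a"/></arg1>
    <arg2><xsp-request:get-parameter name="b"/></arg2>
</xstream:create>

<!-- Post the message and keep the SOAP response as an XStream -->
<xstream:create name="response">
  <soap:call href="some url">
      <xstream:get name="message" as="xml"/>
  </soap:call>
</xstream:create>

<!-- Load a stylesheet from a URL into another XStream -->
<xstream:create name="a transformation" href="context://some stylesheet"/>

<para>The result of adding the two numbers is
  <xstream:transform source="response" stylesheet="a transformation"/>
</para>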

The soap:call above simply creates another XStream object that
collects the XML fragment specified as its child. It then posts the
message to the specified URL, and places the resulting XML in an
XStream named "response". The example also shows how one can create
an XStream from a URL, and use it to transform another XStream.

XStream is only one type of object that can be created. We are
working on having arrays, and special objects to hold content which is
not XML. This is needed for having the ability to process SOAP with
attachments, but can also be used to write Web applications that deal
with POST data which contains files and other non-XML data.

I call this generic framework XScript; it is a framework for
manipulating arbitrary objects, but mostly with XML content, from
within XSP pages. We plan to make use of it to implement things like
ebXML, BizTalk and RNIF processors. These will turn Cocoon2 into a
framework for building Web Services, not only for accessing them.

> > There are four distinct things the Source interface deals with right
> > now:
> > 
> > a) the real input source, its last modified date, and content length
> > 
> > b) determining whether the source is a file, and obtaining the file
> > 
> > c) streaming the content of the source to a ContentHandler
> > 
> > d) the ability to refresh a Source
> > 
> > IMO the Source interface should deal only with a). Source should be an
> > abstraction for content, without regard to whether it is a file or
> > whether it contains XML data.
> Yes, this is right. That was actually the intention of the Source
> object. But by the time it was introduced, the Cocoon code used different
> ways of getting information from sources, and it was very hard to unify
> them into a single Source object without redesigning some major parts.
> This is what actually led to the current implementation.

Yes, and I think you did a great job of unifying all the approaches in
a single one.

> > By quickly grepping the sources, I found that the only place that
> > uses the file characteristics of Source is in
> > DirectoryGenerator. However a simple workaround can be implemented, by
> > asking the Source for its system id, and determining from there the
> > type of the Source.
> Again correct, but I think that an isFile() method on the Source object
> is more convenient than testing whether the system id starts with the
> "file" protocol (and it is faster).

XStream is actually implemented as a Source. I modified 2.1-dev
(unfortunately in a way that's a bit incompatible with today's main
trunk) so that Source is just a simple interface, as described in my
original message.

As the example above shows, an XStream object built from an inline
XML fragment has no File associated with it; the file abstraction
makes no sense for it at all.

Another example where the file abstraction doesn't make much sense is
data coming from a POST request. There isn't any file associated with
this data, yet that data can be considered a Source.

As you point out, the file abstraction may provide some performance
improvements over checking the system id. In that case we can define
FileSource as a separate interface that extends Source and provides
the file abstraction:

public interface FileSource extends Source {
  public File getFile();
}
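A minimal Java sketch of how a caller such as DirectoryGenerator might use the proposed FileSource, falling back to the system-id check when a Source does not implement it. The Source declaration here is a stripped-down stand-in for the proposed interface, and SourceFiles.toFile is a hypothetical helper, not existing Cocoon code.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Stand-in for the proposed Source interface, reduced to the one method needed here.
interface Source {
    String getSystemId();
}

// The proposed sub-interface for Sources that are backed by a file.
interface FileSource extends Source {
    File getFile();
}

final class SourceFiles {
    private SourceFiles() {}

    /**
     * Returns the File behind a Source, or null if it has none. A FileSource
     * answers directly; anything else falls back to parsing a "file:" system id.
     */
    static File toFile(Source source) {
        if (source instanceof FileSource) {
            return ((FileSource) source).getFile();   // no string parsing needed
        }
        String systemId = source.getSystemId();
        if (systemId == null || !systemId.startsWith("file:")) {
            return null;                              // e.g. inline XML or POST data
        }
        try {
            return new File(new URI(systemId));
        } catch (URISyntaxException | IllegalArgumentException e) {
            return null;                              // malformed or opaque file: URI
        }
    }
}
```

The instanceof check keeps the fast path Carsten mentions, while Sources that have no file never need to pretend they do.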

> > The functionality defined in c) is already provided by XMLFragment,
> > and I see no reason why we shouldn't use this instead. Also we should
> > make a separation between Sources that contain XML data, and those
> > that don't.
> Yes and no. It would be good to separate XML from non-XML sources, but
> the reason for the stream() method in the Source object was the idea that
> a Source object is able to "generate" XML even if the data is not XML
> but, e.g., HTML.

I agree, but sources of this kind, which can be represented as XML,
should really implement the XMLSource interface.

> We could use the XMLFragment interface here as well, but then we have to
> add a toSAX(XMLConsumer consumer) method.

Should we then make toSAX(XMLConsumer) part of XMLFragment?
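If toSAX(XMLConsumer) becomes part of XMLFragment, implementors need not write it twice. Assuming, as in Cocoon's org.apache.cocoon.xml.XMLConsumer, that XMLConsumer extends both ContentHandler and LexicalHandler, a base class can forward one overload to the other. The XMLConsumer and XMLFragment declarations below are local stand-ins, and AbstractXMLFragment is a hypothetical name.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.ext.LexicalHandler;

// Stand-in for Cocoon's XMLConsumer: a ContentHandler that also receives lexical events.
interface XMLConsumer extends ContentHandler, LexicalHandler {}

interface XMLFragment {
    void toSAX(ContentHandler handler) throws SAXException;
    void toSAX(XMLConsumer consumer) throws SAXException;
}

/**
 * Hypothetical base class: subclasses implement only toSAX(ContentHandler).
 * A subclass that also produces lexical events (comments, CDATA sections)
 * would override toSAX(XMLConsumer) to send those to the consumer as well.
 */
abstract class AbstractXMLFragment implements XMLFragment {
    public void toSAX(XMLConsumer consumer) throws SAXException {
        // The cast selects the ContentHandler overload; without it this
        // call would resolve to itself and recurse forever.
        toSAX((ContentHandler) consumer);
    }
}
```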

> > As for point d), I'm not sure it is good to assume that all Sources
> > are mutable. I have actually come up with Source objects which are
> > immutable, and for them the refresh operation has no meaning.
> The refresh() method is currently very important for the reloading of
> the sitemap and the cocoon.xconf to detect changes. On the other hand
> the refresh() method is also meant as a reset() method, which means
> that you can call e.g. getInputStream() more than once. With e.g.
> a URL connection this is only possible if you open a new connection
> before you get an input stream for the second time, so refresh()
> is useful here.
> I agree that refresh() might not be the right name for it.

I understand your point. However, in the general case a Source
cannot be modified. An XStream created from an inline XML fragment,
or a Source created from POST data, is immutable: once created, it
cannot be modified. The refresh/reset interface makes sense for
objects whose underlying data store can change, as in the case of the
configuration files.

I believe we should create a ModifiableSource interface for this kind
of object, and remove these methods from the Source interface.
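A minimal sketch of that split, assuming a Source over a URL: Source itself has no refresh(), and ModifiableSource adds it. Here refresh() simply discards the cached URLConnection, so the next getInputStream() or getLastModified() call opens a fresh connection and sees the current data, which is the reset behaviour described above. The interfaces are trimmed stand-ins, and URLModifiableSource is a hypothetical class name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

// Trimmed stand-ins for the proposed interfaces: no refresh() on Source itself.
interface Source {
    long getLastModified();
    InputStream getInputStream() throws IOException;
}

interface ModifiableSource extends Source {
    void refresh();
}

/** Hypothetical URL-backed Source whose underlying data may change over time. */
class URLModifiableSource implements ModifiableSource {
    private final URL url;
    private URLConnection connection;   // opened lazily, dropped by refresh()

    URLModifiableSource(URL url) {
        this.url = url;
    }

    private URLConnection connection() throws IOException {
        if (connection == null) {
            connection = url.openConnection();
        }
        return connection;
    }

    public long getLastModified() {
        try {
            return connection().getLastModified();
        } catch (IOException e) {
            return 0L;   // 0 means "unknown", as in URLConnection
        }
    }

    public InputStream getInputStream() throws IOException {
        return connection().getInputStream();
    }

    /** Forgets the current connection so the next read reflects the current data. */
    public void refresh() {
        connection = null;
    }
}
```

Immutable Sources, such as an inline XStream, simply implement Source and never have to provide a meaningless refresh().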

> > As a result, I propose that Source be the following interface:
> > 
> > public interface Source {
> >   /*** BTW, why use long and not Date? ***/
> >   long getLastModified();
> > 
> >   long getContentLength();
> > 
> >   public InputSource getInputSource() throws IOException;
> > 
> >   String getSystemId();
> > 
> >   /*** getInputStream() can be easily implemented as
> >   getInputSource().getByteStream(). ***/
> > }
> > 
> > Based on this, we can define XMLSource as:
> > 
> > public interface XMLSource extends Source, XMLFragment
> > {
> > }
> > 
> > and rename stream(ContentHandler) to toSAX(ContentHandler). The
> > stream(XMLConsumer) can be implemented based on the toSAX() method
> > easily.
> > 
> > The URLSource and SitemapSource can then become classes that implement
> > the XMLSource interface. The refresh method can be placed as a public
> > method in URLSource and SitemapSource, if a new interface for it doesn't
> > prove worthwhile. I still need to look into this a little bit
> > more.
> > 
> The first problem I see here is: who decides whether an object is an XML
> source or simply a source? In my opinion, a more convenient approach is
> that everything is first a Source object, and you can ask it to give you
> its XML representation. The Source object itself converts its content
> into XML if required.

This sounds like a good idea; it makes converting plain content to
XML much easier. But then how do we find out whether the object holds
XML content or not? Could we have something like:

  public boolean isRepresentableAsXML();

in the Source interface?

So to conclude, this is how I see the interfaces re-factored:

public interface XMLFragment {
  void toSAX(ContentHandler handler) throws SAXException;
  void toSAX(XMLConsumer consumer) throws SAXException;
  void toDOM(Node node) throws Exception;
}

public interface Source extends XMLFragment {
  public long getLastModified();
  public long getContentLength();
  public InputSource getInputSource() throws IOException;
  public String getSystemId();
  public boolean isRepresentableAsXML();
}

public interface ModifiableSource extends Source {
  public void refresh();
}

public interface FileSource extends Source {
  public File getFile();
}

How does this look to you?
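To check that the proposed interfaces fit an immutable, file-less Source, here is a minimal Java sketch of an inline-XML Source in the spirit of an XStream. The declarations repeat the proposal (with an XMLConsumer stand-in); InlineXMLSource is a hypothetical class, the content length is assumed to be the UTF-8 byte count, and toSAX(XMLConsumer) does not forward lexical events. It needs neither getFile() nor refresh(), which is the point of moving those into FileSource and ModifiableSource.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Node;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;

interface XMLConsumer extends ContentHandler, LexicalHandler {}   // stand-in for Cocoon's

interface XMLFragment {
    void toSAX(ContentHandler handler) throws SAXException;
    void toSAX(XMLConsumer consumer) throws SAXException;
    void toDOM(Node node) throws Exception;
}

interface Source extends XMLFragment {
    long getLastModified();
    long getContentLength();
    InputSource getInputSource() throws IOException;
    String getSystemId();
    boolean isRepresentableAsXML();
}

/** Hypothetical immutable Source over an inline XML string. */
final class InlineXMLSource implements Source {
    private final String systemId;
    private final String xml;
    private final long created = System.currentTimeMillis();

    InlineXMLSource(String systemId, String xml) {
        this.systemId = systemId;
        this.xml = xml;
    }

    public long getLastModified() { return created; }   // content never changes

    public long getContentLength() { return xml.getBytes(StandardCharsets.UTF_8).length; }

    public InputSource getInputSource() {
        InputSource in = new InputSource(new StringReader(xml));   // fresh reader per call
        in.setSystemId(systemId);
        return in;
    }

    public String getSystemId() { return systemId; }

    public boolean isRepresentableAsXML() { return true; }

    public void toSAX(ContentHandler handler) throws SAXException {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(getInputSource());
        } catch (IOException | ParserConfigurationException e) {
            throw new SAXException(e);
        }
    }

    public void toSAX(XMLConsumer consumer) throws SAXException {
        toSAX((ContentHandler) consumer);   // lexical events are not forwarded here
    }

    public void toDOM(Node node) throws Exception {
        // Identity transform: appends the parsed content as children of node.
        TransformerFactory.newInstance().newTransformer()
            .transform(new StreamSource(new StringReader(xml), systemId), new DOMResult(node));
    }
}
```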

> > I'm willing to do all the refactoring work if you guys like the
> > approach. Please let me know so I can finish the work and post a patch
> > this week; the next two weeks I'll be on vacation ;-)
> > 
> Hm, I am not quite sure. These are indeed some good and fresh ideas
> on dealing with sources, but as we have beta 2 of Cocoon 2.0 out,
> I think we shouldn't make such radical changes any more.
> We could discuss this for one of the next versions (2.1/3.0).

As the subject of the message says, I'm proposing these changes for
2.1. I would much rather see them happen sooner; otherwise all the
work I do on our internal CVS tree will need to be merged again with
the changes on the main trunk. I already went through this hassle
trying to merge the (incompatible) changes you and I made to the
Source class, and I can tell you it is a pain.

Ovidiu Predescu <> (inside HP's firewall only) (my SourceForge page) (GNU, Emacs, other stuff)
