jackrabbit-users mailing list archives

From "Alessandro Bologna" <alessandro.bolo...@gmail.com>
Subject Re: XML, SNS, and JCR
Date Tue, 01 Apr 2008 00:07:28 GMT
Thanks David,
if you say that's an interesting idea, then I may actually start to believe
in it too ;).

Please see my answers inline below. And, while I am at it, I need to fix
a couple of typos in my original message:

Where I wrote:

> so, for instance, i could write:
> *//people//my:employee[2]/my:name* as an XPATH expression for the Normal
> View to find my second employee,
> *//people//my:employee[@id='john.smith']/my:dob* to find when the employee
> (not the freelancer) with id john.smith was born

What I *really* meant was:

> so, for instance, i could write:
> *//people//my:employee[2]/my:name* as an XPATH expression for the Normal
> View to find my second employee,
> *//people//my:employee[@jcr:name='john.smith']/my:dob* to find when the
> employee (not the freelancer) with jcr:name john.smith was born
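
To make the two corrected expressions concrete, here is a minimal sketch using
Python's xml.etree.ElementTree on a made-up document. The element layout, the
`my` namespace URI and the sample values are assumptions for illustration only,
not the actual Normal View serialization. ElementTree implements only a subset
of XPath, so the paths are written relative to the root with a leading `.`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: same-name siblings (two my:employee elements) with a
# my:freelancer in between that shares the jcr:name of the second employee.
SAMPLE = """<repository xmlns:my="http://example.com/my"
            xmlns:jcr="http://www.jcp.org/jcr/1.0">
  <people>
    <my:employee jcr:name="jane.doe">
      <my:name>Jane Doe</my:name><my:dob>1975-03-14</my:dob>
    </my:employee>
    <my:freelancer jcr:name="john.smith">
      <my:name>John Smith (freelancer)</my:name><my:dob>1980-01-01</my:dob>
    </my:freelancer>
    <my:employee jcr:name="john.smith">
      <my:name>John Smith</my:name><my:dob>1969-07-20</my:dob>
    </my:employee>
  </people>
</repository>"""

# Prefix map used to resolve my: and jcr: in the path expressions.
NS = {
    "my": "http://example.com/my",        # assumed application namespace
    "jcr": "http://www.jcp.org/jcr/1.0",
}

root = ET.fromstring(SAMPLE)

# Positional predicate: the second my:employee sibling, by position.
second = root.findall(".//people//my:employee[2]/my:name", NS)
second_names = [e.text for e in second]

# Attribute predicate: the employee (not the freelancer) named john.smith.
dob = root.findall(".//people//my:employee[@jcr:name='john.smith']/my:dob", NS)
dob_values = [e.text for e in dob]
```

The positional form depends on sibling order, while the `jcr:name` form keeps
working if siblings are reordered, which is why both are useful with SNS.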

> First of all, congratulations on the RESTful URLs that you mention in
> your post, which is something that I do not see very often. You
> may find that the URL mapping in Apache Sling [1] is very similar.

I know, I wish Sling had been announced a bit earlier... We already had our
first prototype out last summer and it was a bit too late to start with
Sling, but we will certainly look for areas of synergy.

> As you mention the DocView is for round tripping arbitrary XML while
> the SysView is for round tripping arbitrary content. If I am not mistaken
> the "Normal View" would not allow either of the two, but would add a lot
> of value for an efficient way to deal with something that I would call
> "real-life JCR aware XML".

Well, you are right, and the intent is not really to round-trip, but to
provide a way to port existing XML applications to JCR, and to leverage the
JCR as a (very powerful) way to deal with extremely large XML structures.

As I mentioned already, in such a paradigm it's possible to RESTfully
transition from one node (representation) to another, thus effectively
navigating the repository either within its hierarchical structure, or
through references to other nodes (which can be expressed as paths or JCR
references), or with XPath queries.

If a simple XSLT stylesheet is the user agent (but any other client
application that can use XML would do as well), it's trivial to process
virtually the entire repository, no matter how large it is.

For instance, take these few lines of a hypothetical and oversimplified XSLT
(2.0) stylesheet:

    <xsl:template match="/">
    <xsl:variable name="posts"
    <xsl:result-document href="
            <head><title>2007 posts</title></head>
                <xsl:apply-templates select="posts"/>

    <xsl:template match="headline">
        <li><xsl:value-of select="."/></li>

This alone, using a bit of restful GETting and PUTting, can create the index
page for my blog, post it on the URL I want, and it can run on another
server as well.
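
The same idea can be sketched outside XSLT. The following is a rough Python
analogue, not a reconstruction of the stylesheet: it takes a posts document
(standing in for what a GET on the repository might return), turns every
headline into a list item, and produces the index page that would then be
PUT to the target URL. The `posts`/`post` element names, the sample headlines
and the `build_index` function are assumptions; only `headline` and the
"2007 posts" title come from the fragment above, and the HTTP calls are
deliberately left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body of a GET on the posts collection.
POSTS_XML = """<posts>
  <post><headline>Hello JCR</headline></post>
  <post><headline>Same-name siblings</headline></post>
</posts>"""

def build_index(posts_xml, title="2007 posts"):
    """Build a minimal HTML index with one <li> per <headline>."""
    posts = ET.fromstring(posts_xml)
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title
    ul = ET.SubElement(ET.SubElement(html, "body"), "ul")
    # Equivalent in spirit to the template matching "headline".
    for headline in posts.iter("headline"):
        ET.SubElement(ul, "li").text = headline.text
    return ET.tostring(html, encoding="unicode")

index_html = build_index(POSTS_XML)
```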

More generally, the use case is that of an organization with hundreds of
thousands of XML documents that are organized in some sort of hierarchical
fashion, with references or hyperlinks to each other, and that together form
a super-document that is really complex to manipulate with traditional
tools. Once you load them into the JCR, you can access all of them at once,
extract and transform what you want, etc.

And, of course, the other intent is to mend the chasm between two worlds
before it becomes too large...

> I think it would be interesting to find out what the
> characteristics and limitations of such a view are, both from an XML
> (import)
> and from a JCR (export/query) perspective. I assume we would end up
> with the same limitations as the DocView from a JCR perspective
> and possibly with the limitation that the XML elements would have to
> match pre-registered (possibly auto-defined & registered) node types.

The main limitation I can think of, when it comes to the Document View, is
that node type information is lost, and property arrays are a bit "squeezed"
into attributes. There may be others, of course, that I am not aware of. Are
there more?

To be honest, both limitations are not huge in my experience, even with the
traditional Document View, when you consider that the use case is to process
XML with JCR (and not vice versa).

If there are string sequences in attributes, they can very well become
arrays of properties if the corresponding node type says so, or stay as a
single string that happens to have some innocuous white space in it.
And type information can be preserved as long as node types are defined
(maybe even by importing the result of an XML Schema to CND conversion).
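
As a minimal sketch of that rule, assuming a lookup table that says which
properties a node type declares as multi-valued: the `MULTI_VALUED` table, the
`my:skills` and `my:title` property names and the `import_attribute` function
are all hypothetical, and a real importer would consult the registered node
type definitions instead.

```python
# Hypothetical stand-in for node type definitions: the (node type, property)
# pairs that are declared as multi-valued.
MULTI_VALUED = {("my:employee", "my:skills")}

def import_attribute(node_type, prop_name, raw_value):
    """Return a list for multi-valued properties, else the string unchanged."""
    if (node_type, prop_name) in MULTI_VALUED:
        # The node type says "array": split the squeezed attribute value
        # on runs of white space.
        return raw_value.split()
    # No such declaration: keep the single string, white space and all.
    return raw_value

skills = import_attribute("my:employee", "my:skills", "java  xml jcr")
title = import_attribute("my:employee", "my:title", "senior  engineer")
```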

Now, with the Normal View, the advantage I would see is that I would not
need to rethink my document structure to avoid SNS in order to take
advantage of what the JCR offers (especially in terms of Java APIs). And my
existing XPath queries would still work, but now they would work across
documents too.

This approach would allow both a content-first and a structure-first way of

Content first? Just load your XMLs with the option to create empty,
unstructured nodetypes for you, and finally get that comprehensive view you
could never get before.
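
A rough sketch of what "create empty, unstructured nodetypes for you" could
mean in practice: collect the distinct element names of a document and emit
one CND-style stub per name, each extending nt:unstructured. The
`stub_node_types` function, the `NS_PREFIXES` table and the sample document
are assumptions for illustration; this is not how any existing importer is
known to behave.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace-URI-to-prefix table used to print qualified names.
NS_PREFIXES = {"http://example.com/my": "my"}

def qname(tag):
    """Turn ElementTree's '{uri}local' form back into 'prefix:local'."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return f"{NS_PREFIXES[uri]}:{local}"
    return tag

def stub_node_types(xml_text):
    """One empty node type stub per distinct element name, sorted by name."""
    root = ET.fromstring(xml_text)
    names = sorted({qname(el.tag) for el in root.iter()})
    return [f"[{name}] > nt:unstructured" for name in names]

stubs = stub_node_types(
    '<people xmlns:my="http://example.com/my">'
    "<my:employee><my:name>A</my:name></my:employee><my:employee/>"
    "</people>"
)
```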

Structure first? Grab that schema (the one that you have not updated in
the last two years), update it, import it into the JCR and go happy.

Thanks again for the attention and your input.
