cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject [RT] Can Cocoon help enforcing the "semantic web"?
Date Mon, 15 May 2000 17:25:37 GMT
Here we are for another episode of the "random thoughts" series brought
to you by Stefano's fried synapses.

In a recent message (also picked up by xmlhack.com) I expressed my
concerns about the harm that a tool like Cocoon could do to the ideas
that XML and friends propose. Many of you responded with strong
arguments about the need for XML processing on the server side, and I
totally agree with them (of course). It was also pointed out that such
needs would not be removed by the existence of widespread XML-capable
clients: true, they would be reshaped, but never removed.

I've thought a lot about this and came to the conclusion that while
Cocoon is _NOT_ harmful to the XML model in general, it leaves a very
important part of the job of enforcing the "semantic web" to the user.

During the last few days, I went over all the "web design issues" that
W3C director Tim Berners-Lee outlines (http://www.w3.org/DesignIssues/),
did my homework on RDF, RDF Schema and all related materials, read some
whitepapers about metadata activities and started to think about how
Cocoon could help.

I've come across many interesting ideas and powerful dreams of a "web of
knowledge" where a layer of machine-understandable information is
processed to create a layer of human-understandable information that is
generally easier for humans to process because it has already been
filtered by metadata processing.

I know many of you don't know about RDF and many others believe it's
just the XML equivalent of the HTML <meta> tag. In general, RDF is
believed to be a useless waste of time. I used to think this myself, but
I think it's time to look forward... and outline the problems that RDF
and friends have.

There are problems: the baby is hard to understand and use. RDF is
generally verbose, it has this (please, allow me) "useless"
element/attribute equivalence (which breaks validation in all possible
ways), it's utterly abstract, and it provides no example of use that
would pay off in the short term.

RDF is more than a year old and almost nobody (except the Mozilla
project) has been using it. Why?

I don't have a general answer but I have my own: why should I care about
embedding RDF markup in my documents, if nobody is able to use it?

But the same thing could be said for RDF-based applications (the
infamous chicken & egg problem): why should I write an RDF-capable
engine if there is no content available which contains RDF?

Sure, there are RFCs that teach you how to embed RDF into your HTML
(yeah, right... you wish) and RFCs that teach you what metadata
elements to use (the Dublin Core). David Megginson also wrote an RDF
wrapper for SAX. Everybody in this world knows that this might be
big....

.... but the energy gap to reach that usability plateau is _HUGE_
and it seems that nobody is able to write the "killer app" that gets
this ball rolling.

Can Cocoon be this "killer app"?

I strongly believe so. Let me explain why:

Cocoon (starting from its version 2.0) is based on the sitemap. The
sitemap is the location of all the processing information required to
generate a resource. This is metadata, this is "data about data". If we
clean it up a little, RDF-ize it, then it would be very easy for Cocoon
to expose its sitemap to semantic crawlers.
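
A rough sketch of what "RDF-izing" the sitemap could mean, in Python.
Everything here is an assumption made for illustration: the sitemap
entries are a simplified stand-in (not the actual Cocoon 2.0 sitemap
format), the http://example.org/sitemap# vocabulary is made up, and the
output is written as simple subject/predicate/object lines rather than
RDF/XML.

```python
# Hypothetical, simplified sitemap: each entry maps a URI to the
# processing steps that generate it. Not the real Cocoon sitemap format.
SITEMAP = [
    {"uri": "/docs/index", "generator": "file", "source": "docs/index.xml",
     "transformer": "xslt", "serializer": "html"},
    {"uri": "/news/today", "generator": "file", "source": "news/today.xml",
     "transformer": "xslt", "serializer": "html"},
]

# Made-up vocabulary namespace for the processing properties.
SM = "http://example.org/sitemap#"


def sitemap_to_triples(base, entries):
    """Turn each sitemap entry into (subject, predicate, object) triples."""
    triples = []
    for entry in entries:
        subject = base.rstrip("/") + entry["uri"]
        for key in sorted(entry):
            if key == "uri":
                continue  # the URI is the subject, not a property
            triples.append((subject, SM + key, entry[key]))
    return triples


def to_ntriples(triples):
    """Serialize triples as one 'subject predicate "literal" .' line each."""
    lines = []
    for subject, predicate, value in triples:
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append('<%s> <%s> "%s" .' % (subject, predicate, escaped))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(sitemap_to_triples("http://localhost/", SITEMAP)))
```

The point is only that the sitemap already holds statements of the form
"this resource is produced by that step", so a crawler could read them
without guessing.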

Also, through the use of content negotiation, it could be possible for
the crawler to obtain the "RDF" information (which could be the original
one, or one created on purpose), which along with XLink/XBase/XPointer
would allow the crawler to crawl the site in a friendly manner.
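
The content negotiation step could look something like the sketch
below. It is a deliberately simplified reading of the HTTP Accept
header (q-values and "*/*" only, no "type/*" wildcards), and the
"application/rdf+xml" media type and the choose_view name are
assumptions made for the example, not anything Cocoon defines.

```python
def parse_accept(header):
    """Return (media_type, q) pairs, highest q first (stable for ties)."""
    prefs = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # HTTP default when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    prefs.sort(key=lambda pref: -pref[1])
    return prefs


def choose_view(header, available=("text/html", "application/rdf+xml")):
    """Pick the view to serve; the first available type is the default."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return available[0]
    return available[0]  # nothing matched: fall back to the human view


if __name__ == "__main__":
    # A semantic crawler asking for RDF first, HTML as a fallback.
    print(choose_view("application/rdf+xml, text/html;q=0.5"))
    # A typical browser asking for HTML.
    print(choose_view("text/html,application/xhtml+xml,*/*;q=0.8"))
```

The same URL then serves both audiences: browsers get the rendered
page, crawlers get the metadata view.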

Ok, you say, I get that, but what would be different from today?

The thing is that we are going to write that crawler and connect it
directly to Cocoon so we would gain:

1) The sitemap is the instruction set for both the resource generator
and the semantic crawler. This gives a single point of management, and
it would also allow people to see an immediate payoff for their
metadata effort.

2) Each Cocoon would have its own semantically driven search engine.

3) Each Cocoon would connect to other semantic search engines which make
available RDF views of their information (the mozilla directory, for
example) to increase its range of action.

4) Each Cocoon would be contacted by other agents (other Cocoons or
equivalently behaving agents) and provide RDF views of its information,
possibly already semantically processed, so that the agent does not
need to recrawl the site.

If you think about it, such "cellular" semantically-based indexing would
work much like Napster/Gnutella networks where there would be no central
point of failure.

Imagine a web where each site controls not only its information, its
schemas, its stylesheets and web applications, but also its own search
engine, and every one of them is the entry point for a distributed (but
locally manageable) semantically based searching environment.

It would work much like routing tables work for TCP/IP networks,
propagating information as soon as it is available or delegating
search and retrieval to other networks.
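
To make the "delegating search to other networks" idea a bit more
concrete, here is a toy sketch. The Node class, its method names and the
hop limit (ttl) are all invented for illustration; a real system would
need a wire protocol, caching and much more. Each node answers from its
own index and, while the hop limit allows, asks its peers.

```python
class Node:
    """A site with its own index that can delegate searches to peers."""

    def __init__(self, name):
        self.name = name
        self.index = {}  # term -> set of resource URLs
        self.peers = []

    def publish(self, term, url):
        """Record locally that 'url' is relevant for 'term'."""
        self.index.setdefault(term, set()).add(url)

    def connect(self, other):
        """Link two nodes in both directions."""
        if other not in self.peers:
            self.peers.append(other)
        if self not in other.peers:
            other.peers.append(self)

    def search(self, term, ttl=2, seen=None):
        """Answer locally, then delegate to peers until ttl runs out."""
        seen = set() if seen is None else seen
        if self.name in seen:
            return set()  # already asked on this query: avoid loops
        seen.add(self.name)
        results = set(self.index.get(term, ()))
        if ttl > 0:
            for peer in self.peers:
                results |= peer.search(term, ttl - 1, seen)
        return results


if __name__ == "__main__":
    a, b, c = Node("a"), Node("b"), Node("c")
    a.connect(b)
    b.connect(c)
    c.publish("dreamer", "http://www.apache.org/~stefano/rt/latest")
    print(a.search("dreamer", ttl=2))  # reaches c through b
    print(a.search("dreamer", ttl=1))  # stops at b, finds nothing
```

No node is special here, which is the whole point: any of them can be
the entry point, and losing one only removes its own index.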

I don't know if this is feasible or not, but the idea seems to me *very*
exciting, to say the least.

                 ----------- o -------------

But how would a "semantically based search engine" work?

I still don't have a clear view of this, but I have a few ideas to
share: first of all, the RDFSchema WD adds a great deal of functionality
to the RDF idea and makes it very appealing.

[Careful here: RDFSchema is not to be confused with XMLSchema, which is
a totally different thing. RDFSchema is -NOT- the XMLSchema for RDF,
also because RDF cannot be validated]

RDFSchema mostly provides object-oriented capabilities to the RDF model,
allowing you, for example, to say

 <rdf:Description
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  rdf:about="http://www.apache.org/~stefano/rt/latest"
  xmlns="http://metadata.org/people/jobs">
  <dreamer>Stefano Mazzocchi</dreamer>
 </rdf:Description>

where the namespace "http://metadata.org/people/jobs" indicates (with an
RDFSchema) that

  dreamer --(extend)--> dc:creator

where dc:creator is the Creator element of the Dublin Core standard
metadata set, which names the author of the described resource.

Then, on the local site, since users are generally made aware of the
specific metadata tags used, "dreamer" might have other meanings, but
other sites that are unaware of these site-specific meanings can fall
back on the standard "creator" tag, since its semantics have been
inherited.
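
The fallback can be sketched in a few lines. The prefixed names, the
SUBPROPERTY_OF table and the match function are assumptions made for
the example; the Dublin Core authorship element is written as
dc:creator, the name it carries in the published element set, and each
property is given a single parent to keep things short (RDFSchema
itself allows more than one).

```python
# Hypothetical schema: local property -> the standard one it extends.
SUBPROPERTY_OF = {
    "jobs:dreamer": "dc:creator",
}


def ancestors(prop, schema):
    """Return prop followed by everything it (transitively) extends."""
    chain = [prop]
    while chain[-1] in schema and schema[chain[-1]] not in chain:
        chain.append(schema[chain[-1]])
    return chain


def match(statements, resource, wanted, schema):
    """Values of 'wanted' for 'resource', counting sub-properties too."""
    return [value for (subject, prop, value) in statements
            if subject == resource and wanted in ancestors(prop, schema)]


if __name__ == "__main__":
    url = "http://www.apache.org/~stefano/rt/latest"
    statements = [(url, "jobs:dreamer", "Stefano Mazzocchi")]
    # An outside engine that only knows Dublin Core still gets a hit.
    print(match(statements, url, "dc:creator", SUBPROPERTY_OF))
    # The local engine can ask for the site-specific meaning.
    print(match(statements, url, "jobs:dreamer", SUBPROPERTY_OF))
```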

Think about something like this, where you are able to define whatever
metadata markup is required for your needs, but you provide semantic
hooks for outer searches to still match.

Much as a standard API provides functionality that you extend as you
please while still letting your program run on any compatible platform,
such a semantic web would be based on a standard set of metadata tags.
What you need to do (if you don't want to use those tags, or want to
provide special searching capabilities) is to extend them and make the
RDFSchemas accessible in a known place.

Would this solve all of our problems? No way. As with XMLSchemas, the
problem of web "balkanization" and fragmentation exists, but as many
have pointed out, stable points in a dynamic system occur at the bottom
of a bell-shaped surface.

Today, we have a stable dynamic system, since its potential energy is
at a local minimum.

The W3C is giving us ideas about an ideal new minimum of the web's
potential energy that lies far higher, in another local minimum.

We need to behave as catalysts to lower the energy required to move
from this minimum (the current web) to the other minimum high above
(the semantic web); otherwise, these ideas will simply remain in those
W3C specs and will never change our favorite information infrastructure
as they fully deserve to do.

Cocoon was born to allow the adoption of XML technologies to solve
real-life problems, and it acted as a catalyst.

Now I want to provide the same thing to complete the job.

Don't know when I'll have time for this, but I invite you to follow me
on this quest if you like the idea... and if you think you have a better
one, it's even better.

I know I'm crazy. I know... :)

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<stefano@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------


