xml-general mailing list archives

From "Ki-Nam Choi" <kc...@acm.org>
Subject RE: [vote] A native XML database project under Apache
Date Fri, 19 Oct 2001 18:36:30 GMT
+1

thanks,
KI

-----Original Message-----
From: Stefano Mazzocchi [mailto:stefano@apache.org]
Sent: Thursday, October 18, 2001 3:53 PM
To: Apache XML; Kimbro Staken
Subject: [vote] A native XML database project under Apache


Hi,

while the world of native XML databases is full of marketing hype and
promises, it is evident (for all those who tried) that mapping general
XML schemas to relational databases can be sometimes very painful and
not very efficient.

In fact, it is widely recognized in the database research community
that while well-structured data can be easily and efficiently mapped to
a relational database, less structured (often called semi-structured)
data is much more difficult to map.

Don't get me wrong: there are a number of ways to store XML in a
database to add ACID properties to XML documents, but while this is a
straightforward process for very repetitive and well-structured schemas
(invoices, stock quotes, money transactions), it is not so for
semi-structured schemas such as DocBook, SVG or even XSLT.

I hear you say: I use BLOBs and I'm fine with them. I'm sure you are,
but in all honesty, I'm not. And for a few reasons:

1) each documentation system requires a repository for documents. This
is often called a "content management system". Since publishing is
moving toward replacing all content with an XML syntax (and we would all
love to see that happen to its full extent), we must consider that such
a system will require a persistent way to manage the content and a fast
and efficient way to query it.

If you use BLOBs you lose any efficient way to look into the blobs
themselves, so you are doomed before you even start.

You can fragment the XML document and map the semi-structured data onto
relational tables (and remember that documentation is almost always
semi-structured!) but it can be shown that this is hard, very expensive
and might require (depending on the document schema) a very high number
of nested queries to translate even a very simple XPath expression.

Add complexities such as namespaces and the proposed XQL and you see
that an XQL -> SQL translation might well be possible but is clearly
going to become a nightmare to manage and very painful to optimize for
efficiency.
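
To make the join cost concrete, here is a minimal sketch rather than a
description of how dbXML or any real product works. It assumes a
hypothetical single-table "edge" layout (one row per element, pointing
at its parent); the names node, shred and xpath_to_sql are made up for
illustration. Even a plain child-axis path needs one self-join per step:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Tiny semi-structured sample: <title> appears at two different depths.
DOC = """<book>
  <chapter><title>Intro</title><para>Hello</para></chapter>
  <chapter><title>Storage</title><section><title>Nested</title></section></chapter>
</book>"""


def shred(conn, root):
    """Store every element as one row: (id, parent id, tag, text)."""
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, "
        "tag TEXT, text TEXT)"
    )
    counter = 0

    def walk(elem, parent):
        nonlocal counter
        counter += 1
        node_id = counter  # ids follow document order
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?)",
            (node_id, parent, elem.tag, (elem.text or "").strip()),
        )
        for child in elem:
            walk(child, node_id)

    walk(root, None)


def xpath_to_sql(path):
    """Translate an absolute child-axis path such as /a/b/c into SQL.

    Only plain element names are handled (no predicates, no //, no
    namespaces). Each step after the first adds one self-join.
    """
    steps = path.strip("/").split("/")
    last = len(steps) - 1
    sql = [f"SELECT n{last}.text FROM node AS n0"]
    for i in range(1, len(steps)):
        sql.append(f"JOIN node AS n{i} ON n{i}.parent = n{i - 1}.id")
    conds = ["n0.parent IS NULL"]
    conds += [f"n{i}.tag = ?" for i in range(len(steps))]
    sql.append("WHERE " + " AND ".join(conds))
    sql.append(f"ORDER BY n{last}.id")
    return "\n".join(sql), steps


conn = sqlite3.connect(":memory:")
shred(conn, ET.fromstring(DOC))
sql, params = xpath_to_sql("/book/chapter/title")
titles = [row[0] for row in conn.execute(sql, params)]
print(sql)
print(titles)
```

In this layout a three-step path already costs two self-joins, and a
descendant step such as //title cannot be written with a fixed number of
joins at all; it would need a recursive query or a different encoding.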

The remaining solution is to create a specific solution that leaves
structured data to RDBMS (where they really shine, no question about it)
but moves semi-structured data over to a more specific and
algorithmically optimized system.

Note that while ODBMS were supposed to solve the problem of
semi-structured data, they, in fact, do not.

This is why we need a native XML DB solution with full support for
namespaced content, XPath and XQL for querying, RDF for metadata.

2) so, the content management system that everybody is crying out loud
for requires a storage solution and I believe that a native XML DB is
the way to go.

Also because:

3) if we ever want to get deeper into the semantic web (and I,
personally, do), we must forget well-structured data. Vocabularies
such as RDF, RDF Schema, Topic Maps and the like are *not* going to be
easily mapped into relational databases and efficiently searched.

So, this is why I propose the creation of a project hosted here under
xml.apache.org to implement this effort.

Since it's generally very hard to bootstrap an open development
community without some code to start working on, I suggest starting this
project from the code that the dbXML guys are willing to donate to the
ASF, in order to create a development community that can research and
implement in this new field and, by doing so, hopefully lead the way in
reducing the marketing crap and the hype around this.

FYI, dbXML (www.dbxml.org) is an implementation of a native XML database
written in the Java language that is close to reaching its first final
release.

I've been talking to one of the community leaders (copied here) who
independently came to the same conclusion and wanted to propose dbXML
for donation even before I expressed my intentions.

Also Sam Ruby has been subscribed to their development list watching
over them.

dbXML was created with the sponsorship of a commercial entity called
"dbXML Group", which still exists but has no economic energy to continue
its development, and the main developers are now working on the project
unpaid.

But I'd like something to be clear: I'm *NOT* proposing that Apache
take over 'dbXML group' to save dbXML and continue its development. I'm
proposing that Apache create a new project for the creation of a
production quality native XML database solution that implements existing
and future standards (and hopefully has the power to influence their
establishment) and that, in order to help bootstrap the community, we
start with the current dbXML implementation, which is going to be
donated to the ASF.

To show this and to avoid confusion with past releases and the "dbXML
group" commercial entity, the project is *NOT* going to be called Apache
dbXML, but rather something without acronyms, in the spirit of
xml.apache.org.

Kimbro and I have been talking about "Apache BooBoo", but that is just
the first name that crossed my mind :) If you have better names, please,
let us discuss this publicly if the deal gets approved.

Anyway, the dbXML folks are willing to donate the code and to change the
name, as long as we give proper credit to "dbXML group" for having
bootstrapped and donated the code (as we do for IBM, Lotus, Sun and
others), and are more than willing to help with development, user
support, research, community and evangelization. In fact, if the deal is
accepted by this list, they are even willing to close down the site and
move everything over here with the new name.

Let me finish by saying that I do not consider the actual code
implementation important (a few, myself included, might not like some of
their architectural choices, such as the use of CORBA and Jaggernaut):
I'm *NOT* asking for a vote on their _actual_ technological status,
I'm asking for a vote to create a community that can create, maintain
and show the power of a native XML DB solution.

It might take years to have something solid enough to compete with big
commercial names, but it is important, IMO, for Apache to have something
to say even on this front by creating a community and attracting people
and their ideas.

In fact, the dbXML guys are willing to donate the code, but also very
happy about the possibility of a higher visibility that would bring more
people and more ideas into the design process that is going to happen
for their next major release.

So, people, I'm asking you to judge the idea of creating a community,
rather than the current dbXML implementation, which is only a way to
give users the meat they look for in that area and then attract them to
new development and further research.

Sorry for the long mail.

Please, place your vote.

Thanks.

Stefano.



---------------------------------------------------------------------
In case of troubles, e-mail:     webmaster@xml.apache.org
To unsubscribe, e-mail:          general-unsubscribe@xml.apache.org
For additional commands, e-mail: general-help@xml.apache.org


