Mailing-List: contact general-help@gump.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Gump code and data" <general@gump.apache.org>
Message-ID: <42619771.6020806@apache.org>
Date: Sat, 16 Apr 2005 18:53:37 -0400
From: Stefano Mazzocchi <stefano@apache.org>
Organization: Apache Software Foundation
User-Agent: Mozilla Thunderbird 1.0.2 (Macintosh/20050317)
MIME-Version: 1.0
To: Gump code and data <general@gump.apache.org>
Subject: Re: RDF
References: <BE87565D.264F4%mail@leosimons.com>
In-Reply-To: <BE87565D.264F4%mail@leosimons.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Leo Simons wrote:

[snip]

> So, ehm, no, I don't actually think it'll be a tremendous win. It'll bring
> some huge benefits, but it'll incur a big cost as well. Simplicity loss.
> 
> Or maybe not. I'm not exactly an expert here. We do have one of those around
> I think. Hence: "Show me!"

The way you deal with statements is a little different from the way you 
deal with objects. Objects have explicit semantics, just as statements 
do, but their relationships are not typed.

For example, if you have a Module object and a Project object, you have 
to decide which way the link goes, and "Module.projects" then means "the 
list of projects this module contains".

The problem is that this implicit modeling forces you to decide the 
direction of the link. If you want both directions, you have to model 
this explicitly, and on every update you need to know where to make the 
change.
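To make that cost concrete, here is a minimal Python sketch of the bookkeeping a two-way object link needs. The Module and Project classes and the add_project helper are hypothetical and are not Gump's actual model. The point is only that every update has to touch both ends of the link, or the two views drift apart.

```python
# Hypothetical classes, not Gump's real model: a two-way link between
# a module and its projects has to be maintained by hand.

class Module:
    def __init__(self, name):
        self.name = name
        self.projects = []  # forward link: projects this module contains


class Project:
    def __init__(self, name):
        self.name = name
        self.module = None  # back link: the module that contains this project


def add_project(module, project):
    # Every update must touch both ends, or the two views disagree.
    if project.module is not None:
        project.module.projects.remove(project)
    module.projects.append(project)
    project.module = module


cocoon = Project("Cocoon")
module_a = Module("ModuleA")
add_project(module_a, cocoon)
print([p.name for p in module_a.projects], cocoon.module.name)
```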

In RDF, you don't have to do any of that. You simply have a bunch of 
statements:

  ModuleA -(is_a)-> Module
  ProjectA -(is_a)-> Project
  ModuleA -(contains)-> ProjectA
  ProjectA -(has_name)-> "Cocoon"@en^string
  Build-20050415-343 -(is_a)-> Build
  Build-20050415-343 -(built)-> ProjectA
  Build-20050415-343 -(status)-> "failed"@en^string
  Build-20050415-343 -(depends)-> Build-20050415-234
  ...

and so on. It's basically a log of the things you come to know about 
stuff, and that log becomes your knowledge base. There is no predefined 
structure and you don't need one: you just need to be careful about how 
you model things, and the model becomes natural and grows with you. 
There is no need to define the objects or the schema before you know 
how complex your data is.
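As a rough illustration, the statements above can be held in Python as a plain set of (subject, predicate, object) tuples. This sketch simplifies on purpose: identifiers and literals are bare strings, with no URIs, language tags or datatypes. It shows that a single "contains" statement answers the question from either end, and that new kinds of facts can be added without touching any schema.

```python
# A minimal in-memory "log of statements": each fact is a
# (subject, predicate, object) tuple. Literals are plain strings here.

triples = {
    ("ModuleA", "is_a", "Module"),
    ("ProjectA", "is_a", "Project"),
    ("ModuleA", "contains", "ProjectA"),
    ("ProjectA", "has_name", "Cocoon"),
}

# One "contains" statement answers both directions of the link;
# there is nothing to keep in sync on update.
projects_of_module_a = {o for (s, p, o) in triples if s == "ModuleA" and p == "contains"}
modules_of_project_a = {s for (s, p, o) in triples if p == "contains" and o == "ProjectA"}

# Growing the knowledge base is just adding more statements;
# new predicates such as "status" need no schema change.
triples.add(("Build-20050415-343", "is_a", "Build"))
triples.add(("Build-20050415-343", "built", "ProjectA"))
triples.add(("Build-20050415-343", "status", "failed"))

print(projects_of_module_a, modules_of_project_a)
```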

It is very incremental, very XP, and fits nicely both with the laziness 
mode and with the separation between data production and data 
consumption that we want to enforce in Gump3.

Now, what about the data consumption side?

Well, the data is in the triple store, so you need to query it. There 
are many ways to do this, but they fall into two main categories:

  1) via an API
  2) via a query language

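Depending on the triple store you use, you get a different API and/or 
query language. The API feels more natural, but it leaves the triple 
store less room to optimize, since the client walks the graph one lookup 
at a time instead of handing the store the whole question.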

For example (pseudocode):

Get all modules:
  modules = model.getSubjects("is_a", "Module")

Get all builds that failed:
  failed_builds = []
  builds = model.getSubjects("is_a", "Build")
  for build in builds:
      statuses = model.getObjects(build, "status")
      if "failed" in statuses:
          failed_builds.append(build)

You get the idea.
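A runnable version of the API style, assuming a hypothetical TripleModel class that provides the getSubjects/getObjects calls from the pseudocode. These are not the API of any particular triple store, and the second build with its "success" status is made-up sample data.

```python
class TripleModel:
    """Hypothetical minimal triple store exposing the two calls used above."""

    def __init__(self):
        self.triples = []

    def add(self, subject, predicate, obj):
        self.triples.append((subject, predicate, obj))

    def getSubjects(self, predicate, obj):
        # All subjects s such that (s, predicate, obj) is a statement.
        return [s for (s, p, o) in self.triples if p == predicate and o == obj]

    def getObjects(self, subject, predicate):
        # All objects o such that (subject, predicate, o) is a statement.
        return [o for (s, p, o) in self.triples if s == subject and p == predicate]


model = TripleModel()
model.add("Build-20050415-343", "is_a", "Build")
model.add("Build-20050415-343", "status", "failed")
model.add("Build-20050415-234", "is_a", "Build")
model.add("Build-20050415-234", "status", "success")

# The client performs the "join" itself: one getObjects call per build.
failed_builds = [
    build
    for build in model.getSubjects("is_a", "Build")
    if "failed" in model.getObjects(build, "status")
]
print(failed_builds)
```

Because the loop lives in client code, the store only ever sees isolated lookups and never the whole question, which is why this style can be harder for it to optimize.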

But you could also do something like

  failed_builds = model.get("?x is_a Build where ?x status 'failed'")
	
which is not that hard to get.
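The query-language form boils down to a conjunction of triple patterns that share the variable ?x. Here is a minimal sketch of how a store might evaluate it, assuming variables are strings that start with "?". The match and query functions are hypothetical. Real RDF query languages such as SPARQL express the same idea as sets of triple patterns, and real stores add indexes and join ordering on top of this naive nested-loop evaluation.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must agree with any value it is already bound to.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(triples, patterns):
    """Return every variable binding that satisfies all the patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


triples = [
    ("Build-20050415-343", "is_a", "Build"),
    ("Build-20050415-343", "status", "failed"),
    ("Build-20050415-234", "is_a", "Build"),
    ("Build-20050415-234", "status", "success"),
]

# "?x is_a Build where ?x status 'failed'" as two triple patterns.
failed_builds = [b["?x"] for b in query(triples, [
    ("?x", "is_a", "Build"),
    ("?x", "status", "failed"),
])]
print(failed_builds)
```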

Objects are just syntactic sugar around SQL statements: you have to 
model your data first, then add it in. In RDF it is the other way 
around: you pile up your data and the database follows you.

Sure, the argument that objects are better than dealing with JDBC 
result sets by hand stands, but making this a general rule could turn 
out to be a mistake.

The vision of RDF is data first, metadata later. The vision of 
relational databases is metadata first, data later.

And the funny thing is that nothing in the relational model suggests 
this (in fact, RDF is nothing but an explicit relational model with 
globally unique identifiers). The idea of building a database by first 
creating a schema was driven by the belief that static typing is good 
for you even if it locks you in. It certainly is good for the query 
indexers, and performance is clearly not the best feature of a triple 
store nowadays.
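To illustrate the claim that RDF is an explicit relational model, here is a sketch that stores statements in a single generic table using Python's built-in sqlite3 module. The triples table and its index names are assumptions made for this example, and plain strings stand in for globally unique identifiers (URIs). Each triple pattern of the query becomes one alias of the table, the shared variable becomes a join condition, and the indexes are added after the data rather than designed up front.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One schema-agnostic table holds every statement.
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("Build-20050415-343", "is_a", "Build"),
        ("Build-20050415-343", "status", "failed"),
        ("Build-20050415-234", "is_a", "Build"),
        ("Build-20050415-234", "status", "success"),
    ],
)
# Indexes for the query planner, added after the data is already in.
conn.execute("CREATE INDEX idx_po ON triples (predicate, object)")
conn.execute("CREATE INDEX idx_sp ON triples (subject, predicate)")

# Pattern 1: ?x is_a Build; pattern 2: ?x status 'failed'; joined on ?x.
rows = conn.execute(
    """
    SELECT t1.subject
    FROM triples AS t1
    JOIN triples AS t2 ON t2.subject = t1.subject
    WHERE t1.predicate = 'is_a' AND t1.object = 'Build'
      AND t2.predicate = 'status' AND t2.object = 'failed'
    """
).fetchall()
failed_builds = [r[0] for r in rows]
print(failed_builds)
```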

I find it somewhat ironic that you now code in a dynamically typed 
language (and, AFAIK, with good feelings about it) and yet you argue 
that static typing of your data (object or SQL doesn't really matter) 
is better for you.

I think RDF offers a better model, especially for something like Gump 
that integrates data and metadata from different, independent domains.

But of course, I'm biased.

-- 
Stefano.

