From: Stefano Mazzocchi
Organization: Apache Software Foundation
To: Gump code and data
Subject: Re: RDF
Date: Sat, 16 Apr 2005 18:53:37 -0400
Message-ID: <42619771.6020806@apache.org>
Leo Simons wrote:

[snip]

> So, ehm, no, I don't actually think it'll be a tremendous win. It'll bring
> some huge benefits, but it'll incur a big cost as well. Simplicity loss.
>
> Or maybe not. I'm not exactly an expert here. We do have one of those around
> I think. Hence: "Show me!"

The way you deal with statements is a little different from the way you
deal with objects.

Objects have explicit semantics, as much as statements, but their
relationships are not typed.

For example, if you have the Module object and the Project object, you
have to decide which way the link goes, and the notion of
"Module.projects" means "this is the list of projects this module
contains".

The problem is that this implicit modeling forces you to decide the
direction of the link and, in case you want both, you have to model this
explicitly and, at update time, you need to know where to change.

In RDF, you don't have to do all that. If you have a bunch of statements

  ModuleA -(is_a)-> Module
  ProjectA -(is_a)-> Project
  ModuleA -(contains)-> ProjectA
  ProjectA -(has_name)-> "Cocoon"@en^string
  Build-20050415-343 -(is_a)-> Build
  Build-20050415-343 -(built)-> ProjectA
  Build-20050415-343 -(status)-> "failed"@en^string
  Build-20050415-343 -(depends)-> Build-20050415-234
  ...

and so on. It's basically a log of the things you come to know about
stuff and this becomes your knowledge base.

No structure, you don't need it, you just need to be careful about how
you model things and this becomes natural and grows with you.

No need to define the objects or the schema before you know how complex
your data is. Very incremental, very XP, fits nicely both in the
laziness mode and in the separation between data production and data
consumption that we want to enforce in Gump3.

Now, what about the data consumption side?

Well, the data is in the triple store, so you need to query it.
There are many different ways to do this, but two main categories:

  1) via an API
  2) via a query language

Depending on the triple store you use, you get a different API and/or
query language.

The API feels more natural, but can be less optimized by the triple
store.

For example (pseudocode)

Get all modules:

  modules = model.getSubjects("is_a","Module");

Get all builds that failed:

  builds = model.getSubjects("is_a","Build");
  foreach (build in builds):
    status = model.getObjects(build,"status")
    if (status == "failed"):
      failed_builds.add(build)

you get the idea. But you could also do something like

  failed_builds = model.get("?x is_a Build where ?x status 'failed'")

which is not that hard to get.

Objects are just syntax sugar around SQL statements: you have to model
your data first, then add it in. In RDF it is the other way around: you
pile up your data and the database follows you.

Sure, the argument that objects are better than dealing with JDBC
resultsets by hand stands, but making this a general rule could turn out
to be a mistake.

The vision of RDF is data first, metadata later.

The vision of relational databases is metadata first, data later.

And the funny thing is that there is nothing in the relational model
that suggests that (in fact, RDF is nothing but an explicit relational
model with globally unique identifiers), but the idea of building a
database by creating a schema was driven by the vision that static
typing is good for you even if it locks you in (it certainly is good for
the query indexers, and performance is clearly not the best feature of a
triple store nowadays).

I find it somewhat ironic that you now code in a dynamically typed
language (and, AFAIK, with good feelings about it) and you advocate that
static typing of your data (object or SQL doesn't really matter) is
better for you.

I think RDF offers a better model, especially for something integrating
data and metadata from different independent domains like Gump.

But of course, I'm biased.

--
Stefano.
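
To make the two query styles from the message concrete, here is a minimal sketch in Python of an in-memory triple store. It is not the API of any real triple store: the names Model, add, get_subjects, get_objects, match and unify are hypothetical and only loosely mirror the pseudocode. The statements come from the example list, except that the type and "success" status given to Build-20050415-234 are assumed purely so the filter has something to reject.

```python
def unify(pattern, triple, binding):
    """Extend binding so pattern matches triple; return None on mismatch."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):            # variable: bind it or check it
            if result.setdefault(term, value) != value:
                return None
        elif term != value:                 # constant: must match exactly
            return None
    return result


class Model:
    """Hypothetical triple store: just a growing set of statements."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def get_subjects(self, predicate, obj):
        return sorted(s for s, p, o in self.triples
                      if p == predicate and o == obj)

    def get_objects(self, subject, predicate):
        return sorted(o for s, p, o in self.triples
                      if s == subject and p == predicate)

    def match(self, patterns):
        """Query style: every pattern must hold under the same bindings."""
        bindings = [{}]
        for pattern in patterns:
            bindings = [extended
                        for binding in bindings
                        for triple in self.triples
                        for extended in [unify(pattern, triple, binding)]
                        if extended is not None]
        return bindings


model = Model()
model.add("ModuleA", "is_a", "Module")
model.add("ProjectA", "is_a", "Project")
model.add("ModuleA", "contains", "ProjectA")
model.add("ProjectA", "has_name", "Cocoon")
model.add("Build-20050415-343", "is_a", "Build")
model.add("Build-20050415-343", "built", "ProjectA")
model.add("Build-20050415-343", "status", "failed")
model.add("Build-20050415-343", "depends", "Build-20050415-234")
# Assumed for illustration only; not among the statements in the message.
model.add("Build-20050415-234", "is_a", "Build")
model.add("Build-20050415-234", "status", "success")

# 1) API style: fetch all builds, then filter on status in client code.
failed_api = [build for build in model.get_subjects("is_a", "Build")
              if "failed" in model.get_objects(build, "status")]

# 2) Query style: hand both patterns to the store and let it do the join.
failed_query = sorted(b["?x"] for b in model.match(
    [("?x", "is_a", "Build"), ("?x", "status", "failed")]))

# The "contains" link can be followed in either direction from one statement.
projects_in_module = model.get_objects("ModuleA", "contains")
modules_of_project = model.get_subjects("contains", "ProjectA")
```

Both styles return the same single failed build. The last two lines show the point about link direction: one "contains" statement answers both "which projects does this module contain" and "which module contains this project", with nothing extra to keep in sync on update.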