jackrabbit-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject UUIDs, URIs, handles and so forth
Date Sat, 09 Apr 2005 16:23:27 GMT
As all of you are probably aware, the concept of 'identity' is a hard
one to model. Given that computers are not particularly clueful (nor
fast) about semi-structured data and parallel inferencing, people decided
to identify things by giving them long enough numbers, so that identity
matching could become as easy as checking two numbers for equality.

Well, it turns out that it's not that simple either, since uniqueness
requires control, at least in a hierarchical way, to keep two people from
coming up with the same ID for two different things.
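To make the hierarchical idea concrete, here is a minimal Python sketch (the class and method names are hypothetical and do not model any real registry): a root authority delegates distinct prefixes to sub-authorities, and each one mints names under its own prefix, so uniqueness holds by construction without anyone coordinating on individual IDs.

```python
import itertools


class NamingAuthority:
    """Hypothetical authority that owns exactly one namespace prefix.

    It can delegate sub-namespaces and mint identifiers in its own.
    Uniqueness follows from the structure: every delegated prefix is
    distinct, and local names within a prefix come from one counter.
    """

    def __init__(self, prefix: str):
        self.prefix = prefix
        self._children = itertools.count(1)
        self._names = itertools.count(1)

    def delegate(self) -> "NamingAuthority":
        # Carve out a fresh sub-namespace, e.g. "root" -> "root.1".
        return NamingAuthority(f"{self.prefix}.{next(self._children)}")

    def mint(self) -> str:
        # Mint a fresh identifier in this namespace, e.g. "root.1/1".
        return f"{self.prefix}/{next(self._names)}"


root = NamingAuthority("root")
lib = root.delegate()   # prefix "root.1"
lab = root.delegate()   # prefix "root.2"
ids = [lib.mint(), lib.mint(), lab.mint()]
print(ids)  # ['root.1/1', 'root.1/2', 'root.2/1']
```

The catch is that whoever controls the root controls everything below it, which is exactly where the politics start.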

So naming authorities become, by design, the places where organizations
cluster, each claiming to be the 'true representative' of the identity
problem.

The semantic web is nothing but a web of explicit statements about 
uniquely identifiable "things". Roy and TimBL disagree on what those 
"things" could be, but let's not go there here.

On the web (and on the semantic web, which is its extension),
identification is based on URIs.

But URIs are very general, sort of the XML of identifiers, and have no
provision, by themselves, to guarantee uniqueness: you need a naming
authority to manage the namespace partitioning.

There are several of these:

  1) http     (web)
  2) DOI      (publishing)
  3) LSID     (life sciences)
  4) handles  (digital libraries)

The main difference between the first and the other three is that HTTP
URIs can be used as URLs directly, without any further dereferencing.
The problem is that, because of that, the naming authority that controls
the space is the very same one that controls internet domain names...
and those communities don't like that: they want their own.

For that reason, they are willing to pay the price of needing to 
'dereference' their URIs (which are, in fact, URNs).
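A rough Python sketch of that difference, assuming a hypothetical `locate()` helper and an in-memory resolver table (a real URN resolver would be a network service): HTTP URIs are their own locators, while anything else needs an extra lookup before it can be fetched.

```python
from urllib.parse import urlparse

# Schemes that can be fetched as-is: the identifier *is* the locator.
DIRECTLY_DEREFERENCEABLE = {"http", "https"}


def locate(identifier: str, resolvers: dict) -> str:
    """Return a URL at which `identifier` can be retrieved.

    HTTP(S) URIs come back unchanged. Any other scheme needs a resolver
    registered for it; `resolvers` maps scheme -> callable(uri) -> URL.
    """
    scheme = urlparse(identifier).scheme.lower()
    if scheme in DIRECTLY_DEREFERENCEABLE:
        return identifier
    if scheme not in resolvers:
        raise LookupError(f"no resolver registered for scheme {scheme!r}")
    return resolvers[scheme](identifier)


# Hypothetical resolver table for URNs.
urn_table = {"urn:example:item:42": "http://repo.example.org/items/42"}
resolvers = {"urn": lambda uri: urn_table[uri]}

print(locate("http://repo.example.org/items/42", resolvers))  # http://repo.example.org/items/42
print(locate("urn:example:item:42", resolvers))               # http://repo.example.org/items/42
```

The extra hop is the 'price' mentioned above: it buys independence from DNS at the cost of running (and trusting) a resolver.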

Now, personally I believe that HTTP URIs are just fine. It's true that
they pose a digital preservation problem, because a semantically
meaningful domain name is more subject to change than a numeric one,
but the problem is solved by modelling the evolution of identifiers over
time, rather than pretending that numbers last longer than strings.
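One possible reading of "modelling the evolution of identifiers over time", sketched in Python: instead of hoping a URI never changes, record each rename with the date it took effect and follow the chain when resolving. The `SUPERSEDED_BY` table and its URIs are made-up examples.

```python
from datetime import date

# Hypothetical rename history: old URI -> (date superseded, new URI).
# Renames are recorded, never erased, so old identifiers stay meaningful.
SUPERSEDED_BY = {
    "http://old.example.org/doc/1": (date(2003, 6, 1), "http://www.example.org/doc/1"),
    "http://www.example.org/doc/1": (date(2005, 1, 1), "http://archive.example.org/doc/1"),
}


def current_uri(uri: str, as_of: date) -> str:
    """Follow recorded renames that took effect on or before `as_of`."""
    seen = set()
    while uri in SUPERSEDED_BY and uri not in seen:  # `seen` guards against cycles
        seen.add(uri)
        when, new_uri = SUPERSEDED_BY[uri]
        if when > as_of:
            break  # this rename had not happened yet at `as_of`
        uri = new_uri
    return uri


print(current_uri("http://old.example.org/doc/1", date(2004, 1, 1)))  # http://www.example.org/doc/1
print(current_uri("http://old.example.org/doc/1", date(2005, 4, 9)))  # http://archive.example.org/doc/1
```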

But why am I writing about this here?

Well, JCR needs a way to uniquely identify things in the repository. So 
we have UUIDs for that.

Although JCR does not state explicitly that these identifiers need to be
globally unique, there is no doubt that if they were, they would also be
locally unique *and* the content could be identified everywhere, even if
two JCR repositories were merged.
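A toy Python illustration of why global uniqueness matters on merge (plain dicts stand in for repositories; this is not the JCR API): per-repository counters collide as soon as two repositories are combined, while randomly generated version 4 UUIDs practically never do.

```python
import uuid
from itertools import count


def make_repo(n_nodes: int, id_source) -> dict:
    """Build a toy repository mapping node id -> payload."""
    return {id_source(): f"node {i}" for i in range(n_nodes)}


def merge(a: dict, b: dict) -> dict:
    """Combine two repositories, refusing to silently overwrite nodes."""
    clashes = a.keys() & b.keys()
    if clashes:
        raise ValueError(f"{len(clashes)} identifier clash(es) on merge")
    return {**a, **b}


def local_ids():
    # Each repository gets its own counter starting at 1: locally unique only.
    c = count(1)
    return lambda: next(c)


repo_a = make_repo(3, local_ids())
repo_b = make_repo(3, local_ids())
try:
    merge(repo_a, repo_b)
except ValueError as e:
    print(e)  # 3 identifier clash(es) on merge

uuid_a = make_repo(3, lambda: str(uuid.uuid4()))
uuid_b = make_repo(3, lambda: str(uuid.uuid4()))
print(len(merge(uuid_a, uuid_b)))
```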

While evaluating a JCR interface on top of DSpace (or the use of JCR as
the underlying repository API of DSpace!), the question of identifiers
comes up frequently.

DSpace normally uses 'handles' as identifiers and provides the ability 
to dereference a handle to the URL that locates it in that particular 
moment in time.
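The handle indirection can be pictured with a tiny in-memory resolver. This is a hypothetical stand-in for the real Handle System, which is a distributed network service, and `123456789` is just a placeholder prefix: the handle stays fixed and only the URL registered for it changes when the item moves.

```python
class HandleResolver:
    """Toy handle service mapping a persistent handle to its current URL."""

    def __init__(self):
        self._locations = {}

    def register(self, handle: str, url: str) -> None:
        # Registering again just points the same handle somewhere new.
        self._locations[handle] = url

    def resolve(self, handle: str) -> str:
        # Returns wherever the item lives *right now*.
        return self._locations[handle]


hs = HandleResolver()
hs.register("123456789/42", "http://old-host.example.org/dspace/handle/123456789/42")
before = hs.resolve("123456789/42")
hs.register("123456789/42", "http://new-host.example.org/items/42")  # the item moved
after = hs.resolve("123456789/42")
print(before)  # http://old-host.example.org/dspace/handle/123456789/42
print(after)   # http://new-host.example.org/items/42
```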

Sure, it's easy to add a 'handle' property to a node and model it that
way, but it would be far more elegant (and more future-proof, since I'm
already thinking about exporting that content as RDF, where URIs are
king) if JCR allowed *you* to pick the UUID for the node.
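One way to get a node UUID that is tied to a handle, assuming the repository let the client supply the UUID (which is exactly the open question here), is a name-based version 5 UUID (RFC 4122) computed from the handle. Hashing the handle's `http://hdl.handle.net/` proxy URL under the standard URL namespace is an arbitrary convention chosen for this sketch, and `uuid_for_handle` is a hypothetical helper.

```python
import uuid

# Assumed convention: hash the handle's proxy URL in the standard URL namespace.
HANDLE_PROXY = "http://hdl.handle.net/"


def uuid_for_handle(handle: str) -> uuid.UUID:
    """Derive a stable, name-based (version 5) UUID from a handle.

    The same handle always yields the same UUID, so the mapping can be
    recomputed anywhere without a lookup table.
    """
    return uuid.uuid5(uuid.NAMESPACE_URL, HANDLE_PROXY + handle)


node_uuid = uuid_for_handle("123456789/42")
print(node_uuid == uuid_for_handle("123456789/42"))  # True
print(node_uuid.version)                             # 5
```

Because the UUID is a SHA-1 hash, it cannot be turned back into the handle, so the handle would still be worth keeping as a property; the derivation only guarantees that any system using the same convention computes the same UUID for the same handle.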

So, my question is: is it possible to specify the node UUID externally?
(as long as local uniqueness is guaranteed, obviously)

-- 
Stefano.

