commons-dev mailing list archives

From Reto Gmür <>
Subject Re: [RDF] Local Scope and BlankNode internalIdentifier (was: github Commons RDF vs. Apache Commons Sandbox RDF)
Date Tue, 03 Feb 2015 15:42:20 GMT
Hi Peter, Hi Andy,

I think the Commons RDF API should model the Abstract Syntax, and to quote
the spec "Blank node identifiers are *not* part of the RDF abstract
syntax". Of course, if there are very important pragmatic reasons to have
some identifiers in the API, we can consider having them nevertheless.

So far I haven't seen any use case that would actually require an exposed
identifier, apart from the questionable use case of parsing the same
document twice. Exposing it does however raise several questions and
difficulties (such as a node no longer being identical to itself after
being added to a graph).

It is clear that while constructing graphs an implementation must not throw
away redundancy. The question is whether the API should force
implementations to keep redundant information. Even if triplestores
typically keep such information, I think the API shouldn't force them to do
so (at least not without very compelling use cases).

Graphs are immutable both in the Abstract Syntax as well as in the
Semantics. Nevertheless in many situations we want to have mutable graphs.
This is why in Clerezza we have MGraphs (this is Graph in the SVN commons
proposal) and Graphs (this is ImmutableGraph in SVN). The distinction is
relevant for the definition of equals (and consequently of course of
hashCode): a mutable graph is equal to itself (or to an instance backed by
the same mutable backend graph), while an immutable graph is equal to
another if and only if they are isomorphic.
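
A minimal sketch of these two notions of equality, in Java. The names
GraphEqualitySketch, BNode, MGraph, ImmutableGraph and isomorphic are
illustrative only and are not the Clerezza or Commons RDF classes. Triples
are modelled as three-element lists, plain strings stand in for IRIs, and
the isomorphism test is a brute-force search that is only reasonable for
very small graphs.

```java
import java.util.*;

final class GraphEqualitySketch {
    /** Blank node without any identifier; equality is object identity. */
    static final class BNode {}

    /** Mutable graph: inherits Object.equals, so it is equal only to itself. */
    static final class MGraph {
        final Set<List<Object>> triples = new HashSet<>();
        void add(Object s, Object p, Object o) { triples.add(Arrays.asList(s, p, o)); }
        ImmutableGraph snapshot() { return new ImmutableGraph(triples); }
    }

    /** Immutable graph: equal to another if and only if both are isomorphic. */
    static final class ImmutableGraph {
        final Set<List<Object>> triples;
        ImmutableGraph(Set<List<Object>> t) {
            triples = Collections.unmodifiableSet(new HashSet<>(t));
        }
        @Override public boolean equals(Object o) {
            return o instanceof ImmutableGraph
                && isomorphic(triples, ((ImmutableGraph) o).triples);
        }
        // The triple count does not change under blank node renaming,
        // so it is a weak but valid hash for isomorphism-based equality.
        @Override public int hashCode() { return triples.size(); }
    }

    /** Collects the distinct blank nodes occurring in a set of triples. */
    static List<BNode> bnodes(Set<List<Object>> g) {
        Set<BNode> found = new LinkedHashSet<>();
        for (List<Object> t : g)
            for (Object term : t)
                if (term instanceof BNode) found.add((BNode) term);
        return new ArrayList<>(found);
    }

    /** True if some bijection between the blank nodes maps a onto b. */
    static boolean isomorphic(Set<List<Object>> a, Set<List<Object>> b) {
        List<BNode> na = bnodes(a), nb = bnodes(b);
        if (a.size() != b.size() || na.size() != nb.size()) return false;
        return search(a, b, na, nb, new HashMap<>(), 0);
    }

    // Tries every assignment of b's blank nodes to a's blank nodes.
    private static boolean search(Set<List<Object>> a, Set<List<Object>> b,
            List<BNode> na, List<BNode> nb, Map<BNode, BNode> map, int i) {
        if (i == na.size()) {
            Set<List<Object>> mapped = new HashSet<>();
            for (List<Object> t : a) {
                List<Object> r = new ArrayList<>(3);
                for (Object term : t) r.add(term instanceof BNode ? map.get(term) : term);
                mapped.add(r);
            }
            return mapped.equals(b);
        }
        for (BNode candidate : nb) {
            if (map.containsValue(candidate)) continue;
            map.put(na.get(i), candidate);
            if (search(a, b, na, nb, map, i + 1)) return true;
            map.remove(na.get(i));
        }
        return false;
    }
}
```

With this sketch, two MGraph instances holding a single "some sphere"
triple each are not equal, while their snapshots are.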

Should BNodes be shareable across Graphs? The Abstract Syntax says that they
can be shared across the graphs of the same dataset, RDF Semantics also
mentions that BNodes can be shared across Graphs when they have the same


g1 = g.subGraph(c);
g2 = g.subGraph(!c);

g, g1 and g2 may share BNodes, so the following is true:


What we don't need:
- A means for applications to re-create identical BNodes (implementations
may however do so). If we have a pointer to the BNode that's fine,
otherwise we get the existing BNode by accessing the triples in the graph.
- A (necessarily complex) means to enforce that BNodes are different when
in different contexts. What the context is might go beyond what is visible
at the API level: an implementation may return two equal nodes in different
graphs because they are from the same dataset. On the other hand, an
application might construct a graph by first creating various subgraphs
sharing a BNode and then creating the union of them. The API might provide
a g.skolemize():g method which returns a copy of the graph where no BNode
is identical to any BNode outside the graph.
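
A rough sketch of what such a skolemize() could do, written in Java. The
class SkolemizeSketch, the BNode type and the list-based triples are
assumptions made for illustration and do not come from any existing API.
The name follows the proposal in this message; RDF 1.1 Concepts uses
"Skolemization" for replacing blank nodes with fresh IRIs, which is a
different operation.

```java
import java.util.*;

final class SkolemizeSketch {
    /** Blank node without any identifier; equality is object identity. */
    static final class BNode {}

    /**
     * Returns a copy of the graph in which every blank node is replaced by
     * a newly created one, so no node of the copy is shared with the input.
     */
    static Set<List<Object>> skolemize(Set<List<Object>> graph) {
        // One fresh node per original node keeps the graph structure intact.
        Map<BNode, BNode> fresh = new IdentityHashMap<>();
        Set<List<Object>> copy = new HashSet<>();
        for (List<Object> triple : graph) {
            List<Object> renamed = new ArrayList<>(3);
            for (Object term : triple) {
                renamed.add(term instanceof BNode
                        ? fresh.computeIfAbsent((BNode) term, k -> new BNode())
                        : term);
            }
            copy.add(renamed);
        }
        return copy;
    }
}
```

Two triples that shared a subject BNode before the call still share a
subject afterwards, but it is a different object from the original.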

So to summarize: an implementation may add an internal identifier to the
BNode and decide when two objects are identical, but the application should
not see these identifiers and should not be able to recreate identical
BNodes; it should instead just use the existing node.
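
To illustrate, here is a small Java sketch in which the BNode class exposes
no identifier at all. BlankNodeIdentitySketch, Graph and subjects(...) are
hypothetical names chosen for this example, not part of Clerezza or of
either proposal. The application either keeps a reference to a node or
reaches the existing node through the triples of the graph.

```java
import java.util.*;

final class BlankNodeIdentitySketch {
    /** No identifier getter and no factory taking an id. */
    static final class BNode {}

    static final class Graph {
        private final Set<List<Object>> triples = new HashSet<>();

        void add(Object s, String p, Object o) { triples.add(Arrays.asList(s, p, o)); }

        int size() { return triples.size(); }

        /** Subjects of all triples with the given predicate and object. */
        Set<Object> subjects(String p, Object o) {
            Set<Object> result = new HashSet<>();
            for (List<Object> t : triples)
                if (t.get(1).equals(p) && t.get(2).equals(o)) result.add(t.get(0));
            return result;
        }
    }

    public static void main(String[] args) {
        Graph g = new Graph();
        BNode a = new BNode();                       // reference kept by the application
        g.add(a, "rdf:type", "ex:Sphere");
        g.add(new BNode(), "rdf:type", "ex:Sphere"); // reference not kept

        // The second node is reached through the graph, not by recreating it.
        for (Object s : g.subjects("rdf:type", "ex:Sphere"))
            if (s != a) g.add(s, "rdf:type", "ex:Heavy");
        g.add(a, "rdf:type", "ex:Shiny");

        System.out.println(g.size());
    }
}
```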


On Mon, Feb 2, 2015 at 10:01 PM, Andy Seaborne <> wrote:

> Hi Reto,
> There is a key point in this discussion that is worth pulling out.
> RDF has a data model and there is also an interpretation of the data model.
> The data model is one spec ("Concepts and Abstract Syntax") and the
> interpretation is another ("Semantics", also commonly referred to as the
> Model Theory).  There are other semantics as well (RDFS, OWL-x etc).
> The Model Theory only reflects unchanging graphs.  "Lean graphs" are in
> the model theory.
> RDF data has to be built in the first place using the Data Model.
> This layering should be reflected in code.
> The commons rdf API should reflect the Data Model: how to build graphs and
> how they can be manipulated.  Systems can then be built on top of that -
> including Clerezza leaning graphs, or owl:sameAs prototype chains (which
> interact with leaning) or RDFS inference or ...
> Does that work for you?
>         Andy
> On 31/01/15 20:45, Reto Gmür wrote:
>> Hi Andy,
>> This analogy might help (or not):
>>> There is a table with 4 metal spheres on it in a line across it.  Each
>>> sphere is exactly the same kind of material, the same mass, the same
>>> colour and the
>>> same shininess.  You can ask "is that sphere the same as that other one?"
>>> by pointing to two of them.  If you put them in a bag, shake the bag and
>>> take one out, you can't tell whether this chosen one is the same as the
>>> one
>>> that was on the right-hand end of the line.
>> How many spheres will you be able to take out of the bag? I once made the
>> mistake (resulting in a bug) of assuming the identity of indiscernibles
>> in RDF; this does not hold, however.
>> For the spheres to exist they need to be part of a graph.
>> If this is the graph:
>> _:a p _:b.
>> _:b p _:c.
>> _:c p _:d.
>> _:d p _:e.
>> _:e p _:a.
>> We can put the 5 spheres in the bag and shake as much as we want; we will
>> always have 5 spheres in the bag. Of course, as you say, if we take one
>> out we can't say which one it is. (But we don't care, we are happy having
>> a shiny metal sphere in a circle with 4 other spheres).
>> By contrast if this is the graph:
>> _:a rdf:type ex:Sphere.
>> _:b rdf:type ex:Sphere.
>> _:c rdf:type ex:Sphere.
>> _:d rdf:type ex:Sphere.
>> _:e rdf:type ex:Sphere.
>> When we open the bag we might just have one sphere. Which is fine, as the
>> above graph evaluates to true in any world where there is at least one
>> sphere.
>> So much for RDF itself. Things are a bit different for the API that
>> allows creating the graph: in this situation we might actually be
>> pointing (or looking) at spheres, and as long as we do so they should not
>> disappear. After having added the above 5 triples we might go on adding:
>> _:a rdf:type ex:Shiny.
>> _:b rdf:type ex:Heavy.
>> _:c rdf:type ex:Radiactive.
>> _:d rdf:type ex:Transparent.
>> _:e rdf:type ex:Whole.
>> In this case we should have 5 spheres described by 10 triples. Every
>> well-behaved quality bag will give us back all 5 spheres[*].
>> In other words, as long as I am looking at the spheres I might go on
>> adding things to their descriptions, making them actually distinct
>> spheres.
>> This distinction between looking at (or pointing at) spheres and just
>> having them in the bag is very straightforwardly (and imho elegantly)
>> modeled by the distinction between an object instance being reachable or
>> not. In the Clerezza code and in the SVN commons proposal, code along the
>> following lines will work as expected.
>> {a,b,c,d,e} is a set of 5 BlankNodes (i.e. we have 5 objects, no two of
>> them are equals).
>> g.add(a, RDF.type, EX.Sphere);
>> g.add(b, RDF.type, EX.Sphere);
>> g.add(c, RDF.type, EX.Sphere);
>> g.add(d, RDF.type, EX.Sphere);
>> g.add(e, RDF.type, EX.Sphere);
>> //if we save the graph here the backend might just store one triple
>> //but as long as we can do the following
>> g.add(a, RDF.type, EX.Shiny);
>> g.add(b, RDF.type, EX.Heavy);
>> g.add(c, RDF.type, EX.Radiactive);
>> g.add(d, RDF.type, EX.Transparent);
>> g.add(e, RDF.type, EX.Whole);
>> //we will end up storing 10 triples
>> In the API proposal it is neither clear in which situations the backend
>> might remove redundant information (as new blank node objects with the
>> same id might be created with the same factory) nor whether the latter 5
>> add invocations will indeed add triples with the same subjects (as the
>> implementation might have created bnodes that differ from the original 5,
>> as these might originate from another graph, see my answer to Stian).
>> So basically I agree with everything you write in your email and don't see
>> any reason to expose the internal identifier in the API and have complex
>> identity criteria.
>> Cheers,
>> Reto
>>   [*] Well, yeah, the problem with metaphors: we will always get back 5
>> bnodes, even though the graph also evaluates to true in a universe with
>> just one shiny, heavy, radiactive, transparent and whole sphere.
