commons-dev mailing list archives

From Peter Ansell <ansell.pe...@gmail.com>
Subject Re: [RDF] Local Scope and BlankNode internalIdentifier (was: github Commons RDF vs. Apache Commons Sandbox RDF)
Date Wed, 28 Jan 2015 23:11:24 GMT
On 28 January 2015 at 20:53, Andy Seaborne <andy@apache.org> wrote:
> On 27/01/15 17:11, Stian Soiland-Reyes wrote:
>>
>> I agree that "local scope" should be clarified
>
>
> "local scope" is a piece of terminology used only for RDF syntax.  Once away
> from syntax, there is no "scope" to a blank node.
>
> It is described in:
> http://www.w3.org/TR/rdf11-concepts/#section-blank-nodes
>
> The scope is the file for the purposes of reading that file once.
>
> A bnode "_:a" is the same bNode everywhere in that file at the time it is
> read (i.e. parsed).
>
> If the file is read twice, "_:a" generates a different blank node each time.
>
> The only things you can do with blank nodes are:
>
> * Create a new one ("a fresh one"), different from every other blank node.

That is a restriction that many systems do not have right now. Many
systems have been designed without it for non-anonymous blank nodes
(ie, ones that are labelled and can appear multiple times at any point
in the document) so that they can parse very large collections from
concrete RDF syntaxes using a finite (possibly fixed and small) amount
of memory. Although in theory BlankNodes are "picked from an infinite
set", finite computers mean there have to be some compromises from the
mathematical model underpinning RDF. In practice, some systems allow
you to ask for a blank node object to be created from entropy
(string/bytes/etc.), so that asking again later with the same entropy
yields an object equivalent to the earlier instances of that blank
node. This is needed because concrete RDF syntaxes support
non-anonymous blank nodes, which, unlike anonymous blank nodes, are
not localised to a particular part of the document.
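
As a rough sketch of the entropy idea (not taken from any existing
implementation; the class name SaltedBlankNode, its factory method and
the per-parse salt are all assumptions made for illustration), a parser
could derive the node's identity from a per-run salt plus the label, so
that it never needs to hold a label-to-object map in memory:

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical blank node whose identity is derived from a salt plus a label. */
final class SaltedBlankNode {
    private final String internalId;

    private SaltedBlankNode(String internalId) {
        this.internalId = internalId;
    }

    /**
     * Derives a node from the salt of the current parse run and the
     * syntax label (e.g. "a" for "_:a"). The same inputs always give
     * an equal node; no state is kept between calls.
     */
    static SaltedBlankNode of(UUID parseSalt, String label) {
        byte[] seed = (parseSalt + ":" + label).getBytes(StandardCharsets.UTF_8);
        return new SaltedBlankNode(UUID.nameUUIDFromBytes(seed).toString());
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SaltedBlankNode
                && internalId.equals(((SaltedBlankNode) other).internalId);
    }

    @Override
    public int hashCode() {
        return internalId.hashCode();
    }
}
```

Two calls with the same salt and label give nodes that are .equals(),
while parsing the same file again with a fresh salt gives different
nodes for "_:a", which matches the "read twice" behaviour described
above.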

Both anonymous and non-anonymous blank nodes are going to have the
same features after parsing; the issue is just what to do during
parsing to make streaming possible.

> * See if it is the same as another (java's .equals) because all RDF terms
> are distinguishable [1].

One of the goals of Commons RDF is to agree effectively on how to
provide interoperability between implementations, and a key part of
that is defining Java Object equality.

We haven't yet defined the levels at which equivalence would be
expected, but we could think about it at any of the following levels:

* RDFTerm level : being able to send any RDFTerm into any API even if
the underlying implementation is different, and have it recognised as
equivalent to any other RDFTerm which from the user's perspective was
.equals() when they sent it in.

* Triple level : being able to send any Triple into any API, even if
the underlying implementation is different, and have it recognised as
equivalent to any other Triple which from the user's perspective was
.equals().

* Graph level : being able to send a Graph into an API, even if the
underlying implementation is different, and have operations inside of
the API consistent with equivalence that the user saw in the Graph.
Note, this doesn't require the Graph to be mutable, but it may require
taking a copy of the objects and discarding the reference to the Graph
which may then be garbage collected if the user doesn't keep a
reference to the Graph.

* Dataset level (ie, Named and Default Graphs in a Set) : being able
to send a Dataset into an API, even if the underlying implementation
is different, and have operations inside of the API consistent with
the user's view. Similarly, the Dataset doesn't need to be mutable; a
copy may be taken by the implementation to do its operations.

I would personally expect all of the RDFTerm, Triple, and Graph levels
to be supported. We haven't gone as far as to create a Dataset API yet
so that is out of scope still. The RDFTerm level is of course the most
difficult to support and it may go out of scope for the BlankNode set,
although it is trivial to support for IRI and Literal so they may
still practically be directly interoperable.
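
To illustrate why the IRI case is the easy one, here is a minimal
sketch under assumed names (SketchIRI, StringBackedIRI and SplitIRI
are hypothetical and not part of any agreed API). Two unrelated
implementations stay interoperable as long as both define equals() and
hashCode() purely in terms of the IRI string:

```java
/** Hypothetical shared interface; only the IRI string is part of the contract. */
interface SketchIRI {
    String getIRIString();
}

/** One implementation: stores the whole IRI as a single string. */
final class StringBackedIRI implements SketchIRI {
    private final String iri;

    StringBackedIRI(String iri) {
        this.iri = iri;
    }

    @Override
    public String getIRIString() {
        return iri;
    }

    @Override
    public boolean equals(Object o) {
        // Compare against the interface, not the concrete class.
        return o instanceof SketchIRI
                && getIRIString().equals(((SketchIRI) o).getIRIString());
    }

    @Override
    public int hashCode() {
        return getIRIString().hashCode();
    }
}

/** Another implementation: stores namespace and local name separately. */
final class SplitIRI implements SketchIRI {
    private final String namespace;
    private final String localName;

    SplitIRI(String namespace, String localName) {
        this.namespace = namespace;
        this.localName = localName;
    }

    @Override
    public String getIRIString() {
        return namespace + localName;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SketchIRI
                && getIRIString().equals(((SketchIRI) o).getIRIString());
    }

    @Override
    public int hashCode() {
        // Must agree with StringBackedIRI so mixed hash sets work.
        return getIRIString().hashCode();
    }
}
```

Nothing this simple is available for BlankNode, because there is no
shared lexical form that every implementation could agree to compare.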

> * Put them in triples and hence into graphs.
>
> That has the implications that they can be put into several datastructures.
>
> The description in the javadoc:
> """
> They are always locally scoped to the file or RDF store
> """
> is not right.  They are not scoped to the RDF store.
>
> The nearest concept is that one store must have created it in the first
> place but once created, the blank node is just a thing and a rather simple,
> boring thing at that.

If the same blank node reference is encountered more than once within
the same document, either the parser or the store has to map each
occurrence to objects (could be the same object) that are .equals() to
each other and have the same .hashCode(). If the responsibility is on
the parser then, for non-anonymous blank nodes that are encountered in
concrete syntaxes, the objects that the parser creates need to be
consistent across the entire document. If the store is only able to
create blank nodes as boring, unique, individualised objects, then at
some point the parser itself will be forced to keep track of which
Java Objects were created for which non-anonymous blank nodes.
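
A minimal sketch of that second case, with hypothetical names
(FreshBlankNode and LabelTrackingParser are assumptions for
illustration only): the nodes rely on the base Object.equals(), so the
parser has to remember every label it has seen for the whole document.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical "boring" blank node: equality is plain object identity. */
final class FreshBlankNode {
    // Deliberately empty: inherits Object.equals() and Object.hashCode().
}

/** Hypothetical per-document parser state mapping labels to node objects. */
final class LabelTrackingParser {
    private final Map<String, FreshBlankNode> seen = new HashMap<>();

    /** Returns the one node for this label, creating it on first use. */
    FreshBlankNode nodeFor(String label) {
        return seen.computeIfAbsent(label, l -> new FreshBlankNode());
    }

    /** Number of labels held in memory; grows with the document. */
    int trackedLabels() {
        return seen.size();
    }
}
```

The map grows with the number of distinct labels in the document,
which is exactly what prevents fixed-memory streaming of arbitrarily
large files. A new LabelTrackingParser per parse gives different nodes
for "_:a" each time the file is read.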

> This analogy might help (or not):
>
> There is a table with 4 metal spheres on it in a line across it.  Each
> sphere is exactly the same kind of material, the same mass, the same colour and the
> same shininess.  You can ask "is that sphere the same as that other one?" by
> pointing to two of them.  If you put them in a bag, shake the bag and take
> one out, you can't tell whether this chosen one is the same as the one that
> was on the right-hand end of the line.

I disagree. In particular, if you studied the ball you put in closely
enough you may, in non-trivial situations, find something that was
unique about it in the context of the table/bag, even if it was only
in reference to other lines (ie, other Triples/Quads that were not all
made up of opaque blank node balls). If the bag was created from a
single document, then there are two options. Either there needs to be
a way to identify which balls are equivalent (at least when they are
attached using predicates to other balls/lines, ie,
IRIs/Literals/Triples/Quads). Or, if you don't need to support fixed
memory streaming of arbitrary length concrete RDF documents, there
needs to be a hard restriction that the balls are unique Java Objects,
with the base Object.equals() as the equivalence.

>         Andy
>
> [1] http://www.w3.org/TR/rdf11-concepts/#section-rdf-graph
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
