commons-dev mailing list archives

From Reto Gmür <r...@apache.org>
Subject Re: [RDF] Local Scope and BlankNode internalIdentifier (was: github Commons RDF vs. Apache Commons Sandbox RDF)
Date Sat, 31 Jan 2015 18:55:50 GMT
Hi Peter,

Of your use cases, the only one that might be an argument for exposing a
blank-node id is:

> 1. The same document parsed using the same parser implementation into
> the same graph may generate BlankNode objects that are .equals and if
> they are .equals the .hashCode must be the same.
>

For this use case I assume the parser would recreate bnodes that are tied to
the target graph, using the same internal ids on the second round. The graph
would then recognize the BNodes as its own and not create new nodes.
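
To make that assumption concrete, here is a minimal sketch of a bnode whose
identity is the pair (target graph, internal id). ScopedBlankNode and its
fields are hypothetical names for illustration only, not the Commons RDF
BlankNode interface; the scope is an arbitrary object compared by reference.

```java
import java.util.Objects;

/** Hypothetical sketch; not part of the Commons RDF API. */
final class ScopedBlankNode {
    // The scope the node is tied to, e.g. the target graph (compared by identity).
    private final Object scope;
    // The internal id the parser assigned within that scope.
    private final String internalId;

    ScopedBlankNode(Object scope, String internalId) {
        this.scope = Objects.requireNonNull(scope);
        this.internalId = Objects.requireNonNull(internalId);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ScopedBlankNode)) {
            return false;
        }
        ScopedBlankNode that = (ScopedBlankNode) other;
        // Equal only when both the scope and the internal id match.
        return scope == that.scope && internalId.equals(that.internalId);
    }

    @Override
    public int hashCode() {
        // Consistent with equals: derived from the same two components.
        return 31 * System.identityHashCode(scope) + internalId.hashCode();
    }
}
```

With this, a second parsing round that reuses the same internal id for the
same graph yields nodes that are .equals to those of the first round, while
the same id in a different graph yields a distinct node.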
What I question:

- Is the exposed identifier really needed for this? The parser seems to
know about the target graph, it could apply other means not to recreate
nodes.
- Does the use case make sense? If the BNodes added in the first parsing
round are by now used in triples quite different from the original ones,
should the newly parsed nodes really be identified with them just because of
the common history?
- Also: what is "the same document"? Does this mean byte-wise identical or
just having the same location?
- The use case certainly comes from a legitimate requirement if it's about
avoiding duplication in the target graph. In many implementations, if I
parse "[ rdf:type foaf:Person; foaf:name "Alice"]." twice into the same
graph I will end up with 4 triples in the graph. Of course this graph is
a non-lean graph that could be reduced to two triples. The SPARQL protocol
is carefully designed to allow implementations to avoid (and remove)
redundancy. I think a Java API should likewise offer a more generic
mechanism that lets the backend avoid and remove redundancy, rather than
just addressing the situation when the same document is parsed twice into
the same graph.
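
The duplication in the last point can be reproduced with a small sketch.
Everything here (NonLeanDemo, BNode, Triple, parseAlice) is a hypothetical
stand-in rather than any real parser or the Commons RDF API; parseAlice
merely simulates parsing the Turtle snippet above by minting a fresh bnode
on every call.

```java
import java.util.Objects;
import java.util.Set;

/** Hypothetical demo; the types below are illustrative stand-ins only. */
final class NonLeanDemo {

    /** A bnode with default identity-based equals/hashCode. */
    static final class BNode { }

    static final class Triple {
        final Object subject;
        final String predicate;
        final String object;

        Triple(Object subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Triple)) {
                return false;
            }
            Triple that = (Triple) other;
            return subject.equals(that.subject)
                    && predicate.equals(that.predicate)
                    && object.equals(that.object);
        }

        @Override
        public int hashCode() {
            return Objects.hash(subject, predicate, object);
        }
    }

    /** Simulates one parse of: [ rdf:type foaf:Person; foaf:name "Alice" ]. */
    static void parseAlice(Set<Triple> graph) {
        BNode person = new BNode(); // fresh bnode for every parse
        graph.add(new Triple(person, "rdf:type", "foaf:Person"));
        graph.add(new Triple(person, "foaf:name", "\"Alice\""));
    }
}
```

Calling parseAlice twice on the same java.util.HashSet leaves it with four
triples, because the two bnodes are never equal. Leaning the graph back to
two triples needs a separate step that this sketch does not attempt.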

Cheers,
Reto


On Wed, Jan 28, 2015 at 5:31 AM, Peter Ansell <ansell.peter@gmail.com>
wrote:

> Hi Stian and Reto,
>
> Blank nodes are hard to support within a single system. They are
> fairly close to unsustainable within a general system. However, within
> a system that has RDF-1.1 as its theoretical basis, the W3C spec
> defines the mapping functions that are necessary to define equivalence
> between graphs (but does not say how translation should work in
> practice). Hence the discussion and a long contract to come to
> agreement about something that is consistent with the W3C specs, but
> extends them where necessary to make them work across the JVM.
>
> Part of this issue is that while it is necessary to expose some
> internally unique information about the BlankNode, the concrete syntax
> (or the Java Object for intra-VM translation), may not have assigned
> any identifier to the BlankNode. N-Triples for instance must
> necessarily know about an identifier to serialise a Triple independent
> of the context of a Graph.
>
> Hence we are trying to converge on a method for consistently assigning
> labels to blank nodes based on the parser (sorry if the JVM wide local
> scope comment confused you, the local scope probably needs to be
> smaller than that, at either the individual document parse level or
> the Graph level).
>
> Some of the use cases that we are trying to support are:
>
> 1. The same document parsed using the same parser implementation into
> the same graph may generate BlankNode objects that are .equals and if
> they are .equals the .hashCode must be the same.
>
> 2. The same document parsed using the same parser implementation into
> two different graphs must generate BlankNode objects that are not
> .equals() and hopefully do not have the same .hashCode().
>
> 3. Two different documents parsed using the same parser implementation
> into the same graph must generate BlankNode objects that are not
> .equals() and have different .hashCode() results. This includes cases
> where the concrete syntax contained the same label for the blank node.
>
> 4. The same document parsed using different parser implementations
> into two different graphs must generate BlankNode objects that are not
> .equals() and hopefully do not have the same .hashCode().
>
> 5. Two different documents parsed using different parser
> implementations may be then transferred into the same graph and the
> BlankNode objects inside of the graph must not be .equals() if they
> came from different physical documents, even if the concrete syntax
> contained the same label for the blank node.
>
> Andy has also brought up the possibility of round-tripping in addition
> to those requirements. Ie, a BlankNode from one graph could be
> inserted into another graph, and after some time it should be possible
> to put it back into the first graph and have it operate as if it were
> not moved out. The current proposal doesn't allow for that and I am
> not sure what would be required for that to work.
>
> In addition, it is hoped that all of the objects in the system could
> be immutable within a graph.
>
> We have not discussed trimming graphs previously. I have never come at
> RDF with the requirement of being able to remove triples but I may
> have had a limited set of use cases. Is there a use case for that
> automatic trimming that could not be easily satisfied using a rules
> engine? Any automatic removal of triples is outside of what I
> envisioned the scope of Commons RDF to be, and it hasn't been brought
> up by any others. Even if in RDF theory there is some corner case
> where it is allowed for, it is not a general requirement and is not
> generally used or asked for in my experience.
>
> I am fairly ambivalent on the case for internalIdentifier being
> substitutable for .toString, but currently we need to work out a
> consistent way to identify the local scope, and it could be used in
> conjunction with either internalIdentifier or toString if both have
> the same contract in practice. What we are endeavouring to do is
> transfer BlankNodes between implementations inside of the JVM and keep
> their general identity (and round-tripping adds another level of
> difficulty on top of that). If we just rely on .toString then we may
> need to embed the local scope information into the resulting string,
> so the two pieces of information would be compressed into one, which
> may not be ideal in the end. In a broader sense, it would be great if
> the new Commons RDF API didn't enforce restrictions on .toString that
> already has consistent meanings in each of the implementations, and
> unique new methods give more flexibility there.
>
> Thanks,
>
> Peter
>
