incubator-jena-dev mailing list archives

From "Andy Seaborne (Commented) (JIRA)" <>
Subject [jira] [Commented] (JENA-170) hexBinary whitespace issue
Date Sun, 27 Nov 2011 21:15:39 GMT


Andy Seaborne commented on JENA-170:

The .sameValueAs() method tests value; .equals() tests identity of lexical form.  You call
the one you want.
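
As a rough illustration of the two tests, here is a minimal Python sketch.  It is not Jena code:
the Literal class and its same_value_as method are hypothetical stand-ins, and only
xsd:hexBinary is modelled.

```python
import re
from dataclasses import dataclass

XSD_HEXBINARY = "http://www.w3.org/2001/XMLSchema#hexBinary"


@dataclass(frozen=True)
class Literal:
    """Hypothetical RDF literal: a lexical form plus a datatype IRI."""
    lexical: str
    datatype: str

    def value(self) -> bytes:
        if self.datatype != XSD_HEXBINARY:
            raise ValueError("only xsd:hexBinary is modelled in this sketch")
        # xsd:hexBinary uses whiteSpace=collapse, so leading and trailing
        # XML whitespace is not part of the value.
        collapsed = self.lexical.strip(" \t\r\n")
        if not re.fullmatch(r"(?:[0-9A-Fa-f]{2})*", collapsed):
            raise ValueError("not a valid hexBinary lexical form")
        return bytes.fromhex(collapsed)

    def same_value_as(self, other: "Literal") -> bool:
        # Value test: compare what the lexical forms denote.
        return self.datatype == other.datatype and self.value() == other.value()


a = Literal("AAAA", XSD_HEXBINARY)
b = Literal("\n  AAAA\n", XSD_HEXBINARY)

print(a == b)              # lexical test (dataclass equality on the raw fields)
print(a.same_value_as(b))  # value test
```

The dataclass-generated == plays the role of the strict test here; the value test is the only
one that looks through the surrounding whitespace.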

Presumably Clerezza is calling .equals(), or it is using a persistent storage layer.  TDB doesn't
handle xsd:hexBinary as a value-based type.  It only handles numeric types, dates, dateTimes,
Gregorian dates.

And it is an RDF datatype.

RDF datatypes are declared in RDF/XML with rdf:datatype="....." -- an RDF mechanism, which
is open.  There isn't a fixed set of datatypes like XML Schema Datatypes.

XML datatypes use external declaration or xsi:type.  I believe that xsi:type can only refer
to an XSD datatype.
It only applies to XML.  

RDF isn't the XML document model and isn't necessarily in XML (cf. Turtle).  There may be
other reasons the XML Schema datatype syntax was not applicable - I wasn't there at the time.
Timing might be part of it - RDF finished Feb 2004, XQuery/XPath data model is Jan 2007 with
earliest candidate rec Nov 2005.

SPARQL (and RDF by encouragement) uses the data model from XSD datatypes (lexical/value mapping),
but not the syntax.

Jena memory models do support a lot of value-based matching but this is costly.  They support
matching xsd:hexBinary by value if you call .sameValueAs; if you call .equals, you get strict
equality.  "001"^^xsd:integer and "1"^^xsd:integer are, at the lowest level of the RDF abstract
data model, different terms.  It could be an RDF datatype that has never been met before -- "IIII"^^my:roman
and "IV"^^my:roman.
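
To make the open-datatype point concrete, here is a small Python sketch.  It is not how Jena
does it: the PARSERS registry, same_term and same_value are made up for illustration, and the
IRI used for my:roman is a placeholder.

```python
from typing import Callable, Dict, Optional, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"
MY = "http://example.org/my#"  # placeholder namespace for the my: prefix

# Lexical-to-value mappings this sketch knows about.  The set of RDF
# datatypes is open, so any such registry is incomplete by construction.
PARSERS: Dict[str, Callable[[str], object]] = {
    XSD + "integer": int,
}

Term = Tuple[str, str]  # (lexical form, datatype IRI)


def same_term(x: Term, y: Term) -> bool:
    """Strict equality: same lexical form and same datatype."""
    return x == y


def same_value(x: Term, y: Term) -> Optional[bool]:
    """True or False when both datatypes are known, None when undecidable."""
    if x == y:
        return True  # identical terms always denote the same value
    parse_x, parse_y = PARSERS.get(x[1]), PARSERS.get(y[1])
    if parse_x is None or parse_y is None:
        return None  # unknown datatype: nothing to compare values with
    # Simplification: values of different datatypes are treated as different.
    return x[1] == y[1] and parse_x(x[0]) == parse_y(y[0])


i1, i2 = ("001", XSD + "integer"), ("1", XSD + "integer")
r1, r2 = ("IIII", MY + "roman"), ("IV", MY + "roman")

print(same_term(i1, i2), same_value(i1, i2))
print(same_term(r1, r2), same_value(r1, r2))
```

For the unknown datatype the only safe answer is "can't tell", which is why strict term
equality is the fallback.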

Users ask that reading in and writing out data does not change the format; the memory model
keeps both forms around which is OK for numerics, but xsd:hexBinary can be large blobs, which
is unfortunate.

Canonicalization is a technique that emphasises the value at the expense of losing different
forms in different places in the data.  A tradeoff.
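
A minimal Python sketch of the tradeoff, assuming the XSD canonical forms (no leading zeros
or "+" for xsd:integer, upper-case digits for xsd:hexBinary).  The function names are
hypothetical and this is not Jena's implementation.

```python
def canonical_integer(lexical: str) -> str:
    """Map any accepted integer lexical form to one canonical spelling."""
    return str(int(lexical))


def canonical_hexbinary(lexical: str) -> str:
    """Collapse surrounding whitespace and upper-case the hex digits."""
    return bytes.fromhex(lexical.strip(" \t\r\n")).hex().upper()


# Different spellings of the same value collapse to one form ...
forms = ["001", "+1", "1"]
canon = {canonical_integer(f) for f in forms}
print(canon)

# ... so after canonicalization plain string comparison is a value test,
# but the spelling the data originally used cannot be recovered.
print(canonical_hexbinary("\n  aaaa\n") == canonical_hexbinary("AAAA"))
```

Once only the canonical form is kept, writing the data back out changes its layout, which is
the round-trip complaint.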

Jena persistent storage layers don't keep both the value and the lexical form around; indexing
would not work if they did.

Instead, TDB stores the value of numeric types, dates, dateTimes, Gregorian dates (in binary).
It rebuilds nodes in their canonical form.
TDB does not do anything special for xsd:hexBinary, which is typically used for blobs, so it does
not do value-based matching, only lexical form matching.
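
A toy Python model of that behaviour, to show the effect on lookups.  This is not TDB's actual
encoding or API: index_key and rebuild are hypothetical, and only xsd:integer stands in for
the value-based types.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
VALUE_TYPES = {XSD + "integer": int}  # stand-in for the value-based types


def index_key(lexical: str, datatype: str):
    """Key a toy persistent index would file a literal under."""
    parse = VALUE_TYPES.get(datatype)
    if parse is not None:
        return ("value", datatype, parse(lexical))  # value only, form dropped
    return ("lexical", datatype, lexical)           # stored as written


def rebuild(key):
    """Recreate a (lexical form, datatype) pair from a stored key."""
    kind, datatype, payload = key
    if kind == "value":
        return (str(payload), datatype)  # canonical form, not the original
    return (payload, datatype)


# Integers: "001" and "1" share a key, and come back as "1".
print(index_key("001", XSD + "integer") == index_key("1", XSD + "integer"))
print(rebuild(index_key("001", XSD + "integer")))

# hexBinary: whitespace is part of the key, so a lookup for "AAAA" misses.
hexbin = XSD + "hexBinary"
print(index_key("\nAAAA\n", hexbin) == index_key("AAAA", hexbin))
```

In this model the query in the report below finds nothing because the stored key still
contains the whitespace.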

It could be added - users also want round-trip of layout.

> hexBinary whitespace issue
> --------------------------
>                 Key: JENA-170
>                 URL:
>             Project: Jena
>          Issue Type: Bug
>          Components: ARQ, Jena, RDF/XML
>         Environment: 2.6.4
>            Reporter: Henry Story
>            Assignee: Andy Seaborne
>            Priority: Minor
> As I understand, initial and final white spaces in xsd:hexBinary in xml should be ignored
> because of the whitespace facet.
> With Jena 2.6.4 this is not the case, as shown by the test below. 
> I found that in Clerezza when using the graph api, so this is a problem even when one
> does not use SPARQL.
> Removing the white space solves the problem.
> xsd:hexBinary is already a very fragile encoding. Making it this fragile is bound to
> lead to issues in communication.
> The same is true with the N3 encoding.
> -----------------------------------------------------------------
> hjs@bblfish[0]$ cat q1.sparql 
> PREFIX : <http://me.example/p#> 
> PREFIX xsd: <> 
> SELECT ?S WHERE {
>   ?S :related "AAAA"^^xsd:hexBinary .
> }
> hjs@bblfish[0]$ cat c1.rdf 
> <rdf:RDF xmlns="http://me.example/p#"
>     xmlns:rdf="">
>     <rdf:Description rdf:about="http://me.example/p#me">
>         <related rdf:datatype="">
> </related>
>     </rdf:Description>
> </rdf:RDF>
> hjs@bblfish[0]$ arq --query=q1.sparql --data=c1.rdf
> -----
> | S |
> =====
> -----


