incubator-jena-dev mailing list archives

From Andy Seaborne
Subject Re: TDB Literal Canonicalization
Date Tue, 16 Aug 2011 16:57:12 GMT

On 14/08/11 22:05, Ian Emmons wrote:
> Andy,
> Sorry about the attachments.  I'm not sure why they were eaten.  I've
> pasted the two files into the email body below, along with the
> output.
> I'm afraid that as soon as I retried my test program (with a couple
> of minor changes) in light of your advice, I was unable to duplicate
> the behavior that I thought I had observed.  Rather, I found
> different, but still puzzling behavior.  I suspect I simply made a
> mistake previously.  Here is a quick summary of my experiment:
> * I am comparing a numeric literal in a query to an integer literal
> in a model.
> * The variables are:
>   - Memory model versus TDB model
>   - Comparison within a filter versus in the triple pattern itself
>   - Integer versus decimal
>   - Canonical versus non-canonical lexical form
> * Complete results can be seen below, but the unexpected result is
> this:  When the literal in the query is in the triple pattern and is
> type decimal, then a memory model produces a positive match, but a
> TDB model does not.
> * I am using TDB 0.8.10 (and the Jena and ARQ that come with it).
> Is this what you expect?

Yes, it is what I expect with TDB currently.

Jena in-memory does comparisons by value and keeps terms separate;
TDB comparisons in patterns are done by comparing the NodeIds.

TDB canonicalizes integers and decimals but keeps them separate, so they 
are different NodeIds.
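
To make the difference concrete, here is a toy sketch in Python. It is
not TDB or Jena code: the `node_ids` dictionary, `node_id` and
`same_value` are hypothetical stand-ins for a node table and a
value-based comparison, and datatypes are written as plain strings.

```python
from decimal import Decimal

# Toy stand-in for a node table: every distinct (lexical form, datatype)
# term is assigned its own id, in order of first appearance.
node_ids = {}

def node_id(lexical, datatype):
    term = (lexical, datatype)
    return node_ids.setdefault(term, len(node_ids))

def same_value(a, b):
    # Value-based comparison: ignore the datatype, compare the numbers.
    return Decimal(a[0]) == Decimal(b[0])

stored = ("47", "xsd:integer")     # literal in the data
queried = ("47.0", "xsd:decimal")  # literal in the triple pattern

id_match = node_id(*stored) == node_id(*queried)  # different terms, different ids
value_match = same_value(stored, queried)         # same numeric value
```

Under these assumptions `id_match` is false and `value_match` is true,
which mirrors the pattern-match result reported for the TDB model
versus the memory model.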


:x :p 47 .
:x :p 47.0 .

one triple or two?

TDB could keep values only and give the comparison you expected (not
unreasonably), but to keep access efficient it would have to do so by
keeping one triple for the example.  Probably, I'd keep integer values
as integers even if they are written as decimals in the data:

"47.0"^^xsd:decimal input would be "47"^^xsd:integer output.

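A rough sketch of that kind of value-only canonicalization, again in
Python and purely illustrative: `canonicalize` is a hypothetical
helper, not something TDB provides, and it only handles the two
datatypes discussed here.

```python
from decimal import Decimal

def canonicalize(lexical, datatype):
    """Map integer and decimal literals to a single term per value.

    Whole-number decimals are emitted as xsd:integer, so "47" and
    "47.0" end up as the same term (hence one triple, not two).
    """
    if datatype in ("xsd:integer", "xsd:decimal"):
        value = Decimal(lexical)
        if value == value.to_integral_value():
            # Whole number: keep it as an integer.
            return (str(int(value)), "xsd:integer")
        # Genuine fraction: stays a decimal, in plain fixed-point notation.
        return (format(value, "f"), "xsd:decimal")
    # Other datatypes are left untouched in this sketch.
    return (lexical, datatype)
```

With this, `canonicalize("47.0", "xsd:decimal")` and
`canonicalize("47", "xsd:integer")` give the same term, while a value
such as "47.5" remains an xsd:decimal.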
