jena-users mailing list archives

From Andy Seaborne
Subject Re: How is UTF-8 handled in TDB
Date Thu, 23 Feb 2012 18:07:22 GMT
On 23/02/12 17:05, Tim Harsch wrote:
> So I knew that TDB used an id in place of a string, except in the
> case of inlined values.  Are you saying that non-inlined values use
> an MD5 digest?  I did not know that.

To go from string to id, yes.  It's needed to look up query constants.
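A minimal sketch of the idea, not TDB's actual code: a fixed-size digest of the string acts as the lookup key that maps a term to its id, so a constant in a query can be resolved without scanning the stored strings. The `NodeTable` class, its method names, and the use of the plain lexical form as hash input are assumptions made for illustration only.

```python
import hashlib


def node_key(lexical_form: str) -> bytes:
    """Return a 16-byte MD5 digest of the UTF-8 bytes of the string."""
    return hashlib.md5(lexical_form.encode("utf-8")).digest()


class NodeTable:
    """Hypothetical string <-> id table keyed by digest (illustration only)."""

    def __init__(self):
        self._key_to_id = {}   # digest -> id
        self._id_to_node = []  # id -> original string

    def get_or_allocate(self, lexical_form: str) -> int:
        """Used at load time: return the existing id or assign a new one."""
        key = node_key(lexical_form)
        node_id = self._key_to_id.get(key)
        if node_id is None:
            node_id = len(self._id_to_node)
            self._id_to_node.append(lexical_form)
            self._key_to_id[key] = node_id
        return node_id

    def lookup(self, lexical_form: str):
        """Used at query time: return the id of a constant, or None if absent."""
        return self._key_to_id.get(node_key(lexical_form))

    def node_for(self, node_id: int) -> str:
        """Go back from id to the stored string, exactly as it was loaded."""
        return self._id_to_node[node_id]
```

The digest is computed over the bytes as given, so two strings map to the same id only if they are byte-for-byte identical.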

> So, if no normalization is done on literals how does Fuseki/TDB pass
> the normalization tests of SPARQL DAWG?  My understanding of this is
> still limited but I'm assuming that normalization tests won't pass
> for two non-normalized literals (that are non-equal without
> normalization; but would be after) unless both literals in a
> comparison were first normalized (either as pre-step or at string
> table load time or at query time).
> Thanks, Tim

Which tests exactly?

normalization-01 is explicitly showing that normalized and 
non-normalized don't match.  The results do not include Alice; there is 
one match for Eve, not two.
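To illustrate why normalized and non-normalized forms don't match, here is a small sketch using Python's standard `unicodedata` module. The two example strings and the helper names are assumptions chosen for illustration; they are not taken from the DAWG test data.

```python
import hashlib
import unicodedata

# Two canonically equivalent spellings of the same visible text.
composed = "caf\u00e9"     # "é" as a single code point (NFC form)
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent (NFD form)


def equal_as_stored(a: str, b: str) -> bool:
    """Compare code point by code point, with no normalization applied."""
    return a == b


def equal_after_nfc(a: str, b: str) -> bool:
    """Compare after converting both strings to Unicode NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def md5_hex(s: str) -> str:
    """Digest of the UTF-8 bytes, as a digest-keyed lookup would see them."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()
```

Without a normalization step, `equal_as_stored(composed, decomposed)` is false and the two digests differ, so the terms are distinct; only `equal_after_nfc` treats them as the same.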

normalization-02, normalization-03

If you follow the link to the email, these are about IRI normalization, which
is different from Unicode normalization.

As a query engine isn't an end system (data goes in and out),
normalization of URIs isn't required, and some argue it should not be done.

