commons-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case
Date Mon, 16 Jan 2017 22:15:26 GMT


ASF GitHub Bot commented on COMMONSRDF-51:

Github user afs commented on the issue:
    @ansell mentions one of the reasons the wording for RDF 1.1 is not so direct: RDF 1.0
did not sanction the common normalization defined in BCP47 canonicalization, although that
actually requires consulting the registry as well.
    Jena is lax by default, and retains the form as originally written. In practice, datasets
seem to be internally consistent, all lower case or all syntax-canonical. 
    Tags that differ only in case are different nodes in the general case, but they compare
equal under `Node.sameValue` and cause matching in graph.find. Some storage layers may differ
and canonicalize the form in order to index.
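
    The distinction between "different nodes" and "same value" can be illustrated with a
small sketch. This is not Jena's implementation: `LangLiteralSketch` and `sameValueAs` are
hypothetical names made up for this example, and the choice of Locale.ROOT for lowercasing
is an assumption.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical illustration only; not part of Jena or Commons RDF.
public final class LangLiteralSketch {
    private final String lexical;
    private final String lang; // retained exactly as originally written

    public LangLiteralSketch(String lexical, String lang) {
        this.lexical = Objects.requireNonNull(lexical);
        this.lang = Objects.requireNonNull(lang);
    }

    public String getLang() {
        return lang;
    }

    // Term equality: the language tag is compared as written, so "en-GB" != "en-gb".
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LangLiteralSketch)) return false;
        LangLiteralSketch other = (LangLiteralSketch) o;
        return lexical.equals(other.lexical) && lang.equals(other.lang);
    }

    @Override
    public int hashCode() {
        return Objects.hash(lexical, lang);
    }

    // Value equality: language tags are compared after locale-independent lowercasing.
    public boolean sameValueAs(LangLiteralSketch other) {
        return lexical.equals(other.lexical)
            && lang.toLowerCase(Locale.ROOT).equals(other.lang.toLowerCase(Locale.ROOT));
    }
}
```

    With this sketch, "colour"@en-GB and "colour"@en-gb are not `equals` but are
`sameValueAs` each other.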

> RDF-1.1 specifies that language tags need to be compared using lower-case
> -------------------------------------------------------------------------
>                 Key: COMMONSRDF-51
>                 URL:
>             Project: Apache Commons RDF
>          Issue Type: Bug
>          Components: api
>    Affects Versions: 0.3.0
>            Reporter: Peter Ansell
>            Assignee: Stian Soiland-Reyes
> The RDF-1.1 specification states that the value space of Literal language tags is lowercase,
which does not conflict with the case-insensitive specification in BCP47. The Literal.equals
and Literal.hashCode API contracts should specify that language tags must be compared using
lowercase, even if they are otherwise stored and returned as upper-case by getLanguageTag.
The API currently has incorrect language by saying "character-by-character" for language tag
comparisons, as that implies case-sensitive comparisons are used.
> The lowercasing must also be done using a consistent locale (a known example where
lowercase and uppercase do not round-trip as expected for US-ASCII characters is Turkish [1]),
so I would recommend explicitly stating that .toLowerCase(Locale.ENGLISH) is used.
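
A minimal sketch of the comparison proposed above, assuming language tags are plain Java
strings. The class `LanguageTagCompare` and its methods are hypothetical helpers for
illustration, not part of the Commons RDF API; only `String.toLowerCase(Locale)` and
`Locale.forLanguageTag` are standard library calls.

```java
import java.util.Locale;

// Hypothetical helper; names are illustrative, not Commons RDF API.
public final class LanguageTagCompare {

    private LanguageTagCompare() {
    }

    // Lowercase with a fixed locale so the result does not depend on the JVM default.
    public static String normalize(String tag) {
        return tag.toLowerCase(Locale.ENGLISH);
    }

    // Two tags are considered the same if their normalized forms match.
    public static boolean sameLanguageTag(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    // Hash over the normalized form, so it stays consistent with sameLanguageTag.
    public static int languageTagHash(String tag) {
        return normalize(tag).hashCode();
    }

    public static void main(String[] args) {
        // Case differences do not matter once both sides are normalized.
        System.out.println(sameLanguageTag("en-GB", "en-gb"));

        // Under a Turkish locale, ASCII 'I' lowercases to dotless 'ı' (U+0131),
        // so the Finnish tag "FI" would no longer match "fi".
        String turkishLower = "FI".toLowerCase(Locale.forLanguageTag("tr"));
        System.out.println(turkishLower.equals("fi"));
    }
}
```

The getLanguageTag value can still be returned as originally written; only equals and
hashCode would go through the normalized form.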
