commons-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case
Date Mon, 16 Jan 2017 22:24:26 GMT

    [ https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824663#comment-15824663 ]

ASF GitHub Bot commented on COMMONSRDF-51:
------------------------------------------

Github user ansell commented on the issue:

    https://github.com/apache/commons-rdf/pull/30
  
    In the RDF4J/Sesame case, we have had some users request, and other users complain
about, both lowercasing (which was used in the past) and canonicalisation. RDF4J will
therefore default to leaving case alone, but any user is free to switch on canonicalisation.
Currently there isn't a lowercase-all-tags option, but one may appear in the future.
    
    For reference, the language tag canonicalisation procedure that RDF4J optionally uses,
which relies on the JDK's copy of the IANA Language Subtag Registry, is:
    
    ```
    new Locale.Builder().setLanguageTag(tag).build().toLanguageTag()
    ```
    
    There are other possible methods, but the method above is the only one I could find
that throws an error if the original tag is ill-formed.
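
As a rough illustration of the one-liner above, here is a minimal sketch that wraps it in a helper. The class name `LanguageTagCanonicaliser` and the method `tryCanonicalise` are hypothetical names chosen for this example and are not taken from RDF4J; only the `Locale.Builder` call chain comes from the comment.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of RDF4J or Commons RDF.
final class LanguageTagCanonicaliser {
    private LanguageTagCanonicaliser() {}

    /**
     * Returns the JDK's canonical form of a well-formed BCP 47 tag.
     * Throws IllformedLocaleException if the tag is ill-formed.
     * Note: null and the empty string are not rejected; they reset the
     * builder, so the result for those inputs is "und".
     */
    static String canonicalise(String tag) {
        return new Locale.Builder().setLanguageTag(tag).build().toLanguageTag();
    }

    /** Same as canonicalise, but reports an ill-formed tag as an empty Optional. */
    static Optional<String> tryCanonicalise(String tag) {
        try {
            return Optional.of(canonicalise(tag));
        } catch (IllformedLocaleException e) {
            return Optional.empty();
        }
    }
}
```

With this sketch, `LanguageTagCanonicaliser.canonicalise("en-us")` yields `en-US`, while an input such as `"not a tag"` makes `tryCanonicalise` return an empty `Optional`.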


> RDF-1.1 specifies that language tags need to be compared using lower-case
> -------------------------------------------------------------------------
>
>                 Key: COMMONSRDF-51
>                 URL: https://issues.apache.org/jira/browse/COMMONSRDF-51
>             Project: Apache Commons RDF
>          Issue Type: Bug
>          Components: api
>    Affects Versions: 0.3.0
>            Reporter: Peter Ansell
>            Assignee: Stian Soiland-Reyes
>
> The RDF-1.1 specification states that the [value space of Literal language tags is lowercase|https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal],
which does not conflict with the case-insensitive specification in BCP47. The Literal.equals
and Literal.hashCode API contracts should specify that language tags must be compared using
lowercase, even if they are otherwise stored and returned as upper-case by getLanguageTag.
The API currently has incorrect language by saying "character-by-character" for language tag
comparisons, as that implies case-sensitive comparisons are used.
> The lowercasing must also be done using a consistent locale (a known example where
lowercase and uppercase do not round-trip as expected for US-ASCII characters is Turkish [1]),
so I would recommend explicitly stating that .toLowerCase(Locale.ENGLISH) is used.
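
The comparison proposed above could look roughly like the following sketch. The class `LanguageTagComparison` and its methods are hypothetical names used only for illustration and are not the Commons RDF API; the only element taken from the issue text is the use of `.toLowerCase(Locale.ENGLISH)`.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not the Commons RDF API.
final class LanguageTagComparison {
    private LanguageTagComparison() {}

    /** Lowercases with a fixed locale so the result does not depend on the JVM default. */
    static String normalise(String tag) {
        return tag.toLowerCase(Locale.ENGLISH);
    }

    /** Case-insensitive equality of two language tags. */
    static boolean sameLanguageTag(String a, String b) {
        return normalise(a).equals(normalise(b));
    }

    /** Hash code consistent with sameLanguageTag. */
    static int languageTagHash(String tag) {
        return normalise(tag).hashCode();
    }
}
```

Under a Turkish locale, `"I".toLowerCase(...)` produces the dotless `ı` (U+0131) rather than `i`, which is why the sketch pins the locale instead of relying on the default.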



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
