incubator-any23-dev mailing list archives

From "Andy Seaborne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ANY23-99) NQuadsWriter should force ASCII in OutputStream constructor
Date Tue, 22 May 2012 18:39:42 GMT

    [ https://issues.apache.org/jira/browse/ANY23-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281138#comment-13281138 ]

Andy Seaborne commented on ANY23-99:
------------------------------------

Is there a specific example of this happening?

The encoding rules for NQuads are to use \u escapes, so something has to encode to ASCII; it is
not enough to rely on the writer doing the chars-to-bytes conversion.
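A minimal sketch of the kind of escaping meant here, assuming the N-Triples string rules: printable ASCII passes through, the usual specials get backslash escapes, and every other character becomes \uXXXX (or \UXXXXXXXX outside the Basic Multilingual Plane). The class and method names (NQuadsEscape.escape) are hypothetical and are not the Sesame NTriplesUtil API.

```java
// Hypothetical helper illustrating N-Triples/N-Quads style escaping;
// not the Sesame NTriplesUtil or Any23 API.
class NQuadsEscape {

    // Returns an ASCII-only form of s suitable for a quoted literal.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);          // a surrogate pair is read as one code point
            i += Character.charCount(cp);
            switch (cp) {
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (cp >= 0x20 && cp <= 0x7E) {
                        out.append((char) cp);  // printable ASCII is written as-is
                    } else if (cp <= 0xFFFF) {
                        out.append(String.format("\\u%04X", cp));  // 4 hex digits
                    } else {
                        out.append(String.format("\\U%08X", cp));  // 8 hex digits
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("caf\u00E9"));
        System.out.println(escape("smile \uD83D\uDE00"));
    }
}
```

Applied to the string café, it returns the nine-character ASCII string caf\u00E9 (with a literal backslash), so the later chars-to-bytes step never sees a non-ASCII character, whatever charset the writer uses.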

I think this is handled via the calls:

Literals:

org.openrdf.rio.ntriples.NTriplesUtil.toNTriplesString

URIs:

org.openrdf.rio.ntriples.NTriplesUtil.escapeString

Comments:

handleComment does not encode; this is (arguably) not quite right.
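One way a comment handler could encode, sketched as a hypothetical formatComment helper (not Any23's handleComment). Parsers ignore comment text, so the \u escapes here only keep the output ASCII rather than carrying meaning; line breaks are flattened to spaces because a comment runs only to the end of its line.

```java
// Hypothetical helper, not Any23's handleComment: formats one comment line
// so that the output stays ASCII even for non-ASCII comment text.
class CommentSketch {

    static String formatComment(String text) {
        StringBuilder sb = new StringBuilder("# ");
        text.codePoints().forEach(cp -> {
            if (cp == '\n' || cp == '\r') {
                sb.append(' ');                        // a comment ends at the line break, so flatten breaks
            } else if (cp >= 0x20 && cp <= 0x7E) {
                sb.append((char) cp);                  // printable ASCII as-is
            } else if (cp <= 0xFFFF) {
                sb.append(String.format("\\u%04X", cp));
            } else {
                sb.append(String.format("\\U%08X", cp));
            }
        });
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(formatComment("source: na\u00EFve\nparser"));
    }
}
```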

Also:

The charset requirements may well change. The soon-to-be-published working draft of the formal
spec for N-Triples defines the charset as UTF-8 when used with application/n-triples. The old
rules for text/plain (US-ASCII) still apply. I would expect N-Quads to follow N-Triples. This
is all in the future.
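The media-type rule above, sketched as a lookup. The helper name is hypothetical; treating application/n-quads like application/n-triples is an assumption based on the expectation that N-Quads follows N-Triples, and defaulting every other type to US-ASCII is likewise an assumption of this sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper: picks the output charset from the media type.
class NTriplesCharset {

    static Charset charsetFor(String mediaType) {
        // Drop parameters such as "; charset=..." and compare case-insensitively.
        String base = mediaType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        switch (base) {
            case "application/n-triples":      // UTF-8 per the working draft
            case "application/n-quads":        // assumption: N-Quads follows N-Triples
                return StandardCharsets.UTF_8;
            default:                           // text/plain (and, by assumption, anything else)
                return StandardCharsets.US_ASCII;
        }
    }

    public static void main(String[] args) {
        System.out.println(charsetFor("application/n-triples"));
        System.out.println(charsetFor("text/plain"));
    }
}
```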

                
> NQuadsWriter should force ASCII in OutputStream constructor
> -----------------------------------------------------------
>
>                 Key: ANY23-99
>                 URL: https://issues.apache.org/jira/browse/ANY23-99
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.0
>            Reporter: Peter Ansell
>
> The NQuads specification states that all NQuads documents must be ASCII encoded. [1]
> The current NQuadsWriter(OutputStream) constructor does not enforce this when creating the
> OutputStreamWriter that wraps the given OutputStream. If it is not enforced, the default
> charset of the user's locale is used to create the OutputStreamWriter, which may not be US-ASCII.
> Patch is to replace the constructor with:
>         this( new OutputStreamWriter(os, Charset.forName("US-ASCII")) );
> [1] http://sw.deri.org/2008/07/n-quads/#mediatype
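A small check of what the proposed constructor does, assuming a Java 7 or later runtime. An OutputStreamWriter built from a Charset replaces characters it cannot encode with '?', so the patch guarantees ASCII bytes but silently loses data unless the text was already escaped, which is the point made in the comment above. The strict variant, which passes a CharsetEncoder set to REPORT, is an alternative shown for comparison and is not part of the proposed patch; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class comparing the proposed wrapping with a strict encoder.
class AsciiWriterDemo {

    // Same wrapping as the proposed patch. The encoder behind this constructor
    // replaces unmappable characters with '?' instead of failing.
    static String viaPatchedConstructor(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, Charset.forName("US-ASCII"))) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(bytes.toByteArray(), StandardCharsets.US_ASCII);
    }

    // Alternative (not part of the patch): an encoder that reports unmappable
    // characters, so non-ASCII input fails loudly. Returns true if it failed.
    static boolean strictWriteFails(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();  // in-memory, nothing to release
        Writer w = new OutputStreamWriter(bytes, StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT));
        try {
            w.write(text);
            w.flush();
            return false;
        } catch (CharacterCodingException e) {
            return true;                 // e.g. UnmappableCharacterException
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaPatchedConstructor("caf\u00E9"));
        System.out.println(strictWriteFails("caf\u00E9"));
    }
}
```

Writing café through the patched constructor yields caf?, while the strict variant throws an UnmappableCharacterException on the same input instead of dropping the character.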

--
This message is automatically generated by JIRA.

        
