db-derby-dev mailing list archives

From "Daniel John Debrunner (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-2106) Improve Derby SQL/XML processing to account for Xalan's use of platform-specific newlines when serializing.
Date Wed, 22 Nov 2006 00:29:02 GMT
    [ http://issues.apache.org/jira/browse/DERBY-2106?page=comments#action_12451828 ] 
            
Daniel John Debrunner commented on DERBY-2106:
----------------------------------------------

I'm trying to clear my thoughts on this issue. Setting aside how XML values are stored
within Derby and how an application uses them, I'm concentrating on the pure SQL operator
XMLSERIALIZE. Please see this as a dump of thoughts.

The XMLSERIALIZE() operator serializes an XML value to a character string type within Derby.
Character types within Derby are always a sequence of Unicode characters.

The behaviour of the XMLSERIALIZE operator is defined by the SQL standard, which refers
to the page Army referenced:

http://www.w3.org/TR/xslt-xquery-serialization/

Section 5.1.2 of that document contains the comment about newline characters, and
it refers to an encoding.

From the SQL/XML spec (6.7 GR 2d), the encoding is the character set of the target datatype,
which is Unicode characters for Derby.

So this expression

XMLSERIALIZE(XMLPARSE(DOCUMENT '<copy>&#169; ASF 2006</copy>' PRESERVE WHITESPACE)
    AS VARCHAR(100));

returns a VARCHAR that includes the Unicode character for the copyright symbol (U+00A9)
instead of the six characters '&#169;'.

And indeed, Derby does that. :-)
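
As an illustration outside Derby, the same effect can be seen with plain JAXP. This is
only a sketch: the class and method names (CharRefRoundTrip, roundTrip) are made up for
the example, and it uses whatever parser and serializer the JDK ships with, which is not
necessarily the code path Derby takes. It parses the document and serializes it back to a
Java String, so the result can be checked for U+00A9 versus the character reference.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of Derby.
class CharRefRoundTrip {

    // Parse an XML string into a DOM, then serialize it back to a String
    // with an identity transform (no XML declaration in the output).
    static String roundTrip(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String result = roundTrip("<copy>&#169; ASF 2006</copy>");
        // Report whether the serialized text carries the character itself
        // or the original six-character reference.
        System.out.println("contains U+00A9: " + (result.indexOf('\u00A9') >= 0));
        System.out.println("contains &#169;: " + result.contains("&#169;"));
    }
}
```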

A new-line in Unicode characters is represented by the '\n' character (U+000A). I would assert
this is what any new-line must map to when using XMLSERIALIZE().
The behaviour of XMLSERIALIZE should not be affected by what platform Derby is running on.
I think I still need to see, though, what the SQL/XML and/or XML rules are on input of an XML
document with newlines.
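
To make that assertion concrete, here is a minimal sketch of a platform-independent
normalization step. It is in the spirit of the change quoted in the issue description
below, but it is not Derby code: NewlineNormalizer and normalize are hypothetical names,
and the line separator is passed in as a parameter so the behaviour can be exercised on
any platform. It uses String.replace (a literal replacement) rather than replaceAll, so
the separator is never interpreted as a regular expression.

```java
// Hypothetical helper for illustration; not part of Derby.
class NewlineNormalizer {

    // Replace every occurrence of the given platform line separator with '\n'.
    // If the separator is unknown, empty, or already "\n", return the input as-is.
    static String normalize(String serialized, String eol) {
        if (eol == null || eol.isEmpty() || eol.equals("\n")) {
            return serialized;
        }
        return serialized.replace(eol, "\n");
    }

    public static void main(String[] args) {
        // Build a sample using this platform's separator and normalize it.
        String eol = System.getProperty("line.separator");
        String sample = "<a>" + eol + "<b/>" + eol + "</a>";
        System.out.println(normalize(sample, eol).equals("<a>\n<b/>\n</a>"));
    }
}
```

Whether such a step is the right fix still depends on the input rules mentioned above;
the sketch only shows that the output can be made identical regardless of platform.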

> Improve Derby SQL/XML processing to account for Xalan's use of platform-specific newlines when serializing.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2106
>                 URL: http://issues.apache.org/jira/browse/DERBY-2106
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.2.1.6, 10.3.0.0, 10.2.1.8
>            Reporter: A B
>            Priority: Minor
>
> Derby uses Apache Xalan to serialize XML data values.  As part of the serialization process
> Xalan converts the newline character ("\n") to a platform-specific line ending.  This conversion
> of line endings is allowed by XML serialization rules and therefore is not a bug in Xalan--see
> XALANJ-1137 for some discussion along those lines.  That said, though, this particular behavior
> means that an application which uses Derby to serialize XML values can end up with different
> characters on different platforms.  And further, since Derby currently writes serialized XML
> to disk, this means that insertion of an XML value on one platform (such as Windows) can lead
> to different line-ending characters on disk than insertion of that exact same XML value on
> another platform (such as Linux).
> Discussion on the derby-dev list seems to indicate (based on lack of comments to the
> contrary) that this behavior in Derby is not a "bug" per se, but that it might be nice if
> Derby could somehow account for Xalan's treatment of newlines to provide consistent XML serialization
> results across platforms.  The relevant thread is here:
>   http://thread.gmane.org/gmane.comp.apache.db.derby.devel/33170/focus=33170 
> As indicated in that thread, one simple (but not fully tested) approach is to make a
> change in the "serializeToString()" method of SqlXmlUtil.java to do an explicit replacement
> of platform-specific line-endings with a simple newline.  Something like:
> +        String eol = PropertyUtil.getSystemProperty("line.separator");
> +        if (eol != null)
> +            return sWriter.toString().replaceAll(eol, "\n");
>         return sWriter.toString();
> This small change seems to provide consistent results across all platforms, and appears
> to work correctly even if line-endings are hard-coded in the XML file (ex. if the literal
> "\r\n" occurs in the XML file, the above code will *not* replace it, which is good).  However,
> internal modification of user-supplied data is generally a risky proposal, so more testing
> would be needed for this particular approach.
> Also, any changes to Derby serialization as a part of this issue would need to consider
> backward-compatibility issues--namely, how would the changes affect XML files that have already
> been inserted into the database (and therefore that already have platform-specific endings)?
> Ideally treatment of existing and new XML data would be consistent.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
