db-derby-dev mailing list archives

From "Jean T. Anderson" <...@bristowhill.com>
Subject Re: Writing platform-specific line-endings to disk...
Date Fri, 17 Nov 2006 23:48:15 GMT
This sounds like it might be the same platform-difference problem that
Forrest runs into and that affects the Derby web site:

http://db.apache.org/derby/papers/derby_web.html#odd_diffs

FOR-492 references a workaround, but I haven't looked at it and don't
know whether it could apply to Derby.

 -jean

Army wrote:
> As part of my work for DERBY-1758 I'm looking at the XML binding test
> (lang/xmlBinding.java in the old harness, lang/XMLBindingTest.java in
> JUnit) and I noticed that the test, which counts characters as a simple
> sanity check for insertion of docs larger than 32k, returns different
> results on Linux vs Windows.  (Actually, Bryan Pendleton was the first
> one to notice this a while back when he was reviewing DERBY-688 changes).
> 
> Long story short, Xalan serialization (which is what Derby uses to
> serialize XML documents) inserts platform-specific line-endings (based
> on the "line.separator" System property) into XML documents for every
> newline.  This appears to be technically valid, so it is not a bug per
> se [1].  However, from a Derby perspective this means that someone who
> inserts the exact same XML document into an XML column on Windows vs on
> Linux will actually be inserting more characters in the former case than
> in the latter (because the Windows line separator is two characters). 
> Or put differently, when inserting an XML document on Windows an extra
> character is written to disk for every line in the XML document.  This
> does *not* happen with other character types (e.g. CLOB).
> 
> My question, then, is this: Is it considered a "bug" in Derby if
> insertion of the same XML value by the user can lead to different data
> (namely, line ending characters) being written to disk for different
> platforms?
> 
> There appear to be two obvious ways to get around this problem: 1) add
> logic in Derby engine to take the result of Xalan serialization and
> replace platform-specific line-endings with "\n", or 2) change the XML
> binding test to always count line-endings as a single "character" for
> the sake of asserting character counts.
> 
> I'm leaning toward option 1, but am not particularly driven one way or
> the other.  If the answer to my above question is "Yes, it's a bug",
> then option 1 is clearly the only option; otherwise option 2 makes the
> test pass and is easy to implement.  It does feel a tad like cheating,
> though...
> 
> Comments/feedback are appreciated, if anyone has any.
> 
> Thanks,
> Army
> 
> ----
> 
> [1]
> 
> I searched Jira for this and found a couple of relevant Xalan issues,
> especially XALANJ-2093 and XALANJ-1701.  There is apparently a new
> property introduced in Xalan 2.7 to allow the user to indicate what
> should happen with newlines, but that property is non-standard and would
> require Derby to use Xalan 2.7 in order to build.
> 
> Based on comments in the aforementioned XALANJ issues it looks like it
> is technically valid for Xalan to convert the newlines to
> platform-specific endings.  This seems to agree with the following quote
> from the w3c page on serialization:
> 
> http://www.w3.org/TR/xslt-xquery-serialization/#serdm
> 
> "When outputting a newline character in the instance of the data model,
> the serializer is free to represent it using any character sequence that
> will be normalized to a newline character by an XML parser, unless a
> specific mapping for the newline character is provided in a character
> map (see 9 Character Maps)."
> 
> I don't know what Xalan serialization does with character maps, but
> there is nothing explicit in Derby to specify use of such maps, so my
> (admittedly lacking) understanding is that it's okay for Xalan to return
> platform-specific line-endings when serializing.
> 
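
For illustration, here is a minimal Java sketch of the two options Army
describes above. It is not Derby code: the class name LineEndings and the
methods normalize and countIgnoringPlatform are hypothetical, and the
sample strings stand in for what a serializer might return on Linux and
on Windows. Option 1 rewrites "\r\n" (and a lone "\r") to "\n" before the
value is stored; option 2 leaves the value alone and counts each line
ending as one character.

```java
final class LineEndings {

    private LineEndings() {}

    // Option 1 (hypothetical helper): collapse "\r\n" and any lone "\r"
    // into "\n" so the stored text is the same on every platform.
    static String normalize(String serialized) {
        return serialized.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Option 2 (hypothetical helper): count characters, treating a
    // "\r\n" pair as a single character for test assertions.
    static int countIgnoringPlatform(String serialized) {
        int count = 0;
        for (int i = 0; i < serialized.length(); i++) {
            boolean crBeforeLf = serialized.charAt(i) == '\r'
                    && i + 1 < serialized.length()
                    && serialized.charAt(i + 1) == '\n';
            if (!crBeforeLf) {
                count++; // the '\n' of the pair is counted, the '\r' is skipped
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed sample data: the same three-line document with
        // Linux-style and Windows-style line endings.
        String unix = "<a>\n<b/>\n</a>";
        String windows = "<a>\r\n<b/>\r\n</a>";

        System.out.println(unix.length());                        // 13
        System.out.println(windows.length());                     // 15
        System.out.println(normalize(windows).equals(unix));      // true
        System.out.println(countIgnoringPlatform(windows));       // 13
    }
}
```

With option 1 the extra character per line never reaches disk; with
option 2 only the test's arithmetic changes and the stored data still
differs by platform.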

