db-derby-dev mailing list archives

From Army <qoz...@gmail.com>
Subject Re: Writing platform-specific line-endings to disk...
Date Mon, 20 Nov 2006 20:46:43 GMT
Daniel John Debrunner wrote:
> I was thinking more generally in that an XML value may be generated and 
> thus never have been stored to disk. How it is stored on disk and how 
> the XML value is serialized using XMLSERIALIZE() are different 
> operations, it's just an implementation detail of derby that they are 
> the same in some instances.

Okay, that makes sense.  Sorry for not grasping this earlier.

> Would all these operations return the same exact characters to an 
> application if they represent the same logical value?
> 
> XMLSERIALIZE(colvalue originally on linux)
> XMLSERIALIZE(colvalue originally on windows)
> XMLSERIALIZE(generated XML value from other XML operators)

I'm assuming the following definitions for this question:

   - let "colvalue" represent the logical value
   - let "colvalue originally on linux" be the result of inserting
     <colvalue> on a Linux machine
   - let "colvalue originally on windows" be the result of inserting
     <colvalue> on a Windows machine
   - let "n" be the number of characters (including line breaks) in
     <colvalue>.
   - let "nl" be the number of line breaks in <colvalue>

If this is correct, then the answer to the question is No, the above three 
operations would not return the same exact characters.  The result of the first 
operation will have (n) characters in it.  The result of the second operation 
will have one more character ("\r") in it for every line break in "colvalue"; 
i.e. it will have (n + nl) characters in it. And the result of the third 
operation will have (n + nl) characters if executed on Windows, but only (n) 
characters if executed on Linux.

Note that once inserted, serialization of a specific row will return the same 
characters regardless of whether the XMLSERIALIZE is executed on Windows or 
Linux.  Or put another way, the result of the first operation will always return 
(n) characters, regardless of platform.  Similarly, the result of the second 
operation will always return (n + nl) characters.
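
To make the counts above concrete, here is a small standalone Java sketch. It is not Derby code: the class name LineBreakCount and both helper methods are hypothetical, and "inserted on Linux/Windows" is only simulated by substituting the platform line separator into a sample string.

```java
public class LineBreakCount {

    /**
     * Simulates the text that results from inserting the logical value on a
     * platform with the given line separator (assumption: the logical value
     * itself uses "\n" for every line break).
     */
    public static String asInsertedOn(String logicalValue, String lineSeparator) {
        return logicalValue.replace("\n", lineSeparator);
    }

    /** Counts the line breaks ("\n") in the logical value; this is "nl". */
    public static int countLineBreaks(String logicalValue) {
        int count = 0;
        for (int i = 0; i < logicalValue.length(); i++) {
            if (logicalValue.charAt(i) == '\n') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String colvalue = "<a>\n  <b/>\n</a>";

        int n = colvalue.length();            // characters, line breaks included
        int nl = countLineBreaks(colvalue);   // number of line breaks

        String onLinux = asInsertedOn(colvalue, "\n");
        String onWindows = asInsertedOn(colvalue, "\r\n");

        System.out.println("n = " + n + ", nl = " + nl);
        System.out.println("linux length   = " + onLinux.length());   // n
        System.out.println("windows length = " + onWindows.length()); // n + nl
    }
}
```

For this sample value n is 15 and nl is 2, so the simulated Linux text has 15 characters and the simulated Windows text has 17.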

> Would it surprise an application to receive different character values 
> for those expressions?

Good question.  I did some searching around on the Xalan/Xerces Jira issues and 
the general notion seems to be that XML "output" (which I presume includes the 
result of XML serialization) can convert the newline character to the 
platform-specific newline.  See esp. Joe Kesselman's comments on XALANJ-1137. 
This leads me to believe that there is truth to what Bryan Pendleton said in his 
reply to the question, namely:

  - carefully written XML applications should not be affected by this

If the expectation (as apparently backed by the XML spec) is that "output" can 
have platform-specific newlines, then it seems like an application written to 
process XML data should not be surprised by this behavior.  And that in a way 
leads to the next question:

> If they are different, does it matter since they are all valid 
> serializations under SQL/XML?

Presumably no, it does not (or at least, should not) matter.  But having said 
that, I cannot help but nod in agreement when I read the following:

> My gut feeling is that different character values would be confusing to 
> an application, but it probably depends what the application is doing 
> with them. Looking at them in notepad would be confusing. :-)

Given that the relevant specs seem to indicate that it is valid to return 
platform-specific endings and it is *also* valid to just return "\n", and given 
that the latter option strikes me as potentially less confusing to the app, I 
tend to lean toward the less confusing option.  Of course, a lot of that has 
to do with the fact that the latter option is pretty easily implemented in the 
code.  I made the following addition to the end of the "serializeToString()" 
method in SqlXmlUtil.java and was able to get consistent results (i.e. exactly 
the same characters) across platforms:

+        String eol = PropertyUtil.getSystemProperty("line.separator");
+        if (eol != null)
+            return sWriter.toString().replaceAll(eol, "\n");
          return sWriter.toString();

Downside is a potential performance hit for large XML docs, which may not be 
worth it.  Note, though, that the implementation as a whole is not ideal 
for large XML documents because it (already) materializes the entire document 
into memory.  This continues to be a fish for any idle cooks to fry...
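
For illustration only, here is a standalone variant of the same idea that does not depend on the platform's line.separator and uses literal String.replace() instead of the regex-based replaceAll(). This is a sketch, not the change made to SqlXmlUtil.java: the class name NewlineNormalizer and the method normalize() are hypothetical, and converting a lone "\r" as well as "\r\n" is my assumption (it mirrors how XML parsers normalize line ends on input).

```java
public class NewlineNormalizer {

    /**
     * Returns the given serialized text with every "\r\n" pair and every
     * remaining lone "\r" converted to a single "\n".
     */
    public static String normalize(String serialized) {
        // Fast path: nothing to do if there is no carriage return at all,
        // which avoids copying large documents unnecessarily.
        if (serialized.indexOf('\r') < 0) {
            return serialized;
        }
        // Literal (non-regex) replacements: pairs first, then lone "\r".
        return serialized.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String fromWindows = "<a>\r\n  <b/>\r\n</a>";
        String fromLinux = "<a>\n  <b/>\n</a>";

        // Both inputs normalize to the same characters.
        System.out.println(normalize(fromWindows).equals(normalize(fromLinux)));
    }
}
```

The early return keeps the common "\n"-only case cheap, but a document that does contain "\r" is still copied in full, so the concern about large XML documents applies here too.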

> Thinking a little more, having XMLSERIALIZE() (within an given runtime) 
> being non-deterministic seems wrong.

When you write "within a given runtime", what is the definition of "runtime"? 
Is that a specific JVM instance on a specific machine, or is it "Derby" on a 
more general level?  Or something else entirely?  Is the behavior that I 
described above (i.e. different characters depending on which platform 
originally inserted <colvalue>) considered non-deterministic?

I find myself agreeing with both Dan and Bryan on this, and for that reason I 
tend to believe the following:

(to quote Bryan):

   - it's not a bug in Derby that the serialization can differ in
     details like this
   - carefully written XML applications should not be affected by this
   - it is reasonable to adjust the test to avoid hitting this problem.

(and as an additional thought):

   - Given that there is at least one potentially simple "enhancement" to
     Derby that could resolve the issue within the engine instead of
     within the test, it is *also* reasonable--and perhaps preferable--to
     make that change in the engine so that we can (hopefully) reduce the
     likelihood of confusing applications that use XML data in Derby.
     This would also ensure deterministic (so far as I understand it)
     behavior of XMLSERIALIZE across platforms.

Any additional thoughts/suggestions/corrections?

Thanks to Dan, Jean, and Bryan for taking the time to reply thus far...

Army

