groovy-users mailing list archives

From Keegan Witt <keeganw...@gmail.com>
Subject Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()
Date Mon, 08 Jun 2015 21:20:19 GMT
The code as it stands today writes the BOM regardless of platform.  I just
tested on Linux and got the same results.  I think there are two parts to
the question of "what's the correct behavior?"

1.  Should the BOM be written at all, particularly when the platform is
Windows?
2.  Should the behavior of *withPrintWriter* differ (even if the difference
is to be smarter) from the behavior of *new PrintWriter*?

*Discussion*
1.  Strictly speaking, yes, because RFC 2781
<http://tools.ietf.org/html/rfc2781> says in section 4.3 to assume big
endian if there is no BOM.  In practice, however, many applications
disregard the RFC and assume little-endian because that's what Windows does
<https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
Because of this, the behavior could be changed so that it doesn't write the
BOM when writing UTF-16LE on Windows.  But in my opinion, it's best
practice to always write a BOM when working with UTF-16, and Java should
have done this in its PrintWriter implementation.
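
To make the reading side concrete, here is a minimal Java sketch (the class
and method names are made up for illustration).  It relies on the JDK's
"UTF-16" charset, whose decoder honors a BOM if one is present and otherwise
assumes big-endian, so the same two payload bytes decode to different
characters depending on whether the BOM is there.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Groovy or the JDK.
public class Utf16BomDecodeSketch {

    // Decode with the generic "UTF-16" charset: a leading BOM selects the
    // byte order; without one the JDK decoder falls back to big-endian.
    public static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xFF, (byte) 0xFE, 0x20, 0x00}; // BOM + little-endian space
        byte[] withoutBom = {0x20, 0x00};                        // same payload, no BOM

        // Prints U+0020 (space): the BOM says little-endian.
        System.out.printf("with BOM:    U+%04X%n", (int) decodeUtf16(withBom).charAt(0));
        // Prints U+2000 (EN QUAD): big-endian was assumed.
        System.out.printf("without BOM: U+%04X%n", (int) decodeUtf16(withoutBom).charAt(0));
    }
}
```

A reader that assumes little-endian instead would get the space in both
cases, which is why BOM-less UTF-16LE files tend to work on Windows.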

2.  This is a tough one.  Arguably, *withPrintWriter* is doing the smarter,
more correct thing, but the typical user would assume it is just a
shorthand convenience for newing up a PrintWriter (I certainly did).  So
the question is whether it's better to just document this difference in the
GroovyDoc or to change the behavior to be closer to Java.  And if the
latter, what breakages would that cause within Groovy itself?  Making that
change could break folks in production who rely on that BOM being there,
for example when the file is created on Windows but then processed on
Linux, or when working with a third-party library that is pickier about
the presence of a BOM.
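
For anyone who wants the BOM no matter which API they use, one option is to
write it explicitly.  The sketch below is only an illustration under a few
assumptions: the class and method names are hypothetical, and it writes to an
in-memory stream rather than a file to stay self-contained.  It shows that a
plain PrintWriter over UTF-16LE emits no BOM unless U+FEFF is printed first.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Groovy or the JDK.
public class ExplicitBomSketch {

    // Encode text as UTF-16LE through a PrintWriter, optionally preceded
    // by an explicit byte order mark (U+FEFF, encoded as FF FE).
    public static byte[] encodeUtf16Le(String text, boolean withBom) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (PrintWriter w = new PrintWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_16LE))) {
            if (withBom) {
                w.print('\uFEFF'); // the UTF-16LE encoder does not add this itself
            }
            w.print(text);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (boolean withBom : new boolean[] {true, false}) {
            System.out.print("withBom=" + withBom + ":");
            for (byte b : encodeUtf16Le(" ", withBom)) {
                System.out.printf(" %02x", b);
            }
            System.out.println();
        }
    }
}
```

With the flag set, the bytes match what the test further down the thread
shows for withPrintWriter (ff fe 20 00); without it, they match the
new PrintWriter output (20 00).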

-Keegan

On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <glaforge@gmail.com>
wrote:

> Now... whether that's what should be done is the real question to ask :-)
> Does Windows manage to open UTF-16 files without BOMs?
>
> 2015-06-08 22:17 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>
>> I forgot to mention that.  Yes, I ran the test mentioned in Windows.
>>
>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <glaforge@gmail.com>
>> wrote:
>>
>>> That's a good question.
>>> I guess this is happening on Windows? (I haven't tried here, since I'm
>>> on OS X)
>>> I think BOMs were mandatory in text files on Windows.
>>>
>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>
>>>> I've always taken a perverse pleasure in character encoding problems.
>>>> I was intrigued by this SO question
>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char>
>>>> on UTF-16 BOMs in Java vs Groovy.
>>>>
>>>> It appears that withPrintWriter(charset) produces a BOM whereas new
>>>> PrintWriter(file, charset) does not, as demonstrated here:
>>>>
>>>> File file = new File("tmp.txt")
>>>> try {
>>>>     String text = " "
>>>>     String charset = "UTF-16LE"
>>>>
>>>>     file.withPrintWriter(charset) { it << text }
>>>>     println "withPrintWriter"
>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>
>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>     w.print(text)
>>>>     w.close()
>>>>     println "\n\nnew PrintWriter"
>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>> } finally {
>>>>     file.delete()
>>>> }
>>>>
>>>> Outputs
>>>>
>>>> withPrintWriter
>>>> ff fe 20 00
>>>>
>>>> new PrintWriter
>>>> 20 00
>>>>
>>>>
>>>> Is this difference in behavior intentional?  It seems kinda odd to me.
>>>>
>>>> -Keegan
>>>>
>>>
>>>
>>>
>>> --
>>> Guillaume Laforge
>>> Groovy Project Manager
>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>
>>> Blog: http://glaforge.appspot.com/
>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>
>>
>>
>
