groovy-users mailing list archives

From: Keegan Witt <keeganw...@gmail.com>
Subject: Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()
Date: Mon, 08 Jun 2015 21:41:32 GMT
Another point of interest is that the current code doesn't respect charset
aliases. For example, the charset string "UTF_16LE" does not cause the BOM to
be written, even though it is an alias for "UTF-16LE".
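A minimal Java sketch of the alias point follows. It assumes the BOM decision is made by comparing charset names as plain strings. `needsUtf16Bom` is a hypothetical helper and is not Groovy's actual code. It resolves whatever name the caller passed through `Charset.forName(...).name()`, so an alias such as "UTF_16LE" is compared using its canonical name "UTF-16LE".

```java
import java.nio.charset.Charset;

class BomAliasCheck {
    // Hypothetical helper: decides whether a UTF-16 BOM should be written.
    // Resolving the name first maps aliases (e.g. "UTF_16LE") to the
    // canonical name ("UTF-16LE") before the string comparison.
    static boolean needsUtf16Bom(String charsetName) {
        String canonical = Charset.forName(charsetName).name();
        return canonical.equals("UTF-16LE") || canonical.equals("UTF-16BE");
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-16LE", "UTF_16LE", "UTF-8"}) {
            System.out.println(name + " -> " + Charset.forName(name).name()
                    + ", BOM: " + needsUtf16Bom(name));
        }
    }
}
```

`Charset.forName` throws `UnsupportedCharsetException` for unknown names, so a real fix would also need to decide how to handle that case.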

-Keegan
On Jun 8, 2015 5:20 PM, "Keegan Witt" <keeganwitt@gmail.com> wrote:

> The code as-is today writes the BOM regardless of platform. I just tested
> on Linux with the same results. I think there are two parts to the question
> of "what's the correct behavior?"
>
> 1.  Should the BOM be written at all, particularly when the platform is
> Windows?
> 2.  Should the behavior of *withPrintWriter* differ (even if the
> difference is to be smarter) from the behavior of *new PrintWriter*?
>
> *Discussion*
> 1.  Strictly speaking, yes, because RFC 2781
> <http://tools.ietf.org/html/rfc2781> states in section 4.3 that text
> without a BOM should be interpreted as big-endian. In practice, however,
> many applications disregard the RFC and assume little-endian, because
> that's what Windows does
> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
> Because of this, the behavior could be changed so that no BOM is written
> when writing UTF-16LE on Windows. But in my opinion, it's best practice to
> always write a BOM when working with UTF-16, and Java should have done so
> in its PrintWriter implementation.
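The RFC 2781 rule can be made concrete with a small Java sketch of a reader. The reader honors a BOM when one is present. Otherwise, it falls back to big-endian, as section 4.3 says. The class and method names are illustrative only. When the reader receives the BOM-less bytes `20 00` produced by `new PrintWriter` in this thread's test, it decodes U+2000 rather than a space. A BOM prevents exactly this kind of misreading. A reader that follows the Windows convention would assume little-endian instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf16Sniffer {
    // Honor a BOM if present; otherwise assume big-endian (RFC 2781, 4.3).
    static Charset detect(byte[] data) {
        if (data.length >= 2) {
            int b0 = data[0] & 0xFF, b1 = data[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) return StandardCharsets.UTF_16BE;
            if (b0 == 0xFF && b1 == 0xFE) return StandardCharsets.UTF_16LE;
        }
        return StandardCharsets.UTF_16BE;
    }

    static boolean hasBom(byte[] data) {
        if (data.length < 2) return false;
        int b0 = data[0] & 0xFF, b1 = data[1] & 0xFF;
        return (b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE);
    }

    // Decode the payload, skipping the BOM bytes when they are present.
    static String decode(byte[] data) {
        int offset = hasBom(data) ? 2 : 0;
        return new String(data, offset, data.length - offset, detect(data));
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xFF, (byte) 0xFE, 0x20, 0x00}; // withPrintWriter bytes
        byte[] noBom = {0x20, 0x00};                              // new PrintWriter bytes
        System.out.printf("U+%04X%n", (int) decode(withBom).charAt(0)); // U+0020, a space
        System.out.printf("U+%04X%n", (int) decode(noBom).charAt(0));   // U+2000, not a space
    }
}
```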
>
> 2.  This is a tough one. Arguably, *withPrintWriter* is doing the
> smarter, more correct thing, but the typical user would assume it is just
> a shorthand convenience for newing up a PrintWriter (I certainly did). So
> the question is: is it better to just document this difference in the
> GroovyDoc, or to change the behavior to be closer to Java? And if the
> latter, what breakages would that cause within Groovy itself? Making that
> change could break folks in production, because they may rely on that BOM
> being there. Two examples are a file created on Windows and then processed
> on Linux, and a third-party library that is pickier about the presence of
> a BOM.
>
> -Keegan
>
> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <glaforge@gmail.com>
> wrote:
>
>> Now... whether that's what should be done is the real question to ask :-)
>> Does Windows manage to open UTF-16 files without BOMs?
>>
>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>
>>> I forgot to mention that. Yes, I ran the test mentioned on Windows.
>>>
>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <glaforge@gmail.com>
>>> wrote:
>>>
>>>> That's a good question.
>>>> I guess this is happening on Windows? (I haven't tried here, since I'm
>>>> on OS X)
>>>> I think BOMs were mandatory in text files on Windows.
>>>>
>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>
>>>>> I've always taken a perverse pleasure in character encoding problems.
>>>>> I was intrigued by this SO question
>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char>
>>>>> on UTF-16 BOMs in Java vs Groovy.
>>>>>
>>>>> It appears that withPrintWriter(charset) produces a BOM, whereas new
>>>>> PrintWriter(file, charset) does not, as demonstrated here:
>>>>>
>>>>> File file = new File("tmp.txt")
>>>>> try {
>>>>>     String text = " "
>>>>>     String charset = "UTF-16LE"
>>>>>
>>>>>     // Groovy GDK convenience method
>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>     println "withPrintWriter"
>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>
>>>>>     // Plain Java constructor with the same charset
>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>     w.print(text)
>>>>>     w.close()
>>>>>     println "\n\nnew PrintWriter"
>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>> } finally {
>>>>>     file.delete()
>>>>> }
>>>>>
>>>>> Outputs
>>>>>
>>>>> withPrintWriter
>>>>> ff fe 20 00
>>>>>
>>>>> new PrintWriter
>>>>> 20 00
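The same bytes can be reproduced in plain Java without touching the file system. The mechanics in this sketch are an assumption and do not reflect Groovy's implementation. The sketch emulates the withPrintWriter output by printing U+FEFF before the text, and a UTF-16LE encoder writes that character as `ff fe`. The sketch also shows that Java's plain "UTF-16" charset, which has no endianness suffix, emits a big-endian BOM on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

class BomBytes {
    // Encode text through a PrintWriter and return the bytes as hex,
    // in the same "xx xx" style as the output above.
    static String hex(String text, String charset, boolean writeBom) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PrintWriter w = new PrintWriter(new OutputStreamWriter(out, Charset.forName(charset)));
        if (writeBom) {
            w.print('\uFEFF'); // U+FEFF encodes as the BOM for the charset's byte order
        }
        w.print(text);
        w.close();
        StringBuilder sb = new StringBuilder();
        for (byte b : out.toByteArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(" ", "UTF-16LE", false)); // 20 00
        System.out.println(hex(" ", "UTF-16LE", true));  // ff fe 20 00
        System.out.println(hex(" ", "UTF-16", false));   // fe ff 00 20
    }
}
```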
>>>>>
>>>>>
>>>>> Is this difference in behavior intentional?  It seems kinda odd to me.
>>>>>
>>>>> -Keegan
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Guillaume Laforge
>>>> Groovy Project Manager
>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>
>>>> Blog: http://glaforge.appspot.com/
>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>
>>>
>>>
>>
>>
>>
>
>
