groovy-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Keegan Witt <keeganw...@gmail.com>
Subject Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()
Date Tue, 09 Jun 2015 15:21:57 GMT
That's an excellent point, Paolo.  NioGroovyMethods.newWriter claims (in
the JavaDoc) it will write the BOM if needed, but it doesn't because it
uses Java's implementation rather than with Groovy's writeUTF16BomIfRequired.
None of the methods in NioGroovyMethods use writeUTF16BomIfRequired.

Whichever we decide, we should be consistent.

-Keegan

On Tue, Jun 9, 2015 at 11:08 AM, Paolo Di Tommaso <paolo.ditommaso@gmail.com
> wrote:

> I'm wondering if NioGroovyMethods that implement the write methods for
> Path should do the same.
>
>
> Cheers,
> Paolo
>
>
> On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <keeganwitt@gmail.com> wrote:
>
>> Cool.  I'll wait for PR 36 to be merged first, because I also was
>> thinking the Javadoc would be changed from
>>     is "UTF-16BE" or "UTF-16LE"
>> to
>>     is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
>>
>> -Keegan
>>
>>
>> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <glaforge@gmail.com>
>> wrote:
>>
>>>
>>> 2015-06-09 15:04 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>
>>>> Created GROOVY-7461 <https://issues.apache.org/jira/browse/GROOVY-7461>
>>>> and PR 36 <https://github.com/apache/incubator-groovy/pull/36>.
>>>>
>>>
>>> Cool!
>>>
>>>
>>>> How would you feel about a PR to copy the Javadoc comment mentioning
>>>> the UTF-16 BOM on File.newWriter to all the other methods that use
>>>> writeUTF16BomIfRequired (at least until we decide we're going to
>>>> change the current behavior)?
>>>>
>>>
>>> Right, worth it!
>>>
>>>
>>>>
>>>> -Keegan
>>>>
>>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <glaforge@gmail.com>
>>>> wrote:
>>>>
>>>>> Good point!
>>>>>
>>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>>
>>>>>> That's only available in Java 7.  Isn't Groovy still targeting 1.6
>>>>>> for the non-indy version?
>>>>>>
>>>>>> -Keegan
>>>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <glaforge@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Well spotted!
>>>>>>>
>>>>>>> You could also compare with the StandardCharset, instead of going
>>>>>>> through the name comparison:
>>>>>>>
>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>>>
>>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>>>>
>>>>>>>> No, it's a Groovy bug.
>>>>>>>>
>>>>>>>> private static void writeUTF16BomIfRequired(final String
charset, final OutputStream stream) throws IOException {
>>>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>     } else if ("UTF-16LE".equals(charset)) {
>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>     }
>>>>>>>> }
>>>>>>>>
>>>>>>>> should be
>>>>>>>>
>>>>>>>> private static void writeUTF16BomIfRequired(final String
charset, final OutputStream stream) throws IOException {
>>>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name()))
{
>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name()))
{
>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>     }
>>>>>>>> }
>>>>>>>>
>>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods.  We'll
>>>>>>>> probably want to fix that regardless of what we decide on
the
>>>>>>>> *withPrintWriter* question.  I'll open a Jira and a PR.
>>>>>>>>
>>>>>>>> -Keegan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <
>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> From Groovy's point of view (ie. when you're coding in
Groovy),
>>>>>>>>> the BOM is automatically discarded when you use one of
our reader methods
>>>>>>>>> (withReader, etc), so it's transparent whether the BOM
is here or not.
>>>>>>>>>
>>>>>>>>> I tend to think that having the BOM always is a good
thing (I even
>>>>>>>>> thought that was mandatory), but Groovy should guess
the endianness
>>>>>>>>> regardless anyway.
>>>>>>>>>
>>>>>>>>> Happy to hear what others think too about all this though.
>>>>>>>>>
>>>>>>>>> Guillaume
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>>>>>>
>>>>>>>>>> The code as-is today writes the BOM regardless of
platform.  I
>>>>>>>>>> just tested in Linux with the same results.  I think
there are 2 parts to
>>>>>>>>>> the question of "what's the correct behavior?"
>>>>>>>>>>
>>>>>>>>>> 1.  Should the BOM be written at all, particularly
when the
>>>>>>>>>> platform is Windows?
>>>>>>>>>> 2.  Should the behavior of *withPrintWriter* differ
(even if the
>>>>>>>>>> difference is to be smarter) from the behavior of
*new
>>>>>>>>>> PrintWriter*?
>>>>>>>>>>
>>>>>>>>>> *Discussion*
>>>>>>>>>> 1.  Strictly speaking, yes.  Because RFC 2781
>>>>>>>>>> <http://tools.ietf.org/html/rfc2781> states
in section 4.3 to
>>>>>>>>>> assume big endian if there is no BOM.  However, in
practice, many
>>>>>>>>>> applications disregard the RFC and assume little-endian
because that's what Windows
>>>>>>>>>> does
>>>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>>>> Because of this, the behavior could be changed so
that when writing
>>>>>>>>>> UTF-16LE on Windows, it doesn't write the BOM.  But
in my opinion, it's
>>>>>>>>>> best practice to always write a BOM when working
with UTF-16, and Java
>>>>>>>>>> should have done this in their implementation of
their PrintWriter.
>>>>>>>>>>
>>>>>>>>>> 2.  This is a tough one.  Arguably, *withPrintWriter*
is doing
>>>>>>>>>> the smarter, more correct behavior, but the typical
user would assume this
>>>>>>>>>> is just a shorthand convenience for newing up a PrintWriter
(I certainly
>>>>>>>>>> did).  So the question is, is it better to just document
this difference in
>>>>>>>>>> the GroovyDoc?  Or to change the behavior to be closer
to Java?  And if the
>>>>>>>>>> latter, what breakages would that cause within Groovy
itself?  Making that
>>>>>>>>>> change could break folks in production, because they
could rely on that BOM
>>>>>>>>>> being there, in cases for example where the file
is created on Windows, but
>>>>>>>>>> then processed on Linux or when working with a third
party library that is
>>>>>>>>>> more picky about the presence of a BOM.
>>>>>>>>>>
>>>>>>>>>> -Keegan
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge
<
>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Now... is it what should be done or not is the
good question to
>>>>>>>>>>> ask :-)
>>>>>>>>>>> Does Windows manages to open UTF-16 files without
BOMs?
>>>>>>>>>>>
>>>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> I forgot to mention that.  Yes, I ran the
test mentioned in
>>>>>>>>>>>> Windows.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume
Laforge <
>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> That's a good question.
>>>>>>>>>>>>> I guess this is happening on Windows?
(I haven't tried here,
>>>>>>>>>>>>> since I'm on OS X)
>>>>>>>>>>>>> I think BOMs were mandatory in text files
on Windows.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt
<keeganwitt@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've always taken a perverse pleasure
in character encoding
>>>>>>>>>>>>>> problems.  I was intrigued by this
SO question
>>>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char>
on
>>>>>>>>>>>>>> UTF 16 BOMs in Java vs Groovy.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It appears using withPrintWriter(charset)
produces a BOM
>>>>>>>>>>>>>> whereas new PrintWriter(file, charset)
does not.  As
>>>>>>>>>>>>>> demonstrated here:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> File file = new File("tmp.txt")try
{
>>>>>>>>>>>>>>     String text = " "
>>>>>>>>>>>>>>     String charset = "UTF-16LE"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     file.withPrintWriter(charset)
{ it << text }
>>>>>>>>>>>>>>     println "withPrintWriter"
>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x
", it) }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     PrintWriter w = new PrintWriter(file,
charset)
>>>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>>>     println "\n\nnew PrintWriter"
>>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x
", it) }} finally {
>>>>>>>>>>>>>>     file.delete()}
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Outputs
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>>>> ff fe 20 00
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>>>> 20 00
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is this difference in behavior intentional?
 It seems kinda
>>>>>>>>>>>>>> odd to me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>>>> Groovy Project Manager
>>>>>>>>>>>>> Product Ninja & Advocate at Restlet
<http://restlet.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge>
/ Google+
>>>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>> Groovy Project Manager
>>>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>>>>>
>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge>
/ Google+
>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Guillaume Laforge
>>>>>>>>> Groovy Project Manager
>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>>>
>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge>
/ Google+
>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Guillaume Laforge
>>>>>>> Groovy Project Manager
>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>
>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Guillaume Laforge
>>>>> Groovy Project Manager
>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>
>>>>> Blog: http://glaforge.appspot.com/
>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Guillaume Laforge
>>> Groovy Project Manager
>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>
>>> Blog: http://glaforge.appspot.com/
>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>
>>
>>
>

Mime
View raw message