groovy-users mailing list archives

From Paolo Di Tommaso <paolo.ditomm...@gmail.com>
Subject Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()
Date Tue, 09 Jun 2015 15:08:05 GMT
I'm wondering whether the NioGroovyMethods that implement the write methods for Path
should do the same.


Cheers,
Paolo
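Below is a minimal Java sketch of what "the same" might mean for Path-based writers. It assumes the behavior to copy is the canonical-name check proposed further down this thread: resolve the charset, write a BOM for UTF-16BE/UTF-16LE, then encode the text. `PathBomSketch`, `writeWithBom` and `writeAndRead` are hypothetical names used for illustration. They are not the NioGroovyMethods API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class PathBomSketch {

    // Hypothetical helper: writes text to a Path and prefixes a UTF-16 BOM when the
    // charset's canonical name is UTF-16BE or UTF-16LE (so aliases and case variants count).
    static void writeWithBom(Path path, String text, String charsetName) throws IOException {
        Charset charset = Charset.forName(charsetName);
        try (OutputStream out = Files.newOutputStream(path);
             Writer writer = new OutputStreamWriter(out, charset)) {
            if ("UTF-16BE".equals(charset.name())) {
                out.write(new byte[] {(byte) 0xFE, (byte) 0xFF});
            } else if ("UTF-16LE".equals(charset.name())) {
                out.write(new byte[] {(byte) 0xFF, (byte) 0xFE});
            }
            // The UTF-16BE/LE encoders never emit a BOM, so the text follows it directly.
            writer.write(text);
        }
    }

    // Writes to a temporary file and returns its bytes for inspection.
    static byte[] writeAndRead(String text, String charsetName) {
        try {
            Path tmp = Files.createTempFile("bom", ".txt");
            try {
                writeWithBom(tmp, text, charsetName);
                return Files.readAllBytes(tmp);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (byte b : writeAndRead(" ", "utf-16le")) {
            System.out.format("%02x ", b);
        }
        System.out.println();
    }
}
```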


On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <keeganwitt@gmail.com> wrote:

> Cool.  I'll wait for PR 36 to be merged first, because I was also thinking
> the Javadoc should be changed from
>     is "UTF-16BE" or "UTF-16LE"
> to
>     is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
>
> -Keegan
>
>
> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <glaforge@gmail.com>
> wrote:
>
>>
>> 2015-06-09 15:04 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>
>>> Created GROOVY-7461 <https://issues.apache.org/jira/browse/GROOVY-7461>
>>> and PR 36 <https://github.com/apache/incubator-groovy/pull/36>.
>>>
>>
>> Cool!
>>
>>
>>> How would you feel about a PR to copy the Javadoc comment mentioning the
>>> UTF-16 BOM on File.newWriter to all the other methods that use
>>> writeUTF16BomIfRequired (at least until we decide we're going to change
>>> the current behavior)?
>>>
>>
>> Right, worth it!
>>
>>
>>>
>>> -Keegan
>>>
>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <glaforge@gmail.com>
>>> wrote:
>>>
>>>> Good point!
>>>>
>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>
>>>>> That's only available in Java 7.  Isn't Groovy still targeting Java 1.6 for
>>>>> the non-indy version?
>>>>>
>>>>> -Keegan
>>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <glaforge@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Well spotted!
>>>>>>
>>>>>> You could also compare against the StandardCharsets constants instead of
>>>>>> comparing charset names:
>>>>>>
>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
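Guillaume's suggestion can be sketched in Java as follows. `Charset.equals` compares canonical names, so any name that `Charset.forName` resolves to UTF-16LE matches `StandardCharsets.UTF_16LE`. Keegan points out in his reply that `StandardCharsets` requires Java 7. `StandardCharsetsCheck` and `bomFor` are hypothetical names used for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StandardCharsetsCheck {

    // Hypothetical helper: returns the UTF-16 BOM bytes to write for a charset name,
    // or an empty array when no BOM is needed. Requires Java 7+ for StandardCharsets.
    static byte[] bomFor(String charsetName) {
        Charset charset = Charset.forName(charsetName);
        // Charset.equals compares canonical names, so case variants and aliases match too.
        if (StandardCharsets.UTF_16BE.equals(charset)) {
            return new byte[] {(byte) 0xFE, (byte) 0xFF};
        } else if (StandardCharsets.UTF_16LE.equals(charset)) {
            return new byte[] {(byte) 0xFF, (byte) 0xFE};
        }
        return new byte[0];
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-16LE", "utf-16le", "UTF-16BE", "UTF-8"}) {
            System.out.println(name + " -> " + bomFor(name).length + " BOM bytes");
        }
    }
}
```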
>>>>>>
>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>>>
>>>>>>> No, it's a Groovy bug.
>>>>>>>
>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>     } else if ("UTF-16LE".equals(charset)) {
>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> should be
>>>>>>>
>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods.  We'll
>>>>>>> probably want to fix that regardless of what we decide on the
>>>>>>> *withPrintWriter* question.  I'll open a Jira and a PR.
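The difference between the two checks can be reproduced outside Groovy with a small Java sketch. `writeUtf16Bom` here is a hypothetical stand-in for Groovy's private helper, assumed to write FE FF for big-endian and FF FE for little-endian. `bomHex` is an illustrative driver. `Charset.forName` accepts a lowercase name such as `utf-16le` because charset names are case-insensitive, but that name fails the raw string comparison.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class Utf16BomCheck {

    // Hypothetical stand-in for Groovy's private writeUtf16Bom helper.
    static void writeUtf16Bom(OutputStream stream, boolean bigEndian) throws IOException {
        if (bigEndian) {
            stream.write(0xFE);
            stream.write(0xFF);
        } else {
            stream.write(0xFF);
            stream.write(0xFE);
        }
    }

    // The original check: compares the caller's string verbatim.
    static void bomByRawName(String charset, OutputStream stream) throws IOException {
        if ("UTF-16BE".equals(charset)) {
            writeUtf16Bom(stream, true);
        } else if ("UTF-16LE".equals(charset)) {
            writeUtf16Bom(stream, false);
        }
    }

    // The proposed fix: resolve the charset first and compare its canonical name.
    static void bomByCanonicalName(String charset, OutputStream stream) throws IOException {
        String canonical = Charset.forName(charset).name();
        if ("UTF-16BE".equals(canonical)) {
            writeUtf16Bom(stream, true);
        } else if ("UTF-16LE".equals(canonical)) {
            writeUtf16Bom(stream, false);
        }
    }

    // Runs one of the checks against an in-memory stream and returns the bytes as hex.
    static String bomHex(String charset, boolean canonical) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (canonical) {
                bomByCanonicalName(charset, out);
            } else {
                bomByRawName(charset, out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : out.toByteArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-16LE", "utf-16le", "utf-16be"}) {
            System.out.println(name + ": raw=[" + bomHex(name, false)
                    + "] canonical=[" + bomHex(name, true) + "]");
        }
    }
}
```

With these inputs, the raw check writes nothing for `utf-16le` and `utf-16be`. The canonical check writes FF FE and FE FF respectively.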
>>>>>>>
>>>>>>> -Keegan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <
>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>
>>>>>>>> From Groovy's point of view (i.e. when you're coding in Groovy), the
>>>>>>>> BOM is automatically discarded when you use one of our reader methods
>>>>>>>> (withReader, etc.), so it's transparent whether the BOM is there or not.
>>>>>>>>
>>>>>>>> I tend to think that always having the BOM is a good thing (I even
>>>>>>>> thought it was mandatory), but Groovy should guess the endianness
>>>>>>>> regardless.
>>>>>>>>
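Groovy's own reader code isn't shown in this thread. As a point of comparison only, the JDK's `UTF-16` charset already behaves as Guillaume describes when decoding. Per the `java.nio.charset.Charset` documentation, it uses a leading BOM to choose the byte order and drops that BOM, and it assumes big-endian when there is none. `Utf16DecodeSketch` and `decode` are illustrative names. Here is a minimal sketch:

```java
import java.nio.charset.StandardCharsets;

public class Utf16DecodeSketch {

    // Decodes bytes with the JDK's "UTF-16" charset, which reads (and drops) a
    // leading BOM to choose the byte order and assumes big-endian without one.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] leWithBom = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00}; // BOM + "A", little-endian
        byte[] beWithBom = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41}; // BOM + "A", big-endian
        byte[] beNoBom = {0x00, 0x41};                              // "A", big-endian, no BOM
        byte[] leNoBom = {0x41, 0x00};                              // "A", little-endian, no BOM

        System.out.println(decode(leWithBom)); // A
        System.out.println(decode(beWithBom)); // A
        System.out.println(decode(beNoBom));   // A
        // Without a BOM the bytes are read as big-endian, giving U+4100 instead of "A".
        System.out.println(Integer.toHexString(decode(leNoBom).charAt(0))); // 4100
    }
}
```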
>>>>>>>> Happy to hear what others think too about all this though.
>>>>>>>>
>>>>>>>> Guillaume
>>>>>>>>
>>>>>>>>
>>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>>>>>
>>>>>>>>> The code as it stands today writes the BOM regardless of platform.  I
>>>>>>>>> just tested on Linux with the same results.  I think there are 2 parts
>>>>>>>>> to the question of "what's the correct behavior?"
>>>>>>>>>
>>>>>>>>> 1.  Should the BOM be written at all, particularly when the platform
>>>>>>>>> is Windows?
>>>>>>>>> 2.  Should the behavior of *withPrintWriter* differ (even if the
>>>>>>>>> difference is to be smarter) from the behavior of *new PrintWriter*?
>>>>>>>>>
>>>>>>>>> *Discussion*
>>>>>>>>> 1.  Strictly speaking, yes: section 4.3 of RFC 2781
>>>>>>>>> <http://tools.ietf.org/html/rfc2781> says to assume big-endian when
>>>>>>>>> there is no BOM, so a little-endian file without one would be misread
>>>>>>>>> by a reader that follows the RFC.  In practice, however, many
>>>>>>>>> applications disregard the RFC and assume little-endian, because
>>>>>>>>> that's what Windows does
>>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>>> Because of this, the behavior could be changed so that no BOM is
>>>>>>>>> written for UTF-16LE on Windows.  But in my opinion, it's best
>>>>>>>>> practice to always write a BOM when working with UTF-16, and Java
>>>>>>>>> should have done this in its PrintWriter implementation.
>>>>>>>>>
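The encoding side can be checked with plain Java. Per the `Charset` documentation, the JDK's `UTF-16` encoder always writes a big-endian BOM, while `UTF-16BE` and `UTF-16LE` never write one. A Java PrintWriter therefore gets a BOM with `UTF-16` but not with `UTF-16LE`, and a little-endian BOM has to be written explicitly, which is what writeUTF16BomIfRequired does. `Utf16EncodeSketch` and `hex` are illustrative names. Here is a minimal sketch:

```java
import java.nio.charset.StandardCharsets;

public class Utf16EncodeSketch {

    // Formats bytes as space-separated lowercase hex, like the Groovy test in this thread.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = " ";
        System.out.println("UTF-16   " + hex(text.getBytes(StandardCharsets.UTF_16)));   // fe ff 00 20
        System.out.println("UTF-16BE " + hex(text.getBytes(StandardCharsets.UTF_16BE))); // 00 20
        System.out.println("UTF-16LE " + hex(text.getBytes(StandardCharsets.UTF_16LE))); // 20 00
    }
}
```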
>>>>>>>>> 2.  This is a tough one.  Arguably, *withPrintWriter* is doing the
>>>>>>>>> smarter, more correct thing, but the typical user would assume it is
>>>>>>>>> just a convenient shorthand for newing up a PrintWriter (I certainly
>>>>>>>>> did).  So the question is: is it better to just document this
>>>>>>>>> difference in the GroovyDoc, or to change the behavior to be closer
>>>>>>>>> to Java?  And if the latter, what breakages would that cause within
>>>>>>>>> Groovy itself?  Making that change could break folks in production
>>>>>>>>> who rely on the BOM being there, for example when the file is created
>>>>>>>>> on Windows but then processed on Linux, or when working with a
>>>>>>>>> third-party library that is pickier about the presence of a BOM.
>>>>>>>>>
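If the project decides to document the difference, users who want the BOM from a plain java.io.PrintWriter could write U+FEFF themselves. The UTF-16LE encoder turns it into FF FE, which matches the withPrintWriter output in the test case in this thread. This is a minimal Java sketch. `PrintWriterBomSketch` and `printWithOptionalBom` are hypothetical names, and the helper writes to a temporary file:

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.file.Files;

public class PrintWriterBomSketch {

    // Hypothetical helper: writes text with java.io.PrintWriter, optionally starting
    // with U+FEFF so the encoder emits a BOM, and returns the resulting file bytes.
    static byte[] printWithOptionalBom(String text, String charset, boolean withBom) {
        try {
            File file = File.createTempFile("bom", ".txt");
            try {
                try (PrintWriter w = new PrintWriter(file, charset)) {
                    if (withBom) {
                        w.print('\uFEFF');
                    }
                    w.print(text);
                }
                return Files.readAllBytes(file.toPath());
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (boolean withBom : new boolean[] {false, true}) {
            for (byte b : printWithOptionalBom(" ", "UTF-16LE", withBom)) {
                System.out.format("%02x ", b);
            }
            System.out.println();
        }
    }
}
```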
>>>>>>>>> -Keegan
>>>>>>>>>
>>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <
>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Now... whether it's what should be done or not is the real question
>>>>>>>>>> to ask :-)
>>>>>>>>>> Does Windows manage to open UTF-16 files without BOMs?
>>>>>>>>>>
>>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> I forgot to mention that.  Yes, I ran the test mentioned on
>>>>>>>>>>> Windows.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <glaforge@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> That's a good question.
>>>>>>>>>>>> I guess this is happening on Windows? (I haven't tried it here,
>>>>>>>>>>>> since I'm on OS X.)
>>>>>>>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> I've always taken a perverse pleasure in character encoding
>>>>>>>>>>>>> problems.  I was intrigued by this SO question
>>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char>
>>>>>>>>>>>>> on UTF-16 BOMs in Java vs Groovy.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It appears that withPrintWriter(charset) produces a BOM
>>>>>>>>>>>>> whereas new PrintWriter(file, charset) does not, as
>>>>>>>>>>>>> demonstrated here:
>>>>>>>>>>>>>
>>>>>>>>>>>>> File file = new File("tmp.txt")
>>>>>>>>>>>>> try {
>>>>>>>>>>>>>     String text = " "
>>>>>>>>>>>>>     String charset = "UTF-16LE"
>>>>>>>>>>>>>
>>>>>>>>>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>>>>>>>>>     println "withPrintWriter"
>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>
>>>>>>>>>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>>     println "\n\nnew PrintWriter"
>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>> } finally {
>>>>>>>>>>>>>     file.delete()
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> Outputs
>>>>>>>>>>>>>
>>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>>> ff fe 20 00
>>>>>>>>>>>>>
>>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>>> 20 00
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is this difference in behavior intentional?  It seems kinda
>>>>>>>>>>>>> odd to me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>>> Groovy Project Manager
>>>>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>>>>>>
>>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>
