groovy-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Keegan Witt <keeganw...@gmail.com>
Subject Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()
Date Tue, 09 Jun 2015 19:47:15 GMT
I get that -- and I wish JDK did the same.  But what bothers me most about
the current state is that sometimes it's transparent, sometimes it's not --
depending on how it was invoked.  And while we could fix the new instance
usage too with metaClass, that could lead to weird inconsistencies when
Groovy is invoked from Java.

I really think most users would not expect these two usages to behave
differently.  I think most would expect the difference to be stylistic
only.  So as much as it pains me to say this, I think it's better not to
violate the principle of least surprise, and remain consistent across all
styles of invocation with Java's poor life choices.

But maybe the friendlier APIs can be moved into new methods, such as new
BomAwareWriter() / WithBomAwareWriter{}  What do you think?  If we did
that, I guess it'd be consistent to do the same for the readers as well.

-Keegan

On Tue, Jun 9, 2015 at 3:22 PM, Guillaume Laforge <glaforge@gmail.com>
wrote:

>
> 2015-06-09 18:57 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>
>> I created PR 37 <https://github.com/apache/incubator-groovy/pull/37> to
>> correct the JavaDoc I mentioned (as well as to document the existing
>> behavior for the non-NIO methods).
>>
>> Java doesn't eat the BOM, but this is a problem Java folks are used to
>> dealing with, and why things like Apache Common-IO's BOMInputStream
>> <https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html>
>> exist.
>>
>
> That's also why I made Groovy eat the BOM too, so that it's transparent to
> our users :-)
> But that was a long time ago since I worked on those parts of the
> codebase, and it's been refactored quite a bit (by Jim for example).
>
>
>>
>> -Keegan
>>
>> On Tue, Jun 9, 2015 at 11:33 AM, Guillaume Laforge <glaforge@gmail.com>
>> wrote:
>>
>>> So now, how to decide what's best? :-)
>>>
>>> Is a Java reader happy with the BOM? and eats it transparently? (I think
>>> in the past that wasn't the case but I may be wrong)
>>>
>>> 2015-06-09 17:21 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>
>>>> That's an excellent point, Paolo.  NioGroovyMethods.newWriter claims
>>>> (in the JavaDoc) it will write the BOM if needed, but it doesn't because
it
>>>> uses Java's implementation rather than with Groovy's
>>>> writeUTF16BomIfRequired.  None of the methods in NioGroovyMethods use
>>>> writeUTF16BomIfRequired.
>>>>
>>>> Whichever we decide, we should be consistent.
>>>>
>>>> -Keegan
>>>>
>>>> On Tue, Jun 9, 2015 at 11:08 AM, Paolo Di Tommaso <
>>>> paolo.ditommaso@gmail.com> wrote:
>>>>
>>>>> I'm wondering if NioGroovyMethods that implement the write methods for
>>>>> Path should do the same.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Paolo
>>>>>
>>>>>
>>>>> On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <keeganwitt@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Cool.  I'll wait for PR 36 to be merged first, because I also was
>>>>>> thinking the Javadoc would be changed from
>>>>>>     is "UTF-16BE" or "UTF-16LE"
>>>>>> to
>>>>>>     is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
>>>>>>
>>>>>> -Keegan
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <glaforge@gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>>
>>>>>>> 2015-06-09 15:04 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>>>>
>>>>>>>> Created GROOVY-7461
>>>>>>>> <https://issues.apache.org/jira/browse/GROOVY-7461>
and PR 36
>>>>>>>> <https://github.com/apache/incubator-groovy/pull/36>.
>>>>>>>>
>>>>>>>
>>>>>>> Cool!
>>>>>>>
>>>>>>>
>>>>>>>> How would you feel about a PR to copy the Javadoc comment
>>>>>>>> mentioning the UTF-16 BOM on File.newWriter to all the other
>>>>>>>> methods that use writeUTF16BomIfRequired (at least until
we decide
>>>>>>>> we're going to change the current behavior)?
>>>>>>>>
>>>>>>>
>>>>>>> Right, worth it!
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> -Keegan
>>>>>>>>
>>>>>>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <
>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Good point!
>>>>>>>>>
>>>>>>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>>>>>>
>>>>>>>>>> That's only available in Java 7.  Isn't Groovy still
targeting
>>>>>>>>>> 1.6 for the non-indy version?
>>>>>>>>>>
>>>>>>>>>> -Keegan
>>>>>>>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <glaforge@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Well spotted!
>>>>>>>>>>>
>>>>>>>>>>> You could also compare with the StandardCharset,
instead of
>>>>>>>>>>> going through the name comparison:
>>>>>>>>>>>
>>>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>>>>>>>
>>>>>>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> No, it's a Groovy bug.
>>>>>>>>>>>>
>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final
String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>>>     } else if ("UTF-16LE".equals(charset))
{
>>>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>>>     }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> should be
>>>>>>>>>>>>
>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final
String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name()))
{
>>>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name()))
{
>>>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>>>     }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods.
 We'll
>>>>>>>>>>>> probably want to fix that regardless of what
we decide on the
>>>>>>>>>>>> *withPrintWriter* question.  I'll open a
Jira and a PR.
>>>>>>>>>>>>
>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume
Laforge <
>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> From Groovy's point of view (ie. when
you're coding in
>>>>>>>>>>>>> Groovy), the BOM is automatically discarded
when you use one of our reader
>>>>>>>>>>>>> methods (withReader, etc), so it's transparent
whether the BOM is here or
>>>>>>>>>>>>> not.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I tend to think that having the BOM always
is a good thing (I
>>>>>>>>>>>>> even thought that was mandatory), but
Groovy should guess the endianness
>>>>>>>>>>>>> regardless anyway.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Happy to hear what others think too about
all this though.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Guillaume
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt
<keeganwitt@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The code as-is today writes the BOM
regardless of platform.
>>>>>>>>>>>>>> I just tested in Linux with the same
results.  I think there are 2 parts to
>>>>>>>>>>>>>> the question of "what's the correct
behavior?"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1.  Should the BOM be written at
all, particularly when the
>>>>>>>>>>>>>> platform is Windows?
>>>>>>>>>>>>>> 2.  Should the behavior of *withPrintWriter*
differ (even if
>>>>>>>>>>>>>> the difference is to be smarter)
from the behavior of *new
>>>>>>>>>>>>>> PrintWriter*?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Discussion*
>>>>>>>>>>>>>> 1.  Strictly speaking, yes.  Because
RFC 2781
>>>>>>>>>>>>>> <http://tools.ietf.org/html/rfc2781>
states in section 4.3
>>>>>>>>>>>>>> to assume big endian if there is
no BOM.  However, in practice, many
>>>>>>>>>>>>>> applications disregard the RFC and
assume little-endian because that's what Windows
>>>>>>>>>>>>>> does
>>>>>>>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>>>>>>>> Because of this, the behavior could
be changed so that when writing
>>>>>>>>>>>>>> UTF-16LE on Windows, it doesn't write
the BOM.  But in my opinion, it's
>>>>>>>>>>>>>> best practice to always write a BOM
when working with UTF-16, and Java
>>>>>>>>>>>>>> should have done this in their implementation
of their PrintWriter.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2.  This is a tough one.  Arguably,
*withPrintWriter* is
>>>>>>>>>>>>>> doing the smarter, more correct behavior,
but the typical user would assume
>>>>>>>>>>>>>> this is just a shorthand convenience
for newing up a PrintWriter (I
>>>>>>>>>>>>>> certainly did).  So the question
is, is it better to just document this
>>>>>>>>>>>>>> difference in the GroovyDoc?  Or
to change the behavior to be closer to
>>>>>>>>>>>>>> Java?  And if the latter, what breakages
would that cause within Groovy
>>>>>>>>>>>>>> itself?  Making that change could
break folks in production, because they
>>>>>>>>>>>>>> could rely on that BOM being there,
in cases for example where the file is
>>>>>>>>>>>>>> created on Windows, but then processed
on Linux or when working with a
>>>>>>>>>>>>>> third party library that is more
picky about the presence of a BOM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume
Laforge <
>>>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now... is it what should be done
or not is the good question
>>>>>>>>>>>>>>> to ask :-)
>>>>>>>>>>>>>>> Does Windows manages to open
UTF-16 files without BOMs?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan
Witt <keeganwitt@gmail.com
>>>>>>>>>>>>>>> >:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I forgot to mention that.
 Yes, I ran the test mentioned in
>>>>>>>>>>>>>>>> Windows.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54
PM, Guillaume Laforge <
>>>>>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> That's a good question.
>>>>>>>>>>>>>>>>> I guess this is happening
on Windows? (I haven't tried
>>>>>>>>>>>>>>>>> here, since I'm on OS
X)
>>>>>>>>>>>>>>>>> I think BOMs were mandatory
in text files on Windows.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00
Keegan Witt <
>>>>>>>>>>>>>>>>> keeganwitt@gmail.com>:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've always taken
a perverse pleasure in character
>>>>>>>>>>>>>>>>>> encoding problems.
 I was intrigued by this SO question
>>>>>>>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char>
on
>>>>>>>>>>>>>>>>>> UTF 16 BOMs in Java
vs Groovy.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It appears using
withPrintWriter(charset) produces a BOM
>>>>>>>>>>>>>>>>>> whereas new PrintWriter(file,
charset) does not.  As
>>>>>>>>>>>>>>>>>> demonstrated here:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> File file = new File("tmp.txt")try
{
>>>>>>>>>>>>>>>>>>     String text =
" "
>>>>>>>>>>>>>>>>>>     String charset
= "UTF-16LE"
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     file.withPrintWriter(charset)
{ it << text }
>>>>>>>>>>>>>>>>>>     println "withPrintWriter"
>>>>>>>>>>>>>>>>>>     file.getBytes().each
{ System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     PrintWriter w
= new PrintWriter(file, charset)
>>>>>>>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>>>>>>>     println "\n\nnew
PrintWriter"
>>>>>>>>>>>>>>>>>>     file.getBytes().each
{ System.out.format("%02x ", it) }} finally {
>>>>>>>>>>>>>>>>>>     file.delete()}
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Outputs
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>>>>>>>> ff fe 20 00
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>>>>>>>> 20 00
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Is this difference
in behavior intentional?  It seems
>>>>>>>>>>>>>>>>>> kinda odd to me.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>>>>>>>> Groovy Project Manager
>>>>>>>>>>>>>>>>> Product Ninja & Advocate
at Restlet <http://restlet.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge>
/ Google+
>>>>>>>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>>>>>> Groovy Project Manager
>>>>>>>>>>>>>>> Product Ninja & Advocate
at Restlet <http://restlet.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge>
/ Google+
>>>>>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>>>> Groovy Project Manager
>>>>>>>>>>>>> Product Ninja & Advocate at Restlet
<http://restlet.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge>
/ Google+
>>>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>> Groovy Project Manager
>>>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>>>>>
>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge>
/ Google+
>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Guillaume Laforge
>>>>>>>>> Groovy Project Manager
>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>>>
>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge>
/ Google+
>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Guillaume Laforge
>>>>>>> Groovy Project Manager
>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>
>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Guillaume Laforge
>>> Groovy Project Manager
>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>
>>> Blog: http://glaforge.appspot.com/
>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>
>>
>>
>
>
> --
> Guillaume Laforge
> Groovy Project Manager
> Product Ninja & Advocate at Restlet <http://restlet.com>
>
> Blog: http://glaforge.appspot.com/
> Social: @glaforge <http://twitter.com/glaforge> / Google+
> <https://plus.google.com/u/0/114130972232398734985/posts>
>

Mime
View raw message