groovy-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Keegan Witt <keeganw...@gmail.com>
Subject Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()
Date Tue, 09 Jun 2015 20:04:29 GMT
I'd be OK with that.  I think having false by default is the *Right Thing™*,
but true has a certain allure since it'd reduce the risk of breaking
existing code (hard to guess how likely breakage is).  Tough choice.  Even
if we defaulted to true, it's an improvement over current state since it
gives users the flexibility, and calling it out as a parameter might elicit
more thought and attention than just a JavaDoc comment.

On Tue, Jun 9, 2015 at 3:50 PM, Guillaume Laforge <glaforge@gmail.com>
wrote:

> So let's say, perhaps, we don't generate a BOM, unless asked
> specifically... but not with new methods, but with new parameters to such
> methods. In addition to specifying a charset, we could also pass a boolean
> saying we want a BOM to be generated (false by default, needs to be
> specified as true if BOM wanted) ?
>
> 2015-06-09 21:47 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>
>> I get that -- and I wish JDK did the same.  But what bothers me most
>> about the current state is that sometimes it's transparent, sometimes it's
>> not -- depending on how it was invoked.  And while we could fix the new
>> instance usage too with metaClass, that could lead to weird inconsistencies
>> when Groovy is invoked from Java.
>>
>> I really think most users would not expect these two usages to behave
>> differently.  I think most would expect the difference to be stylistic
>> only.  So as much as it pains me to say this, I think it's better not to
>> violate the principle of least surprise, and remain consistent across all
>> styles of invocation with Java's poor life choices.
>>
>> But maybe the friendlier APIs can be moved into new methods, such as new
>> BomAwareWriter() / WithBomAwareWriter{}  What do you think?  If we did
>> that, I guess it'd be consistent to do the same for the readers as well.
>>
>> -Keegan
>>
>> On Tue, Jun 9, 2015 at 3:22 PM, Guillaume Laforge <glaforge@gmail.com>
>> wrote:
>>
>>>
>>> 2015-06-09 18:57 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>
>>>> I created PR 37 <https://github.com/apache/incubator-groovy/pull/37>
>>>> to correct the JavaDoc I mentioned (as well as to document the existing
>>>> behavior for the non-NIO methods).
>>>>
>>>> Java doesn't eat the BOM, but this is a problem Java folks are used to
>>>> dealing with, and why things like Apache Common-IO's BOMInputStream
>>>> <https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html>
>>>> exist.
>>>>
>>>
>>> That's also why I made Groovy eat the BOM too, so that it's transparent
>>> to our users :-)
>>> But that was a long time ago since I worked on those parts of the
>>> codebase, and it's been refactored quite a bit (by Jim for example).
>>>
>>>
>>>>
>>>> -Keegan
>>>>
>>>> On Tue, Jun 9, 2015 at 11:33 AM, Guillaume Laforge <glaforge@gmail.com>
>>>> wrote:
>>>>
>>>>> So now, how to decide what's best? :-)
>>>>>
>>>>> Is a Java reader happy with the BOM? and eats it transparently? (I
>>>>> think in the past that wasn't the case but I may be wrong)
>>>>>
>>>>> 2015-06-09 17:21 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>>
>>>>>> That's an excellent point, Paolo.  NioGroovyMethods.newWriter claims
>>>>>> (in the JavaDoc) it will write the BOM if needed, but it doesn't
because it
>>>>>> uses Java's implementation rather than with Groovy's
>>>>>> writeUTF16BomIfRequired.  None of the methods in NioGroovyMethods
>>>>>>  use writeUTF16BomIfRequired.
>>>>>>
>>>>>> Whichever we decide, we should be consistent.
>>>>>>
>>>>>> -Keegan
>>>>>>
>>>>>> On Tue, Jun 9, 2015 at 11:08 AM, Paolo Di Tommaso <
>>>>>> paolo.ditommaso@gmail.com> wrote:
>>>>>>
>>>>>>> I'm wondering if NioGroovyMethods that implement the write methods
>>>>>>> for Path should do the same.
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Paolo
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <keeganwitt@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Cool.  I'll wait for PR 36 to be merged first, because I
also was
>>>>>>>> thinking the Javadoc would be changed from
>>>>>>>>     is "UTF-16BE" or "UTF-16LE"
>>>>>>>> to
>>>>>>>>     is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
>>>>>>>>
>>>>>>>> -Keegan
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <
>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2015-06-09 15:04 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Created GROOVY-7461
>>>>>>>>>> <https://issues.apache.org/jira/browse/GROOVY-7461>
and PR 36
>>>>>>>>>> <https://github.com/apache/incubator-groovy/pull/36>.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cool!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> How would you feel about a PR to copy the Javadoc
comment
>>>>>>>>>> mentioning the UTF-16 BOM on File.newWriter to all
the other
>>>>>>>>>> methods that use writeUTF16BomIfRequired (at least
until we
>>>>>>>>>> decide we're going to change the current behavior)?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Right, worth it!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Keegan
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge
<
>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Good point!
>>>>>>>>>>>
>>>>>>>>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> That's only available in Java 7.  Isn't Groovy
still targeting
>>>>>>>>>>>> 1.6 for the non-indy version?
>>>>>>>>>>>>
>>>>>>>>>>>> -Keegan
>>>>>>>>>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge"
<glaforge@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Well spotted!
>>>>>>>>>>>>>
>>>>>>>>>>>>> You could also compare with the StandardCharset,
instead of
>>>>>>>>>>>>> going through the name comparison:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt
<keeganwitt@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> No, it's a Groovy bug.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final
String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>>>>>     if ("UTF-16BE".equals(charset))
{
>>>>>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>>>>>     } else if ("UTF-16LE".equals(charset))
{
>>>>>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> should be
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final
String charset, final OutputStream stream) throws IOException {
>>>>>>>>>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name()))
{
>>>>>>>>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>>>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name()))
{
>>>>>>>>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods.
 We'll
>>>>>>>>>>>>>> probably want to fix that regardless
of what we decide on the
>>>>>>>>>>>>>> *withPrintWriter* question.  I'll
open a Jira and a PR.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume
Laforge <
>>>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From Groovy's point of view (ie.
when you're coding in
>>>>>>>>>>>>>>> Groovy), the BOM is automatically
discarded when you use one of our reader
>>>>>>>>>>>>>>> methods (withReader, etc), so
it's transparent whether the BOM is here or
>>>>>>>>>>>>>>> not.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I tend to think that having the
BOM always is a good thing
>>>>>>>>>>>>>>> (I even thought that was mandatory),
but Groovy should guess the endianness
>>>>>>>>>>>>>>> regardless anyway.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Happy to hear what others think
too about all this though.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Guillaume
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan
Witt <keeganwitt@gmail.com
>>>>>>>>>>>>>>> >:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The code as-is today writes
the BOM regardless of
>>>>>>>>>>>>>>>> platform.  I just tested
in Linux with the same results.  I think there are
>>>>>>>>>>>>>>>> 2 parts to the question of
"what's the correct behavior?"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1.  Should the BOM be written
at all, particularly when the
>>>>>>>>>>>>>>>> platform is Windows?
>>>>>>>>>>>>>>>> 2.  Should the behavior of
*withPrintWriter* differ (even
>>>>>>>>>>>>>>>> if the difference is to be
smarter) from the behavior of *new
>>>>>>>>>>>>>>>> PrintWriter*?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Discussion*
>>>>>>>>>>>>>>>> 1.  Strictly speaking, yes.
 Because RFC 2781
>>>>>>>>>>>>>>>> <http://tools.ietf.org/html/rfc2781>
states in section 4.3
>>>>>>>>>>>>>>>> to assume big endian if there
is no BOM.  However, in practice, many
>>>>>>>>>>>>>>>> applications disregard the
RFC and assume little-endian because that's what Windows
>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>>>>>>>>>> Because of this, the behavior
could be changed so that when writing
>>>>>>>>>>>>>>>> UTF-16LE on Windows, it doesn't
write the BOM.  But in my opinion, it's
>>>>>>>>>>>>>>>> best practice to always write
a BOM when working with UTF-16, and Java
>>>>>>>>>>>>>>>> should have done this in
their implementation of their PrintWriter.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2.  This is a tough one.
 Arguably, *withPrintWriter* is
>>>>>>>>>>>>>>>> doing the smarter, more correct
behavior, but the typical user would assume
>>>>>>>>>>>>>>>> this is just a shorthand
convenience for newing up a PrintWriter (I
>>>>>>>>>>>>>>>> certainly did).  So the question
is, is it better to just document this
>>>>>>>>>>>>>>>> difference in the GroovyDoc?
 Or to change the behavior to be closer to
>>>>>>>>>>>>>>>> Java?  And if the latter,
what breakages would that cause within Groovy
>>>>>>>>>>>>>>>> itself?  Making that change
could break folks in production, because they
>>>>>>>>>>>>>>>> could rely on that BOM being
there, in cases for example where the file is
>>>>>>>>>>>>>>>> created on Windows, but then
processed on Linux or when working with a
>>>>>>>>>>>>>>>> third party library that
is more picky about the presence of a BOM.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 4:32
PM, Guillaume Laforge <
>>>>>>>>>>>>>>>> glaforge@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Now... is it what should
be done or not is the good
>>>>>>>>>>>>>>>>> question to ask :-)
>>>>>>>>>>>>>>>>> Does Windows manages
to open UTF-16 files without BOMs?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2015-06-08 22:17 GMT+02:00
Keegan Witt <
>>>>>>>>>>>>>>>>> keeganwitt@gmail.com>:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I forgot to mention
that.  Yes, I ran the test mentioned
>>>>>>>>>>>>>>>>>> in Windows.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jun 8, 2015
at 3:54 PM, Guillaume Laforge <
>>>>>>>>>>>>>>>>>> glaforge@gmail.com>
wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> That's a good
question.
>>>>>>>>>>>>>>>>>>> I guess this
is happening on Windows? (I haven't tried
>>>>>>>>>>>>>>>>>>> here, since I'm
on OS X)
>>>>>>>>>>>>>>>>>>> I think BOMs
were mandatory in text files on Windows.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2015-06-08 17:53
GMT+02:00 Keegan Witt <
>>>>>>>>>>>>>>>>>>> keeganwitt@gmail.com>:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I've always
taken a perverse pleasure in character
>>>>>>>>>>>>>>>>>>>> encoding
problems.  I was intrigued by this SO question
>>>>>>>>>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char>
on
>>>>>>>>>>>>>>>>>>>> UTF 16 BOMs
in Java vs Groovy.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It appears
using withPrintWriter(charset) produces a
>>>>>>>>>>>>>>>>>>>> BOM whereas
new PrintWriter(file, charset) does not.
>>>>>>>>>>>>>>>>>>>> As demonstrated
here:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> File file
= new File("tmp.txt")try {
>>>>>>>>>>>>>>>>>>>>     String
text = " "
>>>>>>>>>>>>>>>>>>>>     String
charset = "UTF-16LE"
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     file.withPrintWriter(charset)
{ it << text }
>>>>>>>>>>>>>>>>>>>>     println
"withPrintWriter"
>>>>>>>>>>>>>>>>>>>>     file.getBytes().each
{ System.out.format("%02x ", it) }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     PrintWriter
w = new PrintWriter(file, charset)
>>>>>>>>>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>>>>>>>>>     println
"\n\nnew PrintWriter"
>>>>>>>>>>>>>>>>>>>>     file.getBytes().each
{ System.out.format("%02x ", it) }} finally {
>>>>>>>>>>>>>>>>>>>>     file.delete()}
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Outputs
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>>>>>>>>>> ff fe 20
00
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>>>>>>>>>> 20 00
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Is this difference
in behavior intentional?  It seems
>>>>>>>>>>>>>>>>>>>> kinda odd
to me.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>>>>>>>>>> Groovy Project
Manager
>>>>>>>>>>>>>>>>>>> Product Ninja
& Advocate at Restlet <http://restlet.com>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>>>>>>>>>> Social: @glaforge
<http://twitter.com/glaforge> /
>>>>>>>>>>>>>>>>>>> Google+
>>>>>>>>>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>>>>>>>> Groovy Project Manager
>>>>>>>>>>>>>>>>> Product Ninja & Advocate
at Restlet <http://restlet.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge>
/ Google+
>>>>>>>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>>>>>> Groovy Project Manager
>>>>>>>>>>>>>>> Product Ninja & Advocate
at Restlet <http://restlet.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge>
/ Google+
>>>>>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>>>> Groovy Project Manager
>>>>>>>>>>>>> Product Ninja & Advocate at Restlet
<http://restlet.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge>
/ Google+
>>>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>> Groovy Project Manager
>>>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>>>>>
>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge>
/ Google+
>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Guillaume Laforge
>>>>>>>>> Groovy Project Manager
>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>>>
>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge>
/ Google+
>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Guillaume Laforge
>>>>> Groovy Project Manager
>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>
>>>>> Blog: http://glaforge.appspot.com/
>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Guillaume Laforge
>>> Groovy Project Manager
>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>
>>> Blog: http://glaforge.appspot.com/
>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>
>>
>>
>
>
> --
> Guillaume Laforge
> Groovy Project Manager
> Product Ninja & Advocate at Restlet <http://restlet.com>
>
> Blog: http://glaforge.appspot.com/
> Social: @glaforge <http://twitter.com/glaforge> / Google+
> <https://plus.google.com/u/0/114130972232398734985/posts>
>

Mime
View raw message