maven-dev mailing list archives

From "Benjamin Bentmann" <benjamin.bentm...@udo.edu>
Subject Re: Common Bugs
Date Sat, 08 Mar 2008 20:07:08 GMT
Hi,

5) Reading and Writing Text Files

Textual content is composed of characters while file systems merely store
byte streams. A file encoding (aka charset) is used to convert between bytes
and characters. The challenge is using the right file encoding...

The JVM has this notion of a default encoding ("file.encoding" property)
which it derives from the system's locale or similar platform settings. While this might be a
convenient feature sometimes, using this default encoding for a project
build is in general a bad idea: The build output will depend on the
machine/developer who runs the build. As such, usage of the default encoding
threatens the dream of a reproducible build.

For example, if developer A has UTF-8 as default encoding while developer B
uses ISO-8859-1, text files are very likely to get messed up during resource
filtering or similar tasks.
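To make the A/B scenario concrete, here is a minimal sketch that encodes a
string with one charset and decodes the bytes with the other. The class and
method names are made up for illustration, and the explicit charsets merely
stand in for the two developers' default encodings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Encodes the text with one charset and decodes the resulting bytes with
    // another, which is what happens when the writing and the reading machine
    // disagree on the default encoding.
    static String writeThenRead(String text, Charset writerCharset, Charset readerCharset) {
        byte[] bytes = text.getBytes(writerCharset);
        return new String(bytes, readerCharset);
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e, written as a Unicode escape so
        // that this source file does not depend on an encoding itself.
        String original = "caf\u00e9";

        String sameCharset =
            writeThenRead(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mixedCharsets =
            writeThenRead(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(original.equals(sameCharset));   // true
        System.out.println(original.equals(mixedCharsets)); // false
        // 5 instead of 4: the two UTF-8 bytes of U+00E9 were read as two characters.
        System.out.println(mixedCharsets.length());
    }
}
```

Pure ASCII text survives this round trip unchanged, which is why such
problems often go unnoticed until the first non-ASCII character shows up.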

Therefore, plugin developers should avoid any direct or indirect usage
of the classes FileWriter and FileReader, whose classic constructors use
the JVM's default encoding. Instead, OutputStreamWriter and
InputStreamReader should be used with an explicit encoding value.
The required encoding value can be obtained from a configuration parameter
like this:

  /**
   * @parameter default-value="ISO-8859-1"
   */
  private String encoding;

Providing a default value resembles the JVM's concept of a default encoding
with an important difference: This time, the default is specified by the
plugin and hence controlled by the POM. This way, all builds from the same
POM can be guaranteed to use the same encoding regardless of the JVM
executing Maven.
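
As a rough sketch of how such an encoding value could then be used, the
following helpers wrap the file streams in OutputStreamWriter and
InputStreamReader with an explicit encoding. The class and method names are
hypothetical, the local variable in main() merely stands in for the plugin
parameter, and the try-with-resources syntax assumes a reasonably recent
Java version.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

class ExplicitEncodingIo {

    // Writes the text using the given encoding instead of the JVM default.
    static void writeText(File file, String text, String encoding) throws IOException {
        try (OutputStream out = new FileOutputStream(file);
             Writer writer = new OutputStreamWriter(out, encoding)) {
            writer.write(text);
        }
    }

    // Reads the whole file using the given encoding instead of the JVM default.
    static String readText(File file, String encoding) throws IOException {
        StringBuilder content = new StringBuilder();
        try (InputStream in = new FileInputStream(file);
             Reader reader = new InputStreamReader(in, encoding)) {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                content.append(buffer, 0, count);
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: stands in for the value injected into the "encoding" parameter.
        String encoding = "ISO-8859-1";

        File file = File.createTempFile("encoding-demo", ".txt");
        file.deleteOnExit();

        writeText(file, "caf\u00e9", encoding);
        System.out.println(file.length());                                // 4
        System.out.println("caf\u00e9".equals(readText(file, encoding))); // true
    }
}
```

With these helpers the number of bytes written depends only on the encoding
passed in, not on the "file.encoding" property of the JVM running the build.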

Plugins that already provide means to specify the encoding should make sure
they have a default value for this parameter. This is to follow Maven's
philosophy of "convention over configuration" where users should get the
best practice out-of-the-box when a reasonable default can be assumed.

Handling XML files is a little different because these files are equipped
with an encoding declaration. Thanks to Herve Boutemy, plexus-utils provides
a convenient Reader-/WriterFactory for the magic of auto-detecting the
encoding from a byte stream (see also [0]). When writing XML files without
the XmlStreamWriter, make sure the encoding used for the output writer
matches the encoding specified by the XML declaration being written.
Otherwise, parsing the output later might fail.
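
One way to keep the writer and the XML declaration in sync is to feed both
from the same encoding value. The sketch below uses only JDK classes, not
the plexus-utils factories mentioned above, and its class and method names
are made up for illustration. It also assumes the text contains no markup
characters, since no escaping is done.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlEncodingDemo {

    // Uses one encoding value for both the writer and the XML declaration,
    // so the two cannot drift apart.
    // Assumption: the text contains no '<' or '&', as nothing is escaped here.
    static byte[] writeXml(String text, String encoding) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, encoding)) {
            writer.write("<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>\n");
            writer.write("<message>" + text + "</message>\n");
        }
        return out.toByteArray();
    }

    // Parses from the raw bytes so the parser picks the encoding from the
    // XML declaration rather than from a Reader created by the caller.
    static String parseMessage(byte[] xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml));
        return document.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String text = "caf\u00e9";
        System.out.println(text.equals(parseMessage(writeXml(text, "ISO-8859-1")))); // true
        System.out.println(text.equals(parseMessage(writeXml(text, "UTF-8"))));      // true
    }
}
```

Handing the parser a byte stream instead of a Reader matters here: with a
Reader the bytes have already been decoded by the caller, so the declaration
can no longer be used to pick the encoding.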

Regards,


Benjamin Bentmann


[0] http://docs.codehaus.org/display/MAVENUSER/XML+encoding

