hadoop-general mailing list archives

From Bruno Abitbol <bruno.abit...@jobomix.com>
Subject Encoding Hell
Date Fri, 18 Dec 2009 13:42:54 GMT
Hi,

I have spent two days trying to figure out an issue related to the
default charset:


   - When I run a trivial job that just prints the default charset on
   Hadoop in pseudo-distributed mode, I get US-ASCII, and the Java
   property file.encoding reports ANSI_X3.4-1968.


   - When I run the same job locally under Eclipse, I get UTF-8
   (which is what I expect).
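The job boils down to the following probe (the class name is illustrative; in the actual job the same two lines run inside the mapper):

```java
import java.nio.charset.Charset;

// Minimal charset probe: prints the JVM's effective default charset
// and the raw system property backing it. On the pseudo-distributed
// cluster these come back US-ASCII / ANSI_X3.4-1968; under Eclipse
// they come back UTF-8.
public class CharsetProbe {
    public static void main(String[] args) {
        System.out.println("defaultCharset = " + Charset.defaultCharset());
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
    }
}
```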

I am running Gentoo Linux; the locale environment variables are the
following:

LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=en_GB.UTF-8

I have tried setting the file.encoding property to UTF-8, but it has no
effect.
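Concretely, this is roughly how I tried to pass the flag (a sketch, assuming a Hadoop 0.20-style setup; note the daemons are usually started from init scripts or over SSH without the interactive shell's locale, so the task JVMs may not see these settings unless they go into the job configuration):

```shell
# Client-side JVM: picked up by the bin/hadoop launcher.
export HADOOP_OPTS="-Dfile.encoding=UTF-8"

# Task-side JVMs are spawned by the TaskTracker, which does not inherit
# HADOOP_OPTS, so the flag would have to go into the job configuration
# instead, e.g. in mapred-site.xml:
#   <property>
#     <name>mapred.child.java.opts</name>
#     <value>-Xmx200m -Dfile.encoding=UTF-8</value>
#   </property>

echo "$HADOOP_OPTS"
```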
Any help would be greatly appreciated.

Thank you.

-- 
Bruno Abitbol
bruno.abitbol@jobomix.com
http://www.jobomix.fr
