hadoop-common-user mailing list archives

From NOMURA Yoshihide <y.nom...@jp.fujitsu.com>
Subject Text file character encoding
Date Mon, 02 Jun 2008 06:19:52 GMT
I'm using Hadoop 0.17.0 to analyze a large number of CSV files.

I need to read files in character encodings other than UTF-8,
but as far as I can tell TextInputFormat does not support any other encoding.

I think the LineRecordReader class or the Text class should support an
encoding setting, for example:
 conf.set("io.file.defaultEncoding", "MS932");
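
Note that "io.file.defaultEncoding" above is a proposed key, not an existing Hadoop setting. Without such a setting, one possible workaround is to decode each line's raw bytes by hand. The sketch below uses only the JDK; LineDecoder, decode, and toUtf8 are hypothetical names, not Hadoop APIs. It assumes the record reader passes each line's bytes through undecoded, so that in a mapper the arguments would be value.getBytes() and value.getLength() from the Text value, and that the encoding never uses the newline byte 0x0A inside a multi-byte character (true of MS932, not of UTF-16).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Hadoop.
final class LineDecoder {

    // Decode the first `length` bytes of `raw` using the named charset.
    // `length` matters because a reused buffer may be longer than the line.
    static String decode(byte[] raw, int length, String charsetName) {
        // Charset.forName throws if the JVM does not provide the charset.
        Charset charset = Charset.forName(charsetName);
        return new String(raw, 0, length, charset);
    }

    // Convert a line from the named charset to UTF-8 bytes, which is
    // the encoding the rest of a text-processing job usually expects.
    static byte[] toUtf8(byte[] raw, int length, String charsetName) {
        return decode(raw, length, charsetName).getBytes(StandardCharsets.UTF_8);
    }
}
```

For MS932 input the call would be LineDecoder.decode(value.getBytes(), value.getLength(), "MS932"), provided the JVM ships that charset.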

Is there any plan to support different character encodings in

NOMURA Yoshihide:
    Software Innovation Laboratory, Fujitsu Labs. Ltd., Japan
    Tel: 044-754-2675 (Ext: 7112-6358)
    Fax: 044-754-2570 (Ext: 7112-3834)
    E-Mail: [y.nomura@jp.fujitsu.com]
