hadoop-common-user mailing list archives

From "Ted Dunning" <ted.dunn...@gmail.com>
Subject Re: Text file character encoding
Date Mon, 02 Jun 2008 15:09:26 GMT
You should file a Jira, make the change and submit a patch!

On Sun, Jun 1, 2008 at 11:19 PM, NOMURA Yoshihide <y.nomura@jp.fujitsu.com>
wrote:

> Hello,
> I'm using Hadoop 0.17.0 to analyze a large number of CSV files.
>
> I need to read files in character encodings other than UTF-8,
> but I think TextInputFormat doesn't support them.
>
> I guess the LineRecordReader class or the Text class should support an
> encoding setting like this:
>  conf.set("io.file.defaultEncoding", "MS932");
>
> Is there any plan to support different character encodings in
> TextInputFormat?
>
> Regards,
> --
> NOMURA Yoshihide:
>    Software Innovation Laboratory, Fujitsu Labs. Ltd., Japan
>    Tel: 044-754-2675 (Ext: 7112-6358)
>    Fax: 044-754-2570 (Ext: 7112-3834)
>    E-Mail: [y.nomura@jp.fujitsu.com]
>
>


-- 
ted
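
For anyone who needs this before a patch exists, here is a minimal sketch of the decoding step the quoted message asks for. It uses only the JDK, so it runs without Hadoop. The class name EncodingSketch and the helper decodeLine are hypothetical, and "io.file.defaultEncoding" is the property name proposed in the quoted message, not an existing Hadoop setting. The idea is to treat each line as raw bytes and decode it with an explicitly named charset instead of assuming UTF-8.

```java
import java.nio.charset.Charset;

public class EncodingSketch {
    // Property name proposed in the quoted message; assumed here,
    // not an existing Hadoop configuration key.
    static final String ENCODING_KEY = "io.file.defaultEncoding";

    /**
     * Decode the first `length` bytes of `buffer` with the named charset.
     * Only the first `length` bytes are read, so a reused buffer that is
     * longer than the current line is handled correctly.
     */
    static String decodeLine(byte[] buffer, int length, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        return new String(buffer, 0, length, charset);
    }

    public static void main(String[] args) {
        // 0xE9 is "e with acute accent" in ISO-8859-1, but on its own
        // it is not a valid UTF-8 sequence.
        byte[] raw = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println(decodeLine(raw, raw.length, "ISO-8859-1"));

        // MS932 ships with the JDK's extended charsets, which a trimmed
        // runtime may lack, so check before using it.
        if (Charset.isSupported("MS932")) {
            // 0x83 0x65 is the MS932 encoding of the katakana "te".
            byte[] sjis = {(byte) 0x83, (byte) 0x65};
            System.out.println(decodeLine(sjis, sjis.length, "MS932"));
        }
    }
}
```

In a map task the same call can be applied to the bytes held by the Text value, for example decodeLine(value.getBytes(), value.getLength(), "MS932"), because Text.toString() always decodes as UTF-8. This relies on the line splitter only looking for newline bytes, which is safe for MS932 but would not be for an encoding such as UTF-16.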
