hadoop-general mailing list archives

From Andrew Wang <andrew.wang.1...@gmail.com>
Subject Re: encoding types supported by Hadoop
Date Fri, 22 Jan 2010 05:36:43 GMT
Hi Naveen,

Hadoop's text handling assumes UTF-8 by default, so if you'd like to use
another encoding, you have to write a custom FileInputFormat and
FileOutputFormat.

For my own jobs, I decode the input bytes with the encoding I need, e.g.:

    String line = new String(value.getBytes(), 0, value.getLength(), "GBK");
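
A small sketch of why the offset and length arguments matter in that call. It is plain Java with no Hadoop dependency: the padded array is only a stand-in for the backing buffer that Text.getBytes() returns, which can be longer than getLength(). The class and method names are made up for illustration, and it assumes the JVM ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class GbkDecodeSketch {
    // Assumption: the GBK charset is available in this JVM.
    private static final Charset GBK = Charset.forName("GBK");

    // Decode only the first `length` bytes of `buffer` as GBK,
    // ignoring whatever trails them in the backing array.
    public static String decodeGbk(byte[] buffer, int length) {
        return new String(buffer, 0, length, GBK);
    }

    public static void main(String[] args) {
        // Two Chinese characters, encoded as GBK (two bytes each).
        byte[] record = "\u4e2d\u6587".getBytes(GBK);

        // Simulate a reused buffer that is larger than the record itself.
        byte[] backing = Arrays.copyOf(record, record.length + 8);

        String line = decodeGbk(backing, record.length);
        System.out.println(line.equals("\u4e2d\u6587"));
    }
}
```

Passing a Charset object instead of the name "GBK" also avoids the checked UnsupportedEncodingException that the String-name constructor declares.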


At the same time, I implemented a custom FileOutputFormat, named
GbkOutputFormat, to write the output in GBK.
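
The GbkOutputFormat code itself is not included in this message, so the following is only a guess at the core of such a writer, not the actual class. It is plain Java without Hadoop on the classpath; in a real job this logic would presumably live in the RecordWriter returned by a FileOutputFormat subclass. The key, tab, value, newline layout mirrors what the stock TextOutputFormat writes by default, and all names below are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;

public class GbkLineWriterSketch {
    // Assumption: the GBK charset is available in this JVM.
    private static final Charset GBK = Charset.forName("GBK");

    private final OutputStream out;

    public GbkLineWriterSketch(OutputStream out) {
        this.out = out;
    }

    // Encode one record as "key<TAB>value<NEWLINE>" in GBK.
    public static byte[] encodeRecord(Object key, Object value) {
        String line = key.toString() + "\t" + value.toString() + "\n";
        return line.getBytes(GBK);
    }

    // Write one encoded record to the underlying stream.
    public void write(Object key, Object value) throws IOException {
        out.write(encodeRecord(key, value));
    }

    public static void main(String[] args) throws IOException {
        GbkLineWriterSketch writer = new GbkLineWriterSketch(System.out);
        writer.write("\u4e2d\u6587", 1);
        System.out.flush();
    }
}
```

Keeping the encoding step in a static helper makes it easy to check the bytes without opening a file or a stream.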

On Fri, Jan 22, 2010 at 11:54 AM, Naveen Kumar Prasad <
naveenkumarp@huawei.com> wrote:

> Hi All,
>
> I am new to hadoop/Mapreduce usage.
>
> Can anyone tell me how to write a simple MapReduce implementation to just
> read some files from the <input> directory
> and write to <output> directory.
>
> Also I wanted to know which all encoding types are supported by Hadoop and
> how to configure and use various encoding types.
>
> Regards,
> Naveen Kumar
> HUAWEI TECHNOLOGIES CO.,LTD.
> Address: Huawei Industrial Base
> Bantian Longgang
> Shenzhen 518129, P.R.China
> www.huawei.com


-- 
http://anqiang1900.blog.163.com/
