hadoop-general mailing list archives

From Naveen Kumar Prasad <naveenkum...@huawei.com>
Subject RE: encoding types supported by Hadoop
Date Fri, 22 Jan 2010 05:01:27 GMT
Hi Todd,

To elaborate on the encoding question:

The input files we use with Hadoop may be in different character encodings,
for example UTF-8, UTF-16, or GBK.
So I want to know which encodings Hadoop supports.
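
For context on the question above: Hadoop's org.apache.hadoop.io.Text class
stores strings as UTF-8, so one possible approach (an assumption, not
something confirmed in this thread) is to convert input files in other
encodings to UTF-8 before loading them. The sketch below uses only the plain
JDK; the class name Transcode and the method transcode are hypothetical, and
GBK is only exercised if the running JVM ships that charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Transcode {
    // Decode bytes written in one charset and re-encode them in another.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        String sample = "\u4e2d\u6587"; // two Chinese characters

        // The same text produces different byte sequences per encoding.
        byte[] utf8 = sample.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = sample.getBytes(StandardCharsets.UTF_16BE);
        System.out.println("UTF-8 length:    " + utf8.length);
        System.out.println("UTF-16BE length: " + utf16.length);

        // GBK is an optional charset, so check before using it.
        if (Charset.isSupported("GBK")) {
            Charset gbk = Charset.forName("GBK");
            byte[] gbkBytes = sample.getBytes(gbk);
            byte[] converted = transcode(gbkBytes, gbk, StandardCharsets.UTF_8);
            String roundTrip = new String(converted, StandardCharsets.UTF_8);
            System.out.println("GBK -> UTF-8 preserved text: " + roundTrip.equals(sample));
        }
    }
}
```

Reading bytes with the wrong charset is what typically produces junk
characters, so the source encoding has to be known before converting.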

User scenario:
I want to read an input text file (say file01.txt) that contains Chinese
characters, write it to an output text file (say file02.txt), and verify that
the Chinese characters appear correctly in the output file (and not as junk
characters).

(I would appreciate it if you could tell me how to verify this.)
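
One way such a check might be done, sketched with the plain JDK and assuming
both files are UTF-8 and have been copied to the local filesystem: decode the
output strictly (so malformed bytes are reported instead of silently
replaced), confirm it still contains Han characters, and, if the job is a
straight copy, compare the bytes with the input. The class and method names
here are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

class EncodingCheck {
    // Strictly decode as UTF-8; returns null if the bytes are malformed.
    static String decodeUtf8Strict(byte[] data) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // True if at least one code point belongs to the Han script.
    static boolean containsHan(String text) {
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    // Valid UTF-8, no U+FFFD replacement characters, and Han text present.
    static boolean looksIntact(byte[] data) {
        String text = decodeUtf8Strict(data);
        return text != null && text.indexOf('\uFFFD') < 0 && containsHan(text);
    }

    public static void main(String[] args) throws Exception {
        // args[0] = local copy of the input, args[1] = local copy of the output
        byte[] in = Files.readAllBytes(Paths.get(args[0]));
        byte[] out = Files.readAllBytes(Paths.get(args[1]));
        System.out.println("output looks intact: " + looksIntact(out));
        // Only meaningful when the job is expected to copy the data unchanged.
        System.out.println("byte-identical to input: " + Arrays.equals(in, out));
    }
}
```

It could be run as "java EncodingCheck file01.txt file02.txt". Viewing the
output in a terminal is a weaker test, because the terminal's own encoding
settings can make correct data look like junk.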

Naveen Kumar

Address: Huawei Industrial Base
Bantian Longgang
Shenzhen 518129, P.R.China

-----Original Message-----
From: Todd Lipcon [mailto:todd@cloudera.com] 
Sent: Friday, January 22, 2010 10:16 AM
To: general@hadoop.apache.org; naveenkumarp@huawei.com
Subject: Re: encoding types supported by Hadoop

Hi Naveen,

On Thu, Jan 21, 2010 at 7:54 PM, Naveen Kumar Prasad <
naveenkumarp@huawei.com> wrote:

> Hi All,
> I am new to hadoop/Mapreduce usage.
> Can anyone tell me how to write a simple MapReduce implementation to 
> just read some files from the <input> directory and write to <output> 
> directory.

It sounds like what you want is the distcp job. Just run "hadoop distcp" and
it will print some usage information for you.

> Also I wanted to know which all encoding types are supported by Hadoop 
> and how to configure and use various encoding types.

I'm not sure what you mean here by encoding. Could you elaborate on this
question, please?

