Date: Fri, 22 Jan 2010 10:31:27 +0530
From: Naveen Kumar Prasad
Subject: RE: encoding types supported by Hadoop
To: general@hadoop.apache.org
Cc: todd@cloudera.com
Reply-to: naveenkumarp@huawei.com

Hi Todd,

To elaborate on the encoding question:

The input files we use with Hadoop may have different character encodings, for example encoding="UTF-8", UTF-16, GBK, etc. So I want to know which encodings are supported by Hadoop.

User scenario: I want to read an input text file (say file01.txt) that contains Chinese characters, write it to an output text file (say file02.txt), and verify that the Chinese characters appear correctly in the output file and not as junk characters. (I would appreciate it if you could tell me how to verify this.)

Regards,
Naveen Kumar

HUAWEI TECHNOLOGIES CO.,LTD.
Address: Huawei Industrial Base, Bantian, Longgang, Shenzhen 518129, P.R.China
www.huawei.com

----------------------------------------------------------------------------
This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above.
Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!

-----Original Message-----
From: Todd Lipcon [mailto:todd@cloudera.com]
Sent: Friday, January 22, 2010 10:16 AM
To: general@hadoop.apache.org; naveenkumarp@huawei.com
Subject: Re: encoding types supported by Hadoop

Hi Naveen,

On Thu, Jan 21, 2010 at 7:54 PM, Naveen Kumar Prasad <naveenkumarp@huawei.com> wrote:

> Hi All,
>
> I am new to hadoop/Mapreduce usage.
>
> Can anyone tell me how to write a simple MapReduce implementation to
> just read some files from the directory and write to
> directory.

It sounds like what you want is the distcp job. Just run "hadoop distcp" and it will print some usage information for you.

> Also I wanted to know which all encoding types are supported by Hadoop
> and how to configure and use various encoding types.

I'm not sure what you mean here by encoding. Could you elaborate on this question, please?

Thanks
-Todd
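
The verification step in the user scenario above (check that file02.txt carries the same Chinese characters as file01.txt) could be sketched roughly as follows. This is a minimal sketch that nobody in the thread proposed. It assumes both files have been copied to the local file system and that their encodings are known. The helper names read_text, transcode, and same_text are hypothetical. The transcode step is included because Hadoop's org.apache.hadoop.io.Text stores strings as UTF-8, so GBK or UTF-16 input would likely need converting first; whether that fits a given job is an assumption.

```python
import tempfile
from pathlib import Path


def read_text(path, encoding):
    """Decode a file strictly: bytes invalid for `encoding` raise UnicodeDecodeError."""
    return Path(path).read_bytes().decode(encoding)


def transcode(src_path, src_encoding, dst_path, dst_encoding="utf-8"):
    """Rewrite src_path in dst_encoding, e.g. GBK to UTF-8 before loading into HDFS."""
    Path(dst_path).write_bytes(read_text(src_path, src_encoding).encode(dst_encoding))


def same_text(path_a, encoding_a, path_b, encoding_b):
    """Return True when both files decode to exactly the same characters."""
    return read_text(path_a, encoding_a) == read_text(path_b, encoding_b)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "file01.txt"  # input file, GBK-encoded in this demo
        dst = Path(tmp) / "file02.txt"  # stands in for the job's output file
        src.write_bytes("中文测试\n".encode("gbk"))
        transcode(src, "gbk", dst)
        print(same_text(src, "gbk", dst, "utf-8"))
```

Comparing decoded text rather than raw bytes lets the input and output use different encodings and still count as a match. Decoding strictly means a file read with the wrong encoding usually fails with an error instead of silently producing junk characters.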