From: AnilKumar B
To: user@hadoop.apache.org
Date: Sat, 4 May 2013 23:58:15 +0530
Subject: Unicode issues with Distributed Cache

Hi,

We are adding an ISO-8859-1 encoded file to the Distributed Cache for lookup purposes in an MR job.

But when we try to read the contents of the Distributed Cache file in the MR job, we are facing Unicode issues.

Please find the sample code snippet below:

    @Override
    protected void setup(Context context) throws java.io.IOException,
            InterruptedException {
        Path[] cacheFiles = DistributedCache.getLocalCacheFiles(context
                .getConfiguration());
        lookUp = cacheFiles[0];
        File file = new File(lookUp.toString());
        reader = new BufferedReader(new InputStreamReader(new FileInputStream(
                file), Charset.forName("ISO-8859-1")));
        String line;
        while ((line = reader.readLine()) != null) {
            :
            System.out.println(line);
            :
        }
        reader.close();
    }

But when we try to read the same file manually, as below, on the same cluster machine, it works fine.

    BufferedReader input = new BufferedReader(
            new InputStreamReader(new FileInputStream(path.toString()),
                    Charset.forName("ISO-8859-1")));

May I know, is this a Distributed Cache issue?
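To narrow down where the characters get damaged, a small standalone check along the following lines may help. It is only a sketch under stated assumptions: the class name `Latin1LookupCheck`, its helper methods, and the sample bytes are hypothetical and not part of the job above. It separates two steps that `System.out.println(line)` mixes together: decoding the file bytes as ISO-8859-1, and re-encoding the string when it is printed. One possibility (an assumption, not something confirmed for this cluster) is that the task JVM uses a different default charset for its output than the JVM used in the manual test, in which case the decoded string is correct and only the printed bytes differ.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Latin1LookupCheck {

    // Reads the first line of the stream, decoding the bytes as ISO-8859-1.
    static String readFirstLine(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            return reader.readLine();
        }
    }

    // Renders each char as U+XXXX so the result can be inspected
    // without depending on the console or log encoding.
    static String toCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // Returns the bytes a PrintStream with the given charset would emit.
    static byte[] printedBytes(String s, Charset cs) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(buf, true, cs.name())) {
            out.print(s);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Sample ISO-8859-1 bytes: "caf" followed by 0xE9 (e with acute accent).
        byte[] fileBytes = {'c', 'a', 'f', (byte) 0xE9, '\n'};
        String line = readFirstLine(new ByteArrayInputStream(fileBytes));

        // Step 1: is the decoded string what we expect?
        System.out.println(toCodePoints(line));

        // Step 2: what does the output side do with it?
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("bytes as UTF-8: "
                + printedBytes(line, StandardCharsets.UTF_8).length);
        System.out.println("bytes as US-ASCII: "
                + printedBytes(line, StandardCharsets.US_ASCII).length);
    }
}
```

If the code points logged inside `setup` match those from the manual run, the file in the Distributed Cache is being decoded correctly, and the difference would then lie in how the output is encoded or viewed rather than in the cache itself.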