hadoop-mapreduce-user mailing list archives

From AnilKumar B <akumarb2...@gmail.com>
Subject Unicode issues with Distributed Cache
Date Sat, 04 May 2013 18:28:15 GMT
Hi,

We are adding an ISO-8859-1 encoded file to the Distributed Cache for lookup
purposes in an MR job.

But when we try to read the contents of the Distributed Cache file in the MR
job, we are facing Unicode issues.

Please find the sample code snippet below:
@Override
protected void setup(Context context) throws java.io.IOException,
        InterruptedException {
    // Local paths of the files shipped through the Distributed Cache
    Path[] cacheFiles = DistributedCache.getLocalCacheFiles(
            context.getConfiguration());
    lookUp = cacheFiles[0];
    File file = new File(lookUp.toString());
    // Decode the lookup file explicitly as ISO-8859-1
    reader = new BufferedReader(new InputStreamReader(
            new FileInputStream(file), Charset.forName("ISO-8859-1")));
    String line;
    while ((line = reader.readLine()) != null) {
        // ...
        System.out.println(line);
        // ...
    }
    reader.close();
}
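
To check whether the decoding itself is correct, independent of how the output
is displayed, a minimal sketch along the following lines could be used. It
assumes the garbling is seen in the printed output. The class name
LookupDecodeCheck and the helpers readLatin1Lines and toCodePoints are
hypothetical and not part of our job; the file path is taken from the first
command-line argument. Each character is printed as an ASCII code point
(U+XXXX), so the result does not depend on the encoding of the console or
log file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class LookupDecodeCheck {

    // Hypothetical helper: reads every line of the file, decoding as ISO-8859-1.
    static List<String> readLatin1Lines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                Files.newInputStream(file), StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Hypothetical helper: renders each code point as U+XXXX (ASCII only),
    // so the output is unaffected by the output stream's encoding.
    static String toCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        for (String line : readLatin1Lines(Paths.get(args[0]))) {
            System.out.println(toCodePoints(line));
        }
    }
}
```

If the code points printed inside the task match those printed by the manual
run, the file is being decoded the same way in both places, and the difference
would more likely lie in how the output is written or viewed.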

But when we read the same file manually on the same cluster machine, as shown
below, it works fine.

BufferedReader input = new BufferedReader(
        new InputStreamReader(new FileInputStream(path.toString()),
                Charset.forName("ISO-8859-1")));
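
Since both reads name ISO-8859-1 explicitly, one difference worth ruling out is
the JVM default charset, which System.out has traditionally used when printing
and which may differ between a task JVM and a JVM started from a login shell.
The sketch below simply reports it. The class name DefaultCharsetReport and the
method describe are hypothetical, and calling describe() from setup() as well
as from the manual test is only a suggestion, not something we have verified.

```java
import java.nio.charset.Charset;

public class DefaultCharsetReport {

    // Hypothetical helper: summarizes the charset settings of the current JVM.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The output is plain ASCII, so it reads the same in any console or log.
        System.out.println(describe());
    }
}
```

If the two environments report different values, the same correctly decoded
string could still be displayed differently in each.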

May I know whether this is a Distributed Cache issue?
