hadoop-user mailing list archives

From java8964 <java8...@hotmail.com>
Subject RE: Localization feature
Date Fri, 24 Jan 2014 13:49:09 GMT
You need to be clearer about how you process the files.
I think the important question is what kind of InputFormat and OutputFormat you are using
in your case.
If you are using the defaults on Linux, I believe TextInputFormat and TextOutputFormat
will both convert byte arrays to text using UTF-8 encoding. So if your source data is UTF-8,
then your output should be fine.
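This is why a mismatched source encoding shows up as '�': the bytes of a Latin-1 file are not valid UTF-8, so the decoder substitutes the replacement character. A minimal plain-Java sketch (not Hadoop-specific; Text in Hadoop behaves the same way because it always decodes as UTF-8):

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    public static void main(String[] args) {
        // 'Ç' is a single byte, 0xC7, in ISO-8859-1 (Latin-1).
        byte[] latin1 = "Ç".getBytes(StandardCharsets.ISO_8859_1);

        // Decoding those bytes as UTF-8 fails: 0xC7 opens a multi-byte
        // sequence with no valid continuation byte, so Java substitutes
        // U+FFFD, the replacement character shown as '�'.
        String decodedAsUtf8 = new String(latin1, StandardCharsets.UTF_8);
        System.out.println(decodedAsUtf8);

        // Decoding with the charset the data was actually written in works.
        String decodedAsLatin1 = new String(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(decodedAsLatin1);
    }
}
```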
To help you in this case, you need to figure out the following:
1) What kind of InputFormat/OutputFormat are you using?
2) How do you write the data output? Using the Reducer's Context.write, or do you write to HDFS directly in your code?
3) What encoding is your source data?
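If the answer to 3) turns out to be a non-UTF-8 encoding such as Latin-1, one workaround is to re-interpret the raw bytes with the correct charset before emitting them, so that what you eventually write out is valid UTF-8. A plain-Java sketch of that re-encoding step (the same idea would apply inside a mapper before calling Context.write; the `fromLatin1` helper is hypothetical, just for illustration):

```java
import java.nio.charset.StandardCharsets;

public class ReencodeDemo {
    // Hypothetical helper: re-interpret raw bytes with the charset the
    // file was actually written in, yielding a correct Java String.
    static String fromLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xC7};      // 'Ç' as written by a Latin-1 source
        String line = fromLatin1(raw);   // decodes correctly to "Ç"

        // Re-encode as UTF-8 for output: 'Ç' becomes the two bytes 0xC3 0x87.
        byte[] utf8 = line.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 2
    }
}
```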

Subject: Localization feature
Date: Fri, 24 Jan 2014 09:54:15 +0530
From: khaleelk@suntecgroup.com
To: user@hadoop.apache.org

Hi All,
Does Hadoop/MapReduce have a localization feature?
We have a scenario in which we must process files containing Dutch and German characters.

When we process files containing a character like 'Ç', the character gets replaced by '�'
in the output.
Is there any possible workaround for this?

Thanks in advance,
