hadoop-user mailing list archives

From "Joshi, Rekha" <Rekha_Jo...@intuit.com>
Subject Re: Non utf-8 chars in input
Date Tue, 11 Sep 2012 07:07:53 GMT
Hi Ajay,

Try SequenceFileAsBinaryInputFormat?


Thanks
Rekha

On 11/09/12 11:24 AM, "Ajay Srivastava" <Ajay.Srivastava@guavus.com> wrote:

>Hi,
>
>I am using the default InputFormat class to read input from text files,
>but the input file has some non-UTF-8 characters.
>I guess that TextInputFormat is the default InputFormat class and that it
>replaces these non-UTF-8 characters with "\uFFFD". If I do not want this
>behavior and need the actual characters in my mapper, what would be the
>correct InputFormat class?
>
>
>
>Regards,
>Ajay Srivastava
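
To illustrate the "\uFFFD" behavior described in the question, here is a minimal plain-Java sketch with no Hadoop dependency. It assumes, purely for illustration, that the input is really ISO-8859-1; the class name CharsetDecodeSketch and the helper decode are hypothetical. As far as I know, the Text value handed to a mapper still holds the raw line bytes, and the replacement only happens when those bytes are decoded as UTF-8 (for example by Text.toString()), so decoding the first getLength() bytes of getBytes() with the file's real charset may be enough without changing the InputFormat.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDecodeSketch {
    // Decode only the first `length` bytes of the buffer. In a mapper the
    // buffer would come from Text.getBytes() and the length from
    // Text.getLength(), because the backing array can be longer than the line.
    static String decode(byte[] buf, int length, Charset charset) {
        return new String(buf, 0, length, charset);
    }

    public static void main(String[] args) {
        // "café" encoded as ISO-8859-1 (assumed encoding for this sketch).
        // The byte 0xE9 on its own is not a valid UTF-8 sequence.
        byte[] raw = {'c', 'a', 'f', (byte) 0xE9};

        String asUtf8 = decode(raw, raw.length, StandardCharsets.UTF_8);
        String asLatin1 = decode(raw, raw.length, StandardCharsets.ISO_8859_1);

        // Decoding as UTF-8 substitutes the replacement character.
        System.out.println(asUtf8.indexOf('\uFFFD') >= 0);
        // Decoding with the actual charset keeps the original character.
        System.out.println(asLatin1.equals("caf\u00E9"));
    }
}
```

If the real encoding of the file is unknown or mixed, the mapper could instead work on the raw bytes directly and skip decoding altogether.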

