hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Outputting extended ascii characters in Hadoop?
Date Fri, 09 Oct 2009 23:49:43 GMT

Hi Mark,

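If you're using TextOutputFormat, it assumes you're dealing in UTF-8. Decimal
254 (hex FE) is never valid as a standalone byte in UTF-8; the character
U+00FE is encoded as the two bytes C3 BE, which is most likely why you see
hex C3 in the output.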

If you're dealing with binary (i.e., non-textual) data, you shouldn't use
TextOutputFormat.
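
To see where the C3 comes from, here is a minimal sketch using only the JDK (no Hadoop classes). The class name Utf8Demo and the helper toHex are made up for illustration. It encodes the character with code 254 both ways and also decodes a lone FE byte as UTF-8:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Hadoop.
class Utf8Demo {
    // Format bytes as space-separated upper-case hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = String.valueOf((char) 254); // the character U+00FE

        // UTF-8 needs two bytes for any code point in U+0080..U+07FF.
        System.out.println(toHex(s.getBytes(StandardCharsets.UTF_8)));      // C3 BE

        // A single-byte charset such as ISO-8859-1 keeps it as one byte.
        System.out.println(toHex(s.getBytes(StandardCharsets.ISO_8859_1))); // FE

        // A lone FE byte is malformed UTF-8; the JDK decoder substitutes U+FFFD.
        String decoded = new String(new byte[] {(byte) 0xFE}, StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(decoded.charAt(0)));         // fffd
    }
}
```

So if the reducer output must contain the raw byte FE, the data is not UTF-8 text and needs a binary-safe output path instead.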

-Todd

On Fri, Oct 9, 2009 at 3:09 PM, Mark Kerzner <markkerzner@gmail.com> wrote:

> Hi,
> the strings I am writing in my reducer have characters that may present a
> problem, such as char represented by decimal 254, which is hex FE. It seems
> that instead I see hex C3, or something else is messed up. Or my
> understanding is messed up :)
>
> Any advice?
>
> Thank you,
> Mark
>
