hadoop-common-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: Outputting extended ascii characters in Hadoop?
Date Tue, 13 Oct 2009 03:22:54 GMT
^A is ASCII 1. You can use ASCII 2 (^B) for the comma.
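
A minimal sketch of that idea in plain Java, assuming the record is assembled as a String before it is handed to the output format. The class name DelimitedRecord and the method join are made up for illustration and are not part of Hadoop. It wraps each field in ^A (ASCII 1) where the quote would go and puts ^B (ASCII 2) between fields where the comma would go. Both values are below 128, so UTF-8 encodes each one as a single unchanged byte.

```java
import java.nio.charset.StandardCharsets;

public class DelimitedRecord {
    // Assumed mapping, following the thread: ^A (ASCII 1) replaces the quote,
    // ^B (ASCII 2) replaces the comma.
    static final char QUOTE = '\u0001';
    static final char SEPARATOR = '\u0002';

    // Wrap every field in QUOTE and put SEPARATOR between fields.
    static String join(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(SEPARATOR);
            }
            sb.append(QUOTE).append(fields[i]).append(QUOTE);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The second field contains a literal comma, which no longer needs escaping.
        String record = join("a", "b,c");
        byte[] utf8 = record.getBytes(StandardCharsets.UTF_8);
        // All chars are below 128, so the UTF-8 byte count equals the char count.
        System.out.println(record.length() + " chars, " + utf8.length + " bytes");
    }
}
```

This only works if the field values themselves never contain ASCII 1 or 2, which is an assumption about the data.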

On 10/12/09, Mark Kerzner <markkerzner@gmail.com> wrote:
> Thanks again, Todd. I need two delimiters, one for comma and one for quote.
> But I guess I can use ^A for quote, and keep the comma as is, and I will be
> good.
> Sincerely,
> Mark
>
> On Mon, Oct 12, 2009 at 10:15 PM, Todd Lipcon <todd@cloudera.com> wrote:
>
>> Hey Mark,
>>
>> The most commonly used delimiter for cases like this is ^A (character 1)
>>
>> -Todd
>>
>> On Mon, Oct 12, 2009 at 7:56 PM, Mark Kerzner <markkerzner@gmail.com>
>> wrote:
>>
>> > Thanks, that is a great answer.
>> > My problem is that the application that reads my output accepts a
>> > comma-separated file with extended ASCII delimiters. Following your
>> > answer, however, I will try to use low-value ASCII, like 9 or 11, unless
>> > someone has a better suggestion.
>> >
>> > Thank you,
>> > Mark
>> >
>> > On Fri, Oct 9, 2009 at 6:49 PM, Todd Lipcon <todd@cloudera.com> wrote:
>> >
>> > > Hi Mark,
>> > >
>> > > If you're using TextOutputFormat, it assumes you're dealing in UTF-8.
>> > > Decimal 254 wouldn't be valid as a standalone byte in UTF-8 encoding.
>> > >
>> > > If you're dealing with binary (i.e. non-textual) data, you shouldn't use
>> > > TextOutputFormat.
>> > >
>> > > -Todd
>> > >
>> > > On Fri, Oct 9, 2009 at 3:09 PM, Mark Kerzner <markkerzner@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi,
>> > > > the strings I am writing in my reducer have characters that may
>> > > > present a problem, such as char represented by decimal 254, which is
>> > > > hex FE. It seems that instead I see hex C3, or something else is
>> > > > messed up. Or my understanding is messed up :)
>> > > >
>> > > > Any advice?
>> > > >
>> > > > Thank you,
>> > > > Mark
>> > > >
>> > >
>> >
>>
>
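
On the hex C3 in the original question quoted above: a minimal sketch in plain Java, assuming the reducer builds a Java String containing char 254 and the output is encoded as UTF-8, as Todd describes. The class name Char254Bytes and the helper hex are made up for illustration. Char 254 is U+00FE, which UTF-8 encodes as the two bytes C3 BE. The single byte FE only comes out under a one-byte encoding such as ISO-8859-1, and FE never appears in well-formed UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class Char254Bytes {
    // Render a byte array as uppercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = String.valueOf((char) 254); // U+00FE
        // UTF-8 needs two bytes for any code point from 128 to 2047.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // C3BE
        // ISO-8859-1 maps code points 0..255 straight to one byte each.
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // FE
    }
}
```

So the C3 is the first byte of a correct UTF-8 encoding, not corruption. If the consuming application needs the raw byte FE, the data has to be written as bytes rather than as UTF-8 text.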


-- 


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz