hadoop-common-user mailing list archives

From Amr Awadallah <...@cloudera.com>
Subject Re: Outputting extended ascii characters in Hadoop?
Date Tue, 13 Oct 2009 03:21:43 GMT
^A for quote, ^B for comma .. and so on.

-- amr

Mark Kerzner wrote:
> Thanks again, Todd. I need two delimiters, one for comma and one for quote.
> But I guess I can use ^A for quote, and keep the comma as is, and I will be
> good.
> Sincerely,
> Mark
>
> On Mon, Oct 12, 2009 at 10:15 PM, Todd Lipcon <todd@cloudera.com> wrote:
>
>   
>> Hey Mark,
>>
>> The most commonly used delimiter for cases like this is ^A (character 1)
>>
>> -Todd
>>
>> On Mon, Oct 12, 2009 at 7:56 PM, Mark Kerzner <markkerzner@gmail.com>
>> wrote:
>>
>>     
>>> Thanks, that is a great answer.
>>> My problem is that the application that reads my output accepts a
>>> comma-separated file with extended ASCII delimiters. Following your
>>> answer, however, I will try to use low-value ASCII, like 9 or 11,
>>> unless someone has a better suggestion.
>>>
>>> Thank you,
>>> Mark
>>>
>>> On Fri, Oct 9, 2009 at 6:49 PM, Todd Lipcon <todd@cloudera.com> wrote:
>>>
>>>       
>>>> Hi Mark,
>>>>
>>>> If you're using TextOutputFormat, it assumes you're dealing in UTF8.
>>>> Decimal 254 wouldn't be valid as a standalone character in UTF8
>>>> encoding.
>>>>
>>>> If you're dealing with binary (ie non-textual) data, you shouldn't use
>>>> TextOutputFormat.
>>>>
>>>> -Todd
>>>>
>>>> On Fri, Oct 9, 2009 at 3:09 PM, Mark Kerzner <markkerzner@gmail.com>
>>>> wrote:
>>>>
>>>>         
>>>>> Hi,
>>>>> the strings I am writing in my reducer have characters that may
>>>>> present a problem, such as char represented by decimal 254, which is
>>>>> hex FE. It seems that instead I see hex C3, or something else is
>>>>> messed up. Or my understanding is messed up :)
>>>>>
>>>>> Any advice?
>>>>>
>>>>> Thank you,
>>>>> Mark
>>>>>
>>>>>           
>
>   
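
Note on the thread above: a minimal sketch of the two points discussed, in
plain Python with no Hadoop dependency. It shows that code point 254 encodes
to two bytes in UTF-8 (the first being hex C3, consistent with what Mark
reported seeing), and that ^A and ^B survive UTF-8 encoding as single bytes.
The names utf8_hex, format_record and parse_record are hypothetical helpers
made up for this illustration, and the sketch assumes field values never
contain ^A or ^B themselves.

```python
# Illustrative sketch only; helper names are hypothetical, not Hadoop APIs.

QUOTE = "\x01"  # ^A, standing in for the quote character
COMMA = "\x02"  # ^B, standing in for the comma


def utf8_hex(text):
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


def format_record(fields):
    """Wrap each field in ^A and join the fields with ^B."""
    return COMMA.join(QUOTE + field + QUOTE for field in fields)


def parse_record(line):
    """Inverse of format_record, assuming fields contain no ^A or ^B."""
    return [field.strip(QUOTE) for field in line.split(COMMA)]


# Code point 254 (U+00FE) is written as two bytes in UTF-8, not as a lone FE.
print(utf8_hex(chr(254)))

# Characters below 128, including ^A and ^B, encode as a single byte each.
print(utf8_hex(QUOTE), utf8_hex(COMMA))

# A lone 0xFE byte is rejected by a UTF-8 decoder.
try:
    b"\xfe".decode("utf-8")
except UnicodeDecodeError:
    print("0xFE alone is not valid UTF-8")
```

Because the delimiters are ordinary one-byte characters in UTF-8, ordinary
commas and double quotes inside field values need no escaping under this
scheme.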
