hadoop-common-user mailing list archives

From "Omer Trajman" <o...@vertica.com>
Subject RE: Text coding
Date Wed, 30 Dec 2009 16:20:58 GMT
You can use ByteBuffer or StringBuffer to go from byte[] to a stream of bytes, or StringUtils
to convert it to a hex string.
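
A minimal sketch of the hex-string route, using only the JDK so that it runs without Hadoop on the classpath. The class name HexDump and the helper toHex are hypothetical names chosen for illustration. The StringUtils mentioned above is presumably org.apache.hadoop.util.StringUtils, whose byteToHexString helper should produce the same kind of output, but that is an assumption about which class was meant.

```java
import java.nio.ByteBuffer;

// Hypothetical helper class, not part of Hadoop.
final class HexDump {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Render every byte as two lowercase hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0x0f]).append(HEX[b & 0x0f]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bitArray = {0x00, 0x1f, (byte) 0xa5, (byte) 0xff};
        System.out.println(toHex(bitArray)); // 001fa5ff

        // ByteBuffer.wrap gives a readable view over the same array without copying it.
        ByteBuffer buf = ByteBuffer.wrap(bitArray);
        System.out.println(buf.remaining()); // 4
    }
}
```

Note that a hex string takes two characters per byte, so a 2 MB array becomes roughly 4 MB of text.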

-----Original Message-----
From: Aram Mkhitaryan [mailto:aram.mkhitaryan@googlemail.com] 
Sent: Monday, December 28, 2009 12:21 PM
To: common-user@hadoop.apache.org
Subject: Re: Text coding

The point is that in Java arrays are objects, byte[] included.
So when you call toString() on a byte[], you get '[', then the letter
'B' that denotes the element type, then '@', and then the hash code in hex.
This is the standard Object.toString() implementation, which arrays inherit.
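
The behaviour described above can be reproduced in a small standalone program. This is only a sketch: the class name ArrayToStringDemo and the method describe are hypothetical, and the hash part of the output differs from run to run.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing what toString() returns for a byte[].
final class ArrayToStringDemo {
    // Arrays inherit Object.toString(): class name "[B", '@', identity hash in hex.
    static String describe(byte[] bytes) {
        return bytes.toString();
    }

    public static void main(String[] args) {
        // 2^24 bits = 2^21 bytes = 2 MB, the filter size from the original question.
        byte[] bitArray = new byte[1 << 21];

        String s = describe(bitArray);
        System.out.println(s); // something like [B@1c695a6; the hash varies

        // The string is pure ASCII, so its encoded size equals its character count,
        // which is why the output file is about 10 bytes rather than 2 MB.
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(bitArray.length); // 2097152
    }
}
```

The string only identifies the array object; none of the array's contents are in it.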

I would recommend implementing your own Writable to handle
your byte array (if one doesn't exist yet).
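
A rough sketch of what such a class could look like. BitArrayWritable is a hypothetical name, and the class deliberately does not declare "implements org.apache.hadoop.io.Writable" so that it compiles without Hadoop; its write and readFields methods follow the shape of that interface, which works on java.io.DataOutput and DataInput. Hadoop's BytesWritable, recommended in the quoted reply below, already covers this case, so a custom class is probably only worth it if you need extra behaviour.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical class; in a real job it would implement org.apache.hadoop.io.Writable.
final class BitArrayWritable {
    private byte[] bits = new byte[0];

    BitArrayWritable() {}

    BitArrayWritable(byte[] bits) { this.bits = bits; }

    byte[] get() { return bits; }

    // Serialize as a 4-byte length prefix followed by the raw bytes.
    public void write(DataOutput out) throws IOException {
        out.writeInt(bits.length);
        out.write(bits);
    }

    // Read the length prefix, then exactly that many bytes.
    public void readFields(DataInput in) throws IOException {
        bits = new byte[in.readInt()];
        in.readFully(bits);
    }

    // Write to an in-memory buffer and read back, to check that nothing is lost.
    static byte[] roundTrip(byte[] input) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            new BitArrayWritable(input).write(new DataOutputStream(sink));
            BitArrayWritable copy = new BitArrayWritable();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(sink.toByteArray())));
            return copy.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bitArray = new byte[1 << 21]; // 2 MB bit array
        bitArray[5] = (byte) 0x80;           // set one bit so the data is not all zeros
        System.out.println(Arrays.equals(bitArray, roundTrip(bitArray))); // true
    }
}
```

Unlike Text, this writes the bytes as-is, so arbitrary binary data survives without any UTF-8 interpretation.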

2009/12/28 Todd Lipcon <todd@cloudera.com>:
> Furthermore, Text is meant for use when you have a UTF-8-encoded string.
> Creating a Text object from a byte array that is not proper UTF-8 is likely
> to result in some kind of exception or data mangling. You should use
> BytesWritable for this purpose.
> -Todd
> 2009/12/28 Edward Capriolo <edlinuxguru@gmail.com>
>> Calling bitArray.toString() does not return your data. You can test
>> this in a standalone program.
>> You need to write the array out bitwise or bytewise. toString() does
>> not do what you want.
>> Edward
>> 2009/12/28 Gang Luo <lgpublic@yahoo.com.cn>:
>> > Hi all,
>> > I don't know much about text encoding, and there is one thing confusing
>> > me. I am implementing a Bloom filter in MapReduce. The output is a bit
>> > array (implemented as byte[]) of length 2^24 bits (that is, 2^21
>> > bytes), so the array should be 2 MB. But when I output it like this:
>> > output.collect(new Text(bitArray.toString()), null); the output file is
>> > only 10 bytes. Its content is something like this: [B@1c695a6. What does
>> > Text do when I create a new Text object from the bitArray (which is
>> > byte[])?
>> >
>> > The amazing thing is, when I use Text.getBytes() to convert it back to
>> > byte[], it is exactly the same as before! How does it get the 2 MB of
>> > information from the 10-byte Text object (whose value is [B@1c695a6)?
>> >
>> > Thanks.
>> >
>> >  -Gang