hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Text coding
Date Mon, 28 Dec 2009 16:48:30 GMT
Furthermore, Text is meant for use when you have a UTF-8-encoded string.
Creating a Text object from a byte array that is not valid UTF-8 is likely
to result in some kind of exception or data mangling. You should use
BytesWritable for this purpose.

-Todd
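
A minimal sketch of the mangling described above, using plain java.lang.String
rather than Hadoop's Text (so it shows the general hazard of treating
arbitrary bytes as UTF-8, not Text's exact behavior). The class name
Utf8RoundTrip and the sample bytes are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {
    // Decode the bytes as UTF-8 text, then encode the text back to bytes.
    // Malformed sequences are replaced during decoding, so the result can
    // differ from the input.
    public static byte[] roundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF never appears in valid UTF-8, and a lone 0x80 is a
        // continuation byte with nothing to continue.
        byte[] bloomBits = {(byte) 0xFF, (byte) 0x80, 0x41};
        byte[] after = roundTrip(bloomBits);

        System.out.println("before: " + Arrays.toString(bloomBits));
        System.out.println("after:  " + Arrays.toString(after));
        System.out.println("same:   " + Arrays.equals(bloomBits, after));
    }
}
```

Pure ASCII input survives the round trip unchanged, which is why this kind
of bug can go unnoticed until binary data such as a Bloom filter's bit array
passes through it.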

2009/12/28 Edward Capriolo <edlinuxguru@gmail.com>

> Calling bitArray.toString() does not return your data. You can test
> this in a standalone program.
> You need to write the array out bitwise or bytewise. toString() does
> not do what you want.
>
> Edward
>
> 2009/12/28 Gang Luo <lgpublic@yahoo.com.cn>:
> > Hi all,
> > I don't know much about text encoding, and there is one thing confusing
> > me. I am implementing a Bloom filter in MapReduce. The output is a bit
> > array (implemented as byte[]) of length 2^24 bits (that is, 2^21 bytes),
> > so the array should be 2 MB. But when I output it like this:
> > output.collect(new Text(bitArray.toString()), null); the output file is
> > only 10 bytes. The content of the output file is something like
> > [B@1c695a6. What does Text do when I create a new Text object from the
> > bitArray (which is a byte[])?
> >
> > The amazing thing is, when I use Text.getBytes() to convert it back to
> > byte[], it is exactly the same as before! How does it recover the 2 MB
> > of information from the 10-byte Text object (whose value is
> > [B@1c695a6)?
> >
> > Thanks.
> >
> >  -Gang
>
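
The standalone test Edward suggests might look like the sketch below. It
shows that calling toString() on a byte[] yields only a type tag and an
identity hash (the "[B@..." string seen in the output file), while writing
the bytes to a stream preserves all of them. The class and method names are
hypothetical, and a ByteArrayOutputStream stands in for the real job output.

```java
import java.io.ByteArrayOutputStream;

public class ArrayToStringDemo {
    // What byte[].toString() actually returns: "[B@" plus a hex hash code.
    // None of the array's contents are included.
    public static String identityString(byte[] bits) {
        return bits.toString();
    }

    // Write the array out bytewise; every byte reaches the output.
    public static byte[] writeRaw(byte[] bits) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(bits, 0, bits.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bitArray = new byte[1 << 21]; // 2^21 bytes = 2^24 bits

        String s = identityString(bitArray);
        System.out.println(s);                   // starts with "[B@"
        System.out.println(s.length());          // a handful of characters
        System.out.println(writeRaw(bitArray).length); // 2097152
    }
}
```

In a MapReduce job the same idea applies: hand the byte[] itself to a
binary-safe Writable such as BytesWritable instead of converting it to a
String first.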
