accumulo-dev mailing list archives

From Adam Fuchs <afu...@apache.org>
Subject Re: Setting Charset in getBytes() call.
Date Tue, 30 Oct 2012 20:26:41 GMT
On Mon, Oct 29, 2012 at 9:22 PM, Drew Farris <drew@apache.org> wrote:

> I have always wondered if there were cases in the API where users are
> forced to use Text when they would otherwise prefer byte[], e.g., stuffing a
> non-UTF-8 byte[] into a Text object to facilitate storage or sorting. Not
> entirely sure whether Text would complain if this were the case. I suspect
> we should seek to eliminate these if they currently exist.
>

The Text class is essentially a wrapper around a byte[], with some
convenience methods for translating to/from other types. Accumulo only ever
reads bytes out of it, so there is no encoding problem there. We also don't
use most of its convenience methods. Many people see that it is named
"Text" and assume that it only stores human readable text, but that is not
the case. It probably should have been named
"ConvenientByteArrayWrapperWithSomeMemoryEfficiencySupportAndStringOrientedTranslationMethodsThatIsWritableComparable".

I also agree that it would be good to get rid of the reliance on Hadoop's
Text object, especially because people often ignore getLength() when using
the byte[] returned by getBytes(). That array is the backing buffer and can
be longer than the valid data.
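
To illustrate the getLength()/getBytes() pitfall, here is a minimal sketch.
It does not use Hadoop's Text itself: GrowableBytes is a hypothetical
stand-in, written only for this example, that mimics the relevant behavior
(a reusable backing buffer plus a separate count of valid bytes). The class
and method names other than getBytes() and getLength() are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TextBytesSketch {

    /** Hypothetical stand-in for a Text-like object (not Hadoop's class). */
    static final class GrowableBytes {
        private byte[] buf = new byte[0];
        private int length = 0;

        /** Copies src into the buffer, growing it only when it is too small. */
        void set(byte[] src) {
            if (src.length > buf.length) {
                buf = new byte[src.length];
            }
            System.arraycopy(src, 0, buf, 0, src.length);
            length = src.length;
        }

        /** Returns the backing array, which may be longer than the valid data. */
        byte[] getBytes() {
            return buf;
        }

        /** Returns the number of valid bytes at the start of the backing array. */
        int getLength() {
            return length;
        }
    }

    /** Safe access: copy only the first getLength() bytes. */
    static byte[] validBytes(GrowableBytes t) {
        return Arrays.copyOf(t.getBytes(), t.getLength());
    }

    public static void main(String[] args) {
        GrowableBytes t = new GrowableBytes();
        t.set("accumulo".getBytes(StandardCharsets.UTF_8));
        t.set("row".getBytes(StandardCharsets.UTF_8)); // buffer is reused, not shrunk

        // Wrong: treats the whole backing array as data, so stale bytes leak in.
        String wrong = new String(t.getBytes(), StandardCharsets.UTF_8);
        // Right: honors getLength().
        String right = new String(validBytes(t), StandardCharsets.UTF_8);

        System.out.println(wrong); // prints "rowumulo"
        System.out.println(right); // prints "row"
    }
}
```

With the real Text class the same rule applies: read only the first
getLength() bytes of whatever getBytes() returns.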

Adam
